### **Chapter 3 Machine-Level Programming I: Basics**

| Name | Date | Transistors | MHz | Notes |
|------|------|-------------|-----|-------|
| 8086 | 1978 | 29K | 5-10 | First 16-bit Intel processor; basis for IBM PC & DOS; 1MB address space |
| 386 | 1985 | 275K | 16-33 | First 32-bit Intel processor, referred to as IA32; added "flat addressing", capable of running Unix |
| Pentium 4E | 2004 | 125M | 2800-3800 | First 64-bit Intel x86 processor, referred to as x86-64 |
| Core 2 | 2006 | 291M | 1060-3500 | First multi-core Intel processor |
| Core i7 | 2008 | 791M | 1700-3900 | Four cores |
| | 2017 | 1.7G | 4900-5200 | |

**Recent Architecture: Core i7 Broadwell (2015)**

- Desktop Model
  - 4 cores
  - Integrated graphics
  - 3.3-3.8 GHz
  - 65W
- Server Model
  - Integrated I/O
  - 2-2.6 GHz
  - 45W

[Figure: Broadwell block diagram with eight BDW cores, DDR3L-1600/DDR4-2133 memory channels, PCIe Gen3 (24 lanes, 6 controllers), PCIe Gen2 (8 lanes), Intel Ethernet 2x10 GbE, DMA, PCU, SATA 3.0 x6, USB 3.0, USB 2.0, clocks, and legacy I/O]

**Machine Programming I: Basics**

- History of Intel processors and architectures
- C, assembly, machine code
- Assembly Basics: Registers, operands, move
- Arithmetic & logical operations

# x86 Clones: Advanced Micro Devices (AMD)

- Historically
  - AMD has followed just behind Intel
  - A little bit slower, a lot cheaper
- Then
  - Recruited top circuit designers
from Digital Equipment Corp. and other downward-trending companies
  - Built Opteron: tough competitor to Pentium 4
  - Developed x86-64, their own extension to 64 bits
- Recent Years
  - Intel got its act together
    - Leads the world in semiconductor technology
  - AMD has fallen behind
    - Relies on external semiconductor manufacturer

#### Intel's 64-Bit History

- 2001: Intel Attempts Radical Shift from IA32 to IA64
  - Totally different architecture (Itanium)
  - Executes IA32 code only as legacy
  - Performance disappointing
- 2003: AMD Steps In with a Solution
  - x86-64 (now called "AMD64")
- Intel Felt Obligated to Focus on IA64
  - Hard to admit mistake or that AMD is better
- 2004: Intel Announces EM64T Extension to IA32
  - Extended Memory 64-bit Technology
  - Almost identical to x86-64!
- All but low-end x86 processors support x86-64
  - But, lots of code still runs in 32-bit mode

**Our Coverage**

- IA32
  - The traditional x86
- x86-64
  - The standard
  - `gcc hello.c`
  - `gcc -m64 hello.c`
- Presentation
  - Book covers x86-64
  - Web aside on IA32
  - We will only cover x86-64

**Definitions**

- Architecture (also ISA: instruction set architecture): the parts of a processor design that one needs to understand to write assembly/machine code.
  - Examples: instruction set specification, registers.
- Microarchitecture: implementation of the architecture.
  - Examples: cache sizes and core frequency.
- Machine Code: The byte-level programs that a processor executes
- Assembly Code: A text representation of machine code
- Example ISAs:
  - Intel: x86, IA32, Itanium, x86-64
  - ARM: Used in almost all mobile phones

**Turning C into Object Code**

- Code in files `p1.c` `p2.c`
- Compile with command: `gcc -Og p1.c p2.c -o p`
  - Use basic optimizations (`-Og`) [New to recent versions of GCC]
  - Put resulting binary in file `p`

| Stage | Tool | Output | Form |
|-------|------|--------|------|
| Source | | C program (`p1.c` `p2.c`) | text |
| Compile | Compiler (`gcc -Og -S`) | Asm program (`p1.s` `p2.s`) | text |
| Assemble | Assembler (`gcc` or `as`) | Object program (`p1.o` `p2.o`) | binary |
| Link | Linker (`gcc` or `ld`), also takes static libraries (`.a`) | Executable program (`p`) | binary |

**Assembly Characteristics: Data Types**

- "Integer" data of 1, 2, 4, or 8 bytes
  - Data values
  - Addresses (untyped pointers)
- Floating point data of 4, 8, or 10 bytes
- Code: Byte sequences encoding series of instructions
- No aggregate types such as arrays or structures
  - Just contiguously allocated bytes in memory

**Disassembling Object Code**

Disassembled:

```
0000000000400595 <sumstore>:
  400595:  53              push   %rbx
  400596:  48 89 d3        mov    %rdx,%rbx
  400599:  e8 f2 ff ff ff  callq  400590 <plus>
  40059e:  48 89 03        mov    %rax,(%rbx)
  4005a1:  5b              pop    %rbx
  4005a2:  c3              retq
```

- Disassembler: `objdump -d sum`
  - Useful tool for examining object code
  - Analyzes bit pattern of series of instructions
  - Produces approximate rendition of assembly code
  - Can be run on either `a.out` (complete executable) or `.o` file

**x86-64 Integer Registers**

| 64-bit | Low-order 32 bits | 64-bit | Low-order 32 bits |
|--------|-------------------|--------|-------------------|
| `%rax` | `%eax` | `%r8` | `%r8d` |
| `%rbx` | `%ebx` | `%r9` | `%r9d` |
| `%rcx` | `%ecx` | `%r10` | `%r10d` |
| `%rdx` | `%edx` | `%r11` | `%r11d` |
| `%rsi` | `%esi` | `%r12` | `%r12d` |
| `%rdi` | `%edi` | `%r13` | `%r13d` |
| `%rsp` | `%esp` | `%r14` | `%r14d` |
| `%rbp` | `%ebp` | `%r15` | `%r15d` |

- Can reference low-order 4 bytes (also low-order 1 & 2 bytes)

**movq Operand Combinations**

```
Source  Dest  Src,Dest            C Analog
Imm     Reg   movq $0x4,%rax      temp = 0x4;
Imm     Mem   movq $-147,(%rax)   *p = -147;
Reg     Reg   movq %rax,%rdx      temp2 = temp1;
Reg     Mem   movq %rax,(%rdx)    *p = temp;
Mem     Reg   movq (%rax),%rdx    temp = *p;
```

- Cannot do memory-memory transfer with a single instruction

**Example of Simple Addressing Modes**

```
void swap(long *xp, long *yp)
{
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}

swap:
    movq    (%rdi), %rax
    movq    (%rsi), %rdx
    movq    %rdx, (%rdi)
    movq    %rax, (%rsi)
    ret
```

Bryant and O'Hallaron, Computer Systems: A Programmer's Perspective, Third Edition

**Understanding Swap()**

```
swap:
    movq    (%rdi), %rax   # t0 = *xp
    movq    (%rsi), %rdx   # t1 = *yp
    movq    %rdx, (%rdi)   # *xp = t1
    movq    %rax, (%rsi)   # *yp = t0
    ret
```

Throughout the call, `%rdi` holds `xp` = 0x120 and `%rsi` holds `yp` = 0x100. Memory initially holds 123 at address 0x120 and 456 at address 0x100; the slots at 0x118, 0x110, and 0x108 are not touched. The state after each instruction:

| After instruction | `%rax` | `%rdx` | Mem[0x120] | Mem[0x100] |
|-------------------|--------|--------|------------|------------|
| (initial state) | | | 123 | 456 |
| `movq (%rdi), %rax` | 123 | | 123 | 456 |
| `movq (%rsi), %rdx` | 123 | 456 | 123 | 456 |
| `movq %rdx, (%rdi)` | 123 | 456 | 456 | 456 |
| `movq %rax, (%rsi)` | 123 | 456 | 456 | 123 |

**Simple Memory Addressing Modes**

- Normal: `(R)` denotes `Mem[Reg[R]]`
  - Register R specifies memory address
  - Aha!
Pointer dereferencing in C
  - `movq (%rcx),%rax`
- Displacement: `D(R)` denotes `Mem[Reg[R]+D]`
  - Register R specifies start of memory region
  - Constant displacement D specifies offset
  - `movq 8(%rbp),%rdx`

**Complete Memory Addressing Modes**

- Most General Form: `D(Rb,Ri,S)` denotes `Mem[Reg[Rb]+S*Reg[Ri]+D]`
  - D: Constant "displacement" 1, 2, or 4 bytes
  - Rb: Base register: Any of 16 integer registers
  - Ri: Index register: Any, except for `%rsp`
  - S: Scale: 1, 2, 4, or 8 (why these numbers?)
- Special Cases
  - `(Rb,Ri)` denotes `Mem[Reg[Rb]+Reg[Ri]]`
  - `D(Rb,Ri)` denotes `Mem[Reg[Rb]+Reg[Ri]+D]`
  - `(Rb,Ri,S)` denotes `Mem[Reg[Rb]+S*Reg[Ri]]`

**Address Computation Examples**

| Register | Value |
|----------|-------|
| `%rdx` | 0xf000 |
| `%rcx` | 0x0100 |

| Expression | Address Computation | Address |
|------------|---------------------|---------|
| `0x8(%rdx)` | 0xf000 + 0x8 | 0xf008 |
| `(%rdx,%rcx)` | 0xf000 + 0x100 | 0xf100 |
| `(%rdx,%rcx,4)` | 0xf000 + 4*0x100 | 0xf400 |
| `0x80(,%rdx,2)` | 2*0xf000 + 0x80 | 0x1e080 |

**Address Computation Instruction**

- `leaq Src, Dst` (load effective address, quad word)
  - Src is address mode expression
  - Set Dst to address denoted by expression
- Uses
  - Computing addresses without a memory reference
    - E.g., translation of `p = &x[i];`
  - Computing arithmetic expressions of the form x + k*y
    - k = 1, 2, 4, or 8
- Example: `long m12(long x) { return x*12; }`
  - Converted to ASM by compiler:
    - `leaq (%rdi,%rdi,2), %rax` computes t <- x+x*2
    - `salq $2, %rax` returns t<<2

**Some Arithmetic Operations**

Two-Operand Instructions:

```
Format           Computation
addq  Src,Dest   Dest = Dest + Src
subq  Src,Dest   Dest = Dest - Src
imulq Src,Dest   Dest = Dest * Src
salq  Src,Dest   Dest = Dest << Src   Also called shlq
sarq  Src,Dest   Dest = Dest >> Src   Arithmetic
shrq  Src,Dest   Dest = Dest >> Src   Logical
xorq  Src,Dest   Dest = Dest ^ Src
andq  Src,Dest   Dest = Dest & Src
orq   Src,Dest   Dest = Dest | Src
```

- Watch out for argument order!
- No distinction between signed and unsigned int (why?)

**Some Arithmetic Operations**

One-Operand Instructions:

| Format | Computation |
|--------|-------------|
| `incq Dest` | `Dest = Dest + 1` |
| `decq Dest` | `Dest = Dest - 1` |
| `negq Dest` | `Dest = -Dest` |
| `notq Dest` | `Dest = ~Dest` |

- See book for more instructions

**Arithmetic Expression Example**

```
long arith(long x, long y, long z)
{
    long t1 = x+y;
    long t2 = z+t1;
    long t3 = x+4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}

arith:
    leaq    (%rdi,%rsi), %rax
    addq    %rdx, %rax
    leaq    (%rsi,%rsi,2), %rdx
    salq    $4, %rdx
    leaq    4(%rdi,%rdx), %rcx
    imulq   %rcx, %rax
    ret
```

- Interesting Instructions
  - `leaq`: address computation
  - `imulq`: multiplication
    - But, only used once

**Understanding Arithmetic Expression Example**

| Instruction | Computes |
|-------------|----------|
| `leaq (%rdi,%rsi), %rax` | t1 |
| `addq %rdx, %rax` | t2 |
| `leaq (%rsi,%rsi,2), %rdx` | y*3 (intermediate for t4) |
| `salq $4, %rdx` | t4 |
| `leaq 4(%rdi,%rdx), %rcx` | t5 |
| `imulq %rcx, %rax` | rval |

| Register | Use(s) |
|----------|--------|
| `%rdi` | Argument x |
| `%rsi` | Argument y |
| `%rdx` | Argument z, then t4 |
| `%rax` | t1, t2, rval |
| `%rcx` | t5 |

## **Machine Programming I: Summary**

- History of Intel processors and architectures
  - Evolutionary design leads to many quirks and artifacts
- C, assembly, machine code
  - New forms of visible state: program counter, registers, ...
  - Compiler must transform statements, expressions, procedures into low-level instruction sequences
- Assembly Basics: Registers, operands, move
  - The x86-64 move instructions cover wide range of data movement forms
- Arithmetic
  - C compiler will figure out different instruction combinations to carry out computation
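
**Worked Sketch: Address Computation in C**

The general addressing form `D(Rb,Ri,S)` from the Complete Memory Addressing Modes slide can be checked by hand with a few lines of C. The following is a minimal sketch, not part of the original slides: `effective_address` and `check_slide_examples` are hypothetical helper names, and modeling an omitted base or index register as 0 (and an omitted scale as 1) is a convention of this sketch, not a statement about how the hardware encodes the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
 * Assumption of this sketch: pass 0 for an omitted base or index
 * register value and 1 for an omitted scale. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, unsigned s)
{
    /* The slides list 1, 2, 4, and 8 as the only legal scale factors. */
    assert(s == 1 || s == 2 || s == 4 || s == 8);
    /* Unsigned arithmetic wraps modulo 2^64, so a negative
     * displacement is handled by the cast. */
    return rb + (uint64_t)s * ri + (uint64_t)d;
}

/* Returns 1 if the four rows of the Address Computation Examples
 * table are reproduced, using %rdx = 0xf000 and %rcx = 0x0100. */
int check_slide_examples(void)
{
    const uint64_t rdx = 0xf000, rcx = 0x0100;
    return effective_address(0x8,  rdx, 0,   1) == 0xf008    /* 0x8(%rdx)      */
        && effective_address(0,    rdx, rcx, 1) == 0xf100    /* (%rdx,%rcx)    */
        && effective_address(0,    rdx, rcx, 4) == 0xf400    /* (%rdx,%rcx,4)  */
        && effective_address(0x80, 0,   rdx, 2) == 0x1e080;  /* 0x80(,%rdx,2)  */
}
```

Calling `check_slide_examples()` from a test program should return 1 if the table rows were transcribed correctly. The same function also describes what `leaq` writes to its destination, since `leaq` performs this computation without the memory access.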
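
**Worked Sketch: Following the leaq/salq Sequences in C**

The `m12` and `arith` slides show the compiler replacing multiplications with `leaq` and `salq`. One way to convince yourself the instruction sequences are equivalent to the C source is to replay them one step at a time. The sketch below does that under stated assumptions: `m12_lea`, `arith_steps`, and `check_lea_examples` are hypothetical names introduced here, the local variables merely stand in for registers, and unsigned arithmetic is used so that wraparound matches the machine (the final conversion back to `long` is implementation-defined in C, though it behaves as expected with GCC on x86-64).

```c
#include <assert.h>

/* Reference version, as written on the slide. */
long arith(long x, long y, long z)
{
    long t1 = x + y;
    long t2 = z + t1;
    long t3 = x + 4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}

/* x*12 computed as the two-instruction sequence does it. */
long m12_lea(long x)
{
    unsigned long rax = (unsigned long)x;
    rax = rax + rax * 2;    /* leaq (%rdi,%rdi,2), %rax : t = x + x*2 */
    rax <<= 2;              /* salq $2, %rax            : t << 2      */
    return (long)rax;
}

/* arith replayed instruction by instruction. */
long arith_steps(long x, long y, long z)
{
    unsigned long rdi = (unsigned long)x;
    unsigned long rsi = (unsigned long)y;
    unsigned long rdx = (unsigned long)z;
    unsigned long rax, rcx;

    rax = rdi + rsi;        /* leaq (%rdi,%rsi), %rax   : t1            */
    rax += rdx;             /* addq %rdx, %rax          : t2 = z + t1   */
    rdx = rsi + rsi * 2;    /* leaq (%rsi,%rsi,2), %rdx : y*3           */
    rdx <<= 4;              /* salq $4, %rdx            : t4 = y*48     */
    rcx = 4 + rdi + rdx;    /* leaq 4(%rdi,%rdx), %rcx  : t5 = x+4+t4   */
    rax *= rcx;             /* imulq %rcx, %rax         : rval = t2*t5  */
    return (long)rax;
}

/* Spot-check both replays against the straightforward C. */
void check_lea_examples(void)
{
    assert(m12_lea(3) == 3 * 12);
    assert(arith_steps(1, 2, 3) == arith(1, 2, 3));
}
```

Note how `t3 = x+4` never gets its own instruction: the displacement 4 is folded into the final `leaq`, which is why only one `imulq` appears in the compiled code.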