Concepts
x86-64 Instruction Encoding
x86-64 instructions are variable-length, from 1 to 15 bytes. Each consists of optional prefix bytes, an opcode, and optional operand-encoding bytes (ModRM, SIB, displacement, and immediate). Understanding this layout is critical for disassembly and shellcode analysis.
Instruction layout
[Prefixes 0–4 bytes] [REX 0–1 byte] [Opcode 1–3 bytes] [ModRM 0–1 byte] [SIB 0–1 byte] [Disp 0,1,2,4 bytes] [Imm 0,1,2,4,8 bytes]
Prefix bytes
| Group | Byte(s) | Purpose |
|---|---|---|
| Operand-size | 66 | Toggle between 16/32-bit operand |
| Address-size | 67 | Toggle between 32/64-bit address |
| Segment override | 2E/36/3E/26/64/65 | Use CS/SS/DS/ES/FS/GS |
| LOCK | F0 | Atomic memory RMW |
| REP/REPNE | F3/F2 | String repeat |
| REX (64-bit) | 40–4F | 64-bit operand size, access to r8–r15 |
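The table above can be turned into a small prefix scanner. The following is a minimal sketch, not a full decoder: `LEGACY_PREFIXES` and `split_prefixes` are hypothetical names introduced for illustration, and the sketch assumes 64-bit mode, where a byte in the range 40–4F directly before the opcode is a REX prefix.

```python
# Legacy prefix bytes from the table above, keyed by byte value.
LEGACY_PREFIXES = {
    0x66: "operand-size",
    0x67: "address-size",
    0x2E: "seg CS", 0x36: "seg SS", 0x3E: "seg DS",
    0x26: "seg ES", 0x64: "seg FS", 0x65: "seg GS",
    0xF0: "LOCK",
    0xF3: "REP", 0xF2: "REPNE",
}

def split_prefixes(code: bytes):
    """Return (legacy_prefix_names, rex_byte_or_None, remaining_bytes)."""
    i = 0
    legacy = []
    # Consume any run of legacy prefix bytes.
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        legacy.append(LEGACY_PREFIXES[code[i]])
        i += 1
    rex = None
    # Assumption: 64-bit mode, so 0x40-0x4F here is REX, not INC/DEC.
    if i < len(code) and 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        i += 1
    return legacy, rex, code[i:]
```

For example, `split_prefixes(bytes.fromhex("f0480fb10f"))` separates the LOCK prefix and the REX byte 48 from the remaining opcode and ModRM bytes of `lock cmpxchg [rdi], rcx`.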
REX byte
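0100 W R X B
     │ │ │ └─ B: extension of ModRM.r/m or SIB.base
     │ │ └─── X: extension of SIB.index
     │ └───── R: extension of ModRM.reg
     └─────── W: 0 = 32-bit operand, 1 = 64-bit operand
REX.W=1 promotes 32-bit operands to 64-bit (e.g. `48 8B 45 F8` = `mov rax, [rbp-8]` vs `8B 45 F8` = `mov eax, [rbp-8]`). Only the operand size changes; in 64-bit mode the address register stays rbp in both encodings.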
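The REX bit layout can be checked with a few shifts and masks. This is a minimal sketch; `Rex` and `decode_rex` are hypothetical helper names, not part of any library.

```python
from typing import NamedTuple, Optional

class Rex(NamedTuple):
    w: int  # 1 = 64-bit operand size
    r: int  # extends ModRM.reg
    x: int  # extends SIB.index
    b: int  # extends ModRM.r/m or SIB.base

def decode_rex(byte: int) -> Optional[Rex]:
    """Split a REX byte (0100WRXB) into its flag bits, or return None."""
    if byte & 0xF0 != 0x40:
        return None  # high nibble is not 0100, so this is not a REX byte
    return Rex(
        w=(byte >> 3) & 1,
        r=(byte >> 2) & 1,
        x=(byte >> 1) & 1,
        b=byte & 1,
    )
```

`decode_rex(0x48)` gives `Rex(w=1, r=0, x=0, b=0)`, matching the 48 prefix in the example above, while `decode_rex(0x8B)` returns `None` because 8B is an opcode, not a REX byte.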
ModRM byte
Bit 7–6: Mod (00=mem, 01=mem+disp8, 10=mem+disp32, 11=register)
Bit 5–3: Reg (register operand or opcode extension)
Bit 2–0: R/M (register or memory base)
SIB byte (present when R/M = 100 and Mod ≠ 11)
Bit 7–6: Scale (00=×1, 01=×2, 10=×4, 11=×8)
Bit 5–3: Index (register; 100 = no index)
Bit 2–0: Base (register; 101 with Mod=00 = disp32 only)
Example decode
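The ModRM and SIB bit fields can be split the same way. The sketch below is illustrative only: `split_modrm`, `split_sib`, and `needs_sib` are hypothetical names, and REX extension bits are ignored, so only registers 0–7 are covered.

```python
# Register names for 3-bit field values 0-7 (no REX extension applied).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def split_modrm(byte: int):
    """Return (mod, reg, rm) from a ModRM byte."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def split_sib(byte: int):
    """Return (scale_factor, index, base) from a SIB byte."""
    scale_factor = 1 << ((byte >> 6) & 0b11)  # 00->1, 01->2, 10->4, 11->8
    return scale_factor, (byte >> 3) & 0b111, byte & 0b111

def needs_sib(mod: int, rm: int) -> bool:
    """A SIB byte follows when R/M = 100 and Mod is not 11."""
    return mod != 0b11 and rm == 0b100
```

For instance, in `8B 04 24` (`mov eax, [rsp]`) the ModRM byte 04 splits into Mod=00, Reg=000, R/M=100, so a SIB byte follows; the SIB byte 24 splits into scale ×1, index=100 (no index), and base=100, which is `REG64[4]`, i.e. rsp.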
48 8B 84 CB 78 56 34 12
│  │  │  │  └─ displacement = 0x12345678 (stored little-endian)
│  │  │  └──── SIB: scale=11 (×8), index=001 (rcx), base=011 (rbx)
│  │  └─────── ModRM: Mod=10 (mem+disp32), Reg=000 (rax), R/M=100 (SIB follows)
│  └────────── Opcode 8B = MOV r64, r/m64
└───────────── REX.W=1 (64-bit operand)
Decoded: mov rax, [rbx + rcx*8 + 0x12345678]
Why variable length matters for disassembly
- A linear-sweep disassembler decodes instructions one after another from a start address. If that start address is off by even one byte, the instructions that follow may decode incorrectly, a weakness that anti-disassembly tricks exploit.
- Placing a junk byte immediately after an `EB 01` (a short jump over 1 byte) can hide the instructions that follow from a linear-sweep disassembler, but not from recursive-descent disassemblers (IDA, Ghidra), which follow the jump to its target.
- Shellcode often uses alternative encodings of the same instruction to bypass byte-based signatures (e.g. `66 90` instead of `90` for NOP).
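The difference between the two strategies can be illustrated with a toy decoder. This sketch is an assumption-laden simplification: `TOY_OPCODES` knows only four opcodes, the function names are hypothetical, and the recursive-descent version follows a single path through unconditional short jumps, whereas real tools also track calls and conditional branches.

```python
# Toy length table: opcode byte -> (mnemonic, total instruction length).
TOY_OPCODES = {
    0x90: ("nop", 1),
    0xC3: ("ret", 1),
    0xEB: ("jmp rel8", 2),
    0xE8: ("call rel32", 5),
}

def decode_at(code: bytes, off: int):
    """Decode one toy instruction; unknown bytes count as 1-byte data."""
    return TOY_OPCODES.get(code[off], ("(unknown)", 1))

def linear_sweep(code: bytes):
    """Decode back to back from offset 0, ignoring control flow."""
    out, off = [], 0
    while off < len(code):
        name, length = decode_at(code, off)
        out.append((off, name))
        off += length
    return out

def recursive_descent(code: bytes, entry: int = 0):
    """Follow unconditional short jumps instead of decoding past them."""
    out, off, seen = [], entry, set()
    while 0 <= off < len(code) and off not in seen:
        seen.add(off)
        name, length = decode_at(code, off)
        out.append((off, name))
        if name == "ret":
            break
        if name == "jmp rel8":
            rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
            off = off + length + rel  # target is relative to the next instruction
        else:
            off += length
    return out

# EB 01 jumps over the junk byte E8; the real code is 90 90 90 C3.
CODE = bytes.fromhex("eb01e8909090c3")
```

On `CODE`, `linear_sweep` reports `jmp rel8` at offset 0 and then a bogus `call rel32` at offset 2 that swallows the remaining bytes, while `recursive_descent` follows the jump to offset 3 and recovers the three `nop` instructions and the `ret`.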