x86 Instruction Encoding

x86-64 instructions have a variable length of 1–15 bytes, consisting of optional prefix bytes followed by an opcode and optional operand-encoding bytes.

Instruction layout

[Prefixes 0–4 bytes] [REX 0–1 byte] [Opcode 1–3 bytes] [ModRM 0–1 byte] [SIB 0–1 byte] [Disp 0,1,2,4 bytes] [Imm 0,1,2,4,8 bytes]
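To make the layout concrete, here is a minimal Python sketch that concatenates the fields in the order shown above for one specific form, `mov r64, [base + index*scale + disp32]`. It assumes only the eight legacy registers (no REX.R/X/B). `encode_mov_load`, `REG64` and `SCALE_BITS` are hypothetical helpers for illustration, not an assembler API.

```python
import struct

# Encoding numbers of the eight legacy 64-bit registers (r8-r15 would need REX.R/X/B).
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}
# SIB.scale field value for each scale factor.
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_mov_load(dst: str, base: str, index: str, scale: int, disp32: int) -> bytes:
    """Assemble `mov dst, [base + index*scale + disp32]` field by field (hypothetical helper)."""
    if index == "rsp":
        raise ValueError("rsp cannot be an index (SIB.index=100 means 'no index')")
    prefixes = b""                                  # no legacy prefixes needed here
    rex = bytes([0x48])                             # 0100 W=1 R=0 X=0 B=0
    opcode = bytes([0x8B])                          # MOV r64, r/m64
    modrm = bytes([(0b10 << 6) | (REG64[dst] << 3) | 0b100])  # Mod=10 (disp32), R/M=100 (SIB)
    sib = bytes([(SCALE_BITS[scale] << 6) | (REG64[index] << 3) | REG64[base]])
    disp = struct.pack("<i", disp32)                # 4-byte little-endian signed displacement
    imm = b""                                       # this form has no immediate
    insn = prefixes + rex + opcode + modrm + sib + disp + imm
    assert len(insn) <= 15, "x86 instructions are capped at 15 bytes"
    return insn


print(encode_mov_load("rax", "rbx", "rcx", 8, 0x12345678).hex(" "))  # 48 8b 84 cb 78 56 34 12
```

Empty fields (prefixes, immediate) are kept as zero-length byte strings so the concatenation mirrors the diagram one field at a time.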

Prefix bytes

Group	Byte(s)	Purpose
Operand-size	`66`	Toggle between 16/32-bit operand
Address-size	`67`	Toggle between 32/64-bit address
Segment override	`2E`/`36`/`3E`/`26`/`64`/`65`	Use CS/SS/DS/ES/FS/GS
LOCK	`F0`	Atomic memory RMW
REP/REPNE	`F3`/`F2`	String repeat
REX (64-bit)	`40`–`4F`	64-bit operand size, access to r8–r15
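A decoder's first step is to peel off these bytes before the opcode. The sketch below does that for 64-bit mode using the byte values from the table. It assumes well-formed input in which any REX byte sits directly before the opcode, and it does not distinguish the "mandatory" 66/F2/F3 prefixes that some SSE instructions treat as part of the opcode. `split_prefixes` is a hypothetical name.

```python
from __future__ import annotations

# Legacy prefix bytes from the table above, keyed by byte value.
LEGACY_PREFIXES = {
    0x66: "operand-size", 0x67: "address-size",
    0x2E: "CS override", 0x36: "SS override", 0x3E: "DS override",
    0x26: "ES override", 0x64: "FS override", 0x65: "GS override",
    0xF0: "LOCK", 0xF3: "REP/REPE", 0xF2: "REPNE",
}


def split_prefixes(code: bytes) -> tuple[list[str], int | None, bytes]:
    """Return (legacy prefix names, REX byte or None, opcode and later bytes).

    Assumes 64-bit mode, where 40-4F are always REX, and well-formed input in
    which a REX byte, if any, sits directly before the opcode.
    """
    names: list[str] = []
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        names.append(LEGACY_PREFIXES[code[i]])
        i += 1
    rex = None
    if i < len(code) and 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        i += 1
    return names, rex, code[i:]


# mov rax, fs:[0x28]  =  64 (FS override) 48 (REX.W) 8B 04 25 28 00 00 00
print(split_prefixes(bytes.fromhex("64488b042528000000")))
```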

REX byte

0100 W R X B
     │ │ │ └─ Extension of ModRM.r/m, SIB.base, or opcode register field
     │ │ └─── Extension of SIB.index
     │ └───── Extension of ModRM.reg
     └─────── 0=32-bit operand, 1=64-bit operand

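REX.W=1 promotes 32-bit operands to 64-bit (e.g. 48 8B 45 F8 = mov rax, [rbp-8] vs 8B 45 F8 = mov eax, [rbp-8]). In 64-bit mode the address size already defaults to 64 bits, so both forms address through rbp; only the operand size changes.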
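A small sketch of splitting a REX byte into its four bits, and of how REX.R supplies bit 3 of a register number so that ModRM.reg can name r8–r15. `Rex` and `decode_rex` are illustrative names, not taken from any library.

```python
from typing import NamedTuple


class Rex(NamedTuple):
    w: int  # 1 = 64-bit operand size
    r: int  # high bit of ModRM.reg
    x: int  # high bit of SIB.index
    b: int  # high bit of ModRM.r/m, SIB.base, or the opcode's register field


def decode_rex(byte: int) -> Rex:
    """Split a REX byte (0100WRXB) into its four flag bits."""
    if (byte & 0xF0) != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX byte")
    return Rex(w=(byte >> 3) & 1, r=(byte >> 2) & 1, x=(byte >> 1) & 1, b=byte & 1)


print(decode_rex(0x48))  # Rex(w=1, r=0, x=0, b=0)

# REX.R becomes bit 3 of the register number: 4C 8B 45 F8 = mov r8, [rbp-8]
rex = decode_rex(0x4C)
modrm_reg = (0x45 >> 3) & 0b111        # ModRM.reg field = 000
print((rex.r << 3) | modrm_reg)        # 8, i.e. r8
```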

ModRM byte

Bit 7–6: Mod   (00=mem, 01=mem+disp8, 10=mem+disp32, 11=register)
Bit 5–3: Reg   (register operand or opcode extension)
Bit 2–0: R/M   (register or memory base; 100 = SIB follows when Mod ≠ 11, 101 with Mod=00 = RIP+disp32 in 64-bit mode)
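The three fields are plain bit slices. This sketch splits a ModRM byte and names the addressing form, including the two 64-bit-mode special cases (R/M=100 means a SIB byte follows; Mod=00 with R/M=101 means RIP-relative disp32), which apply whether or not REX.B is set. The helper names are hypothetical.

```python
from __future__ import annotations


def decode_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


def rm_form(mod: int, rm: int) -> str:
    """Describe the R/M operand in 64-bit mode (special cases hold with or without REX.B)."""
    if mod == 0b11:
        return "register"
    if rm == 0b100:
        return "SIB follows"
    if mod == 0b00 and rm == 0b101:
        return "RIP + disp32"
    return {0b00: "[reg]", 0b01: "[reg + disp8]", 0b10: "[reg + disp32]"}[mod]


for byte in (0x45, 0x84, 0x05, 0xC0):
    mod, reg, rm = decode_modrm(byte)
    print(f"{byte:02X}: mod={mod:02b} reg={reg:03b} rm={rm:03b} -> {rm_form(mod, rm)}")
```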

SIB byte (present when R/M = 100 and Mod ≠ 11)

Bit 7–6: Scale  (00=×1, 01=×2, 10=×4, 11=×8)
Bit 5–3: Index  (register; 100 = no index unless REX.X=1, which selects r12)
Bit 2–0: Base   (register; 101 with Mod=00 = disp32 only)
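Putting the SIB fields to work, the effective address is base + index × scale + displacement, with the two "absent" encodings above handled explicitly. This sketch ignores REX.X/REX.B (register numbers 0–7 only), and the register values in the usage lines are made up for illustration.

```python
from __future__ import annotations


def decode_sib(byte: int) -> tuple[int, int, int]:
    """Split a SIB byte into (scale factor, index field, base field)."""
    return 1 << (byte >> 6), (byte >> 3) & 0b111, byte & 0b111


def effective_address(sib: int, mod: int, regs: list[int], disp: int) -> int:
    """base + index*scale + disp, ignoring REX.X/REX.B (register numbers 0-7 only)."""
    scale, index, base = decode_sib(sib)
    addr = disp
    if index != 0b100:                          # index 100 = no index
        addr += regs[index] * scale
    if not (base == 0b101 and mod == 0b00):     # base 101 with Mod=00 = disp32 only
        addr += regs[base]
    return addr & 0xFFFF_FFFF_FFFF_FFFF         # wrap like a 64-bit address calculation


# Hypothetical register contents, indexed by register number (rcx=1, rbx=3).
regs = [0] * 8
regs[1], regs[3] = 0x10, 0x1000
print(hex(effective_address(0xCB, 0b10, regs, 0x12345678)))  # 0x123466f8
```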

Example decode

48 8B 84 CB 78 56 34 12
│   │  │  │  └──────────── displacement = 0x12345678
│   │  │  └─────────────── SIB: scale=11(×8), index=rcx, base=rbx
│   │  └────────────────── ModRM: Mod=10(mem+disp32), Reg=rax, R/M=100(SIB)
│   └───────────────────── Opcode 8B = MOV r64, r/m64
└───────────────────────── REX.W=1 (64-bit operand)

Decoded: mov rax, [rbx + rcx*8 + 0x12345678]
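The same decode can be done mechanically. The sketch below handles only this one form (REX.W + 8B, Mod=10, SIB, disp32) but does honor REX.R/X/B, so it also covers r8–r15. `decode_mov_load` is a hypothetical helper, not a general disassembler.

```python
import struct

# Register names indexed by the full 4-bit number (REX bit << 3 | 3-bit field).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes) -> str:
    """Decode REX.W + 8B /r with Mod=10 and a SIB byte (hypothetical helper)."""
    if len(code) < 8 or (code[0] & 0xF8) != 0x48 or code[1] != 0x8B:
        raise ValueError("expected REX.W, opcode 8B, ModRM, SIB, disp32")
    rex, modrm, sib = code[0], code[2], code[3]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm != 0b100:
        raise ValueError("this sketch only handles Mod=10 with R/M=100 (SIB)")
    r, x, b = (rex >> 2) & 1, (rex >> 1) & 1, rex & 1
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    (disp,) = struct.unpack_from("<i", code, 4)     # signed little-endian disp32

    parts = [REG64[(b << 3) | base]]                # with Mod=10, base 101 is rbp/r13
    if not (index == 0b100 and x == 0):             # index 100 is "none" only without REX.X
        parts.append(f"{REG64[(x << 3) | index]}*{scale}")
    sign = "+" if disp >= 0 else "-"
    return f"mov {REG64[(r << 3) | reg]}, [{' + '.join(parts)} {sign} {abs(disp):#x}]"


print(decode_mov_load(bytes.fromhex("488b84cb78563412")))
# mov rax, [rbx + rcx*8 + 0x12345678]
```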

Why variable length matters for disassembly

A linear-sweep disassembler decodes instructions back to back from a start address. If that start address is off by even one byte, the following bytes decode as different instructions until the stream happens to resynchronize, and anti-disassembly tricks deliberately exploit this.
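Placing a junk byte immediately after EB 01 (a short jump over the next byte) can hide instructions from a linear-sweep disassembler, which decodes the junk byte as the start of an instruction and swallows the real code behind it. Recursive-descent disassemblers (IDA, Ghidra) follow the jump target and are not fooled by this simple form.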
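A toy model of the junk-byte trick, assuming the junk byte is E8 (the call rel32 opcode, which makes a linear sweep consume four more bytes). The opcode table covers only seven fixed-length opcodes and `follow_flow` follows a single path, so both functions are illustrative stand-ins rather than real disassemblers; the byte sequence is made up.

```python
from __future__ import annotations

# Toy opcode table: byte -> (mnemonic, total length). A handful of fixed-length
# 64-bit-mode opcodes only; this is NOT a general x86 length decoder.
TOY_OPCODES = {
    0x53: ("push rbx", 1), 0x55: ("push rbp", 1),
    0x5B: ("pop rbx", 1), 0x5D: ("pop rbp", 1),
    0xC3: ("ret", 1),
    0xE8: ("call rel32", 5),
    0xEB: ("jmp rel8", 2),
}


def linear_sweep(code: bytes, start: int = 0) -> list[tuple[int, str]]:
    """Decode instructions back to back, ignoring control flow."""
    out, pc = [], start
    while pc < len(code):
        name, length = TOY_OPCODES[code[pc]]
        out.append((pc, name))
        pc += length
    return out


def follow_flow(code: bytes, start: int = 0) -> list[tuple[int, str]]:
    """Single-path stand-in for recursive descent: follow jmp rel8, stop at ret."""
    out, pc = [], start
    while pc < len(code):
        op = code[pc]
        name, length = TOY_OPCODES[op]
        out.append((pc, name))
        if op == 0xC3:                          # ret ends this path
            break
        if op == 0xEB:                          # target = next instruction + signed disp8
            pc += length + int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
        else:
            pc += length                        # calls are treated as fall-through
    return out


# EB 01 jumps over the junk byte E8; the real code (push rbp ... ret) starts at offset 3.
CODE = bytes.fromhex("eb01e855535b5dc3")
print(linear_sweep(CODE))   # the push/pop instructions vanish inside "call rel32"
print(follow_flow(CODE))
```

Starting the sweep one byte later, with `linear_sweep(CODE, start=3)`, recovers the real instructions, which is the one-byte misalignment effect in miniature.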
Shellcode often uses alternative encodings of the same instruction to bypass byte-based signatures (e.g. 66 90 instead of 90 for NOP).
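A sketch of why this works against byte matching: a hypothetical signature that looks for a run of single-byte 90 NOPs flags a classic sled but misses an equally long sled built from the 2-byte 66 90 form. `naive_sled_signature` is invented for illustration and does not model any real detection product.

```python
# Byte sequences that all behave as no-ops on x86-64.
NOP_FORMS = {
    "nop": b"\x90",
    "66 nop": b"\x66\x90",               # 2-byte NOP: operand-size prefix + 90
    "nop dword [rax]": b"\x0f\x1f\x00",  # multi-byte NOP, 0F 1F /0
}


def naive_sled_signature(blob: bytes, run: int = 4) -> bool:
    """Hypothetical signature: flag any run of `run` single-byte 0x90 NOPs."""
    return b"\x90" * run in blob


classic_sled = NOP_FORMS["nop"] * 8
prefixed_sled = NOP_FORMS["66 nop"] * 4     # same 8 bytes of no-ops, different encoding
print(naive_sled_signature(classic_sled))   # True
print(naive_sled_signature(prefixed_sled))  # False
```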