
Overlapping Instructions

A jump lands inside a multi-byte instruction so the same bytes decode two ways, desynchronising linear-sweep disassemblers and hiding real code.

On x86 the instruction stream is byte-addressed and instructions are variable length, so a sequence of bytes has no single canonical decoding — the meaning depends entirely on where the decoder starts. Overlapping instructions exploit this: a branch is crafted to land inside a multi-byte instruction, so the bytes after the entry point regroup into a completely different, valid instruction sequence than the one a flat disassembler produced.

A linear-sweep disassembler such as objdump decodes strictly forward from the end of the previous instruction and ignores branch targets. It therefore decodes the cover instruction and stays desynchronised, sometimes for dozens of bytes, until it happens to realign on a real instruction boundary. A recursive-traversal disassembler follows the branch and decodes the hidden stream, so the two engines disagree about what the same bytes mean.

How it works

The canonical gadget hides a real instruction inside the immediate operand of a preceding one. Consider the following byte sequence:

asm
; Bytes in memory:  EB 01 B8 E8 ...
; ---- Linear-sweep view (starts at the EB byte) ----
00:  EB 01           jmp  short 0x03      ; skips the single byte at 0x02
02:  B8 E8 ...       mov  eax, imm32      ; phantom: the imm32 swallows E8 plus the next 3 bytes

; ---- Real execution / recursive view (target = 0x03) ----
00:  EB 01           jmp  short 0x03
02:  B8              ; one-byte filler, never executed
03:  E8 ...          call rel32           ; real code starts here

The jmp short 0x03 skips the single byte at offset 0x02, the B8 opcode of mov eax, imm32. A linear sweep ignores the branch target, decodes B8 at 0x02, and consumes the following four bytes as its 32-bit immediate. The genuine E8 (call rel32) opcode is among those four bytes, so it disappears into the operand of a phantom mov, and the sweep resumes at an offset that is not a real instruction boundary.
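The disagreement between the two strategies can be reproduced with a toy decoder. The sketch below is not a real x86 disassembler: the `LENGTHS` table, the `CODE` sample bytes, and both function names are assumptions made for illustration, and only the five opcodes in the table are understood. The sample places a B8 filler byte directly before an E8 call whose target lies outside the buffer and is therefore not followed.

```python
# Toy length table: opcode -> total instruction length in bytes.
# Illustrative only; real x86 decoding needs prefixes, ModRM, SIB, etc.
LENGTHS = {
    0xEB: 2,  # jmp rel8
    0xB8: 5,  # mov eax, imm32
    0xE8: 5,  # call rel32
    0x90: 1,  # nop
    0xC3: 1,  # ret
}

# Hypothetical sample: jmp +1 | B8 filler | call rel32 (out of buffer) | ret
CODE = bytes.fromhex("EB01B8E800000090C3")


def linear_sweep(code):
    """Decode strictly forward, ignoring every branch target."""
    starts, off = [], 0
    while off < len(code):
        starts.append(off)
        off += LENGTHS.get(code[off], 1)  # unknown opcode: skip one data byte
    return starts


def recursive_traversal(code, entry=0):
    """Follow control flow from the entry point and record instruction starts."""
    seen, worklist = set(), [entry]
    while worklist:
        off = worklist.pop()
        while 0 <= off < len(code) and off not in seen:
            op = code[off]
            length = LENGTHS.get(op, 1)
            if off + length > len(code):
                break  # truncated instruction at end of buffer
            seen.add(off)
            if op == 0xEB:  # unconditional jump: no fall-through
                off += 2 + int.from_bytes(code[off + 1:off + 2], "little", signed=True)
            elif op == 0xE8:  # call: queue the target if in range, then fall through
                rel = int.from_bytes(code[off + 1:off + 5], "little", signed=True)
                target = off + 5 + rel
                if 0 <= target < len(code):
                    worklist.append(target)
                off += 5
            elif op == 0xC3:  # ret ends this path
                break
            else:
                off += length
    return sorted(seen)


print([hex(o) for o in linear_sweep(CODE)])         # ['0x0', '0x2', '0x7', '0x8']
print([hex(o) for o in recursive_traversal(CODE)])  # ['0x0', '0x3', '0x8']
```

The linear sweep reports a phantom instruction at 0x02 and a stray one at 0x07 before it realigns at 0x08. The traversal never decodes 0x02 and finds the call at 0x03.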

A tighter, self-referential variant uses a jump into its own operand:

asm
; Bytes:  EB FF C0 ...
00:  EB FF           jmp  short 0x01     ; target is +1, i.e. the FF byte itself
01:  FF C0           inc  eax            ; FF is reused as the opcode byte of FF C0

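Here EB FF encodes a short jump with displacement -1 (0xFF read as a signed byte). The displacement is relative to the end of the two-byte jump, offset 0x02, so the target is 0x01: the displacement byte itself. Decoded from there, FF C0 is inc eax. One physical byte (FF) therefore serves as both the jump's displacement and the opcode byte of the next real instruction, so no single non-overlapping listing can show the jump and the instruction that actually executes.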
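The target arithmetic for both examples is easy to check by hand or in code. This is a minimal sketch for two-byte short jumps (opcode plus rel8); the function name `rel8_target` is made up for this example.

```python
def rel8_target(jump_addr, disp_byte):
    """Target of a two-byte short jump (opcode + rel8) located at jump_addr."""
    # Sign-extend the 8-bit displacement: 0x80..0xFF are negative.
    disp = disp_byte - 0x100 if disp_byte >= 0x80 else disp_byte
    # rel8 is relative to the address of the NEXT instruction (jump_addr + 2).
    return jump_addr + 2 + disp


print(hex(rel8_target(0x00, 0x01)))  # 0x3 (EB 01: skips one byte)
print(hex(rel8_target(0x00, 0xFF)))  # 0x1 (EB FF: lands on its own displacement byte)
```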

Detection & analysis

Static analysis:

  • In IDA/Ghidra, a tell-tale sign is a short jmp/jcc whose target address is not on an instruction boundary that auto-analysis produced. The target lands mid-instruction, so the listing shows red or undefined bytes, and the branch operand is typically rendered as a label plus offset (for example loc_401000+1 in IDA).
  • Force a re-decode at the branch target (IDA: u to undefine, then c at the real target offset) to reveal the hidden stream; compare the two decodings.
  • Recursive-traversal engines (IDA's default) usually recover the real path; pure linear sweeps (objdump -d) will show the desynchronised garbage — diffing the two outputs pinpoints the overlap.

Dynamic analysis:

  • Single-step through the branch in a debugger; the EIP/RIP after the jump lands on the true boundary and the CPU decodes the hidden instruction correctly, regardless of what the static listing claimed.
  • A code-coverage trace (Intel PT, DynamoRIO) records the actually-executed instruction starts, which can be re-imported to fix the disassembly.

Detection rule hint:

Flag any intra-function jmp/jcc whose computed target falls strictly between the start and end of an already-decoded instruction (a mid-instruction landing). The signal is stronger still when the target lies inside the branch's own 1-4 byte displacement field, as in the EB FF case. Compilers almost never emit this pattern (hand-written assembly occasionally does, for example to skip a prefix byte), so it is a high-confidence indicator of overlapping-instruction obfuscation.
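The rule above can be sketched as a check over a flat listing. The input shapes are assumptions: `instructions` is a list of non-overlapping `(start, length)` pairs from whatever disassembler produced the listing, and `branches` is a list of `(source, target)` pairs. The function name is illustrative.

```python
from bisect import bisect_right


def mid_instruction_landings(instructions, branches):
    """Return (source, target, covering_start) for branches that land mid-instruction.

    instructions: iterable of (start, length) from a flat, non-overlapping listing.
    branches:     iterable of (source_addr, target_addr).
    """
    instructions = sorted(instructions)
    starts = [start for start, _ in instructions]
    hits = []
    for source, target in branches:
        # Find the last instruction that starts at or before the target.
        i = bisect_right(starts, target) - 1
        if i < 0:
            continue  # target precedes the listing
        start, length = instructions[i]
        # Strictly inside: not the first byte, and before the instruction's end.
        if start < target < start + length:
            hits.append((source, target, start))
    return hits


# Linear-sweep listing of a jump over a filler byte into a hidden call.
listing = [(0x00, 2), (0x02, 5), (0x07, 1), (0x08, 1)]
print(mid_instruction_landings(listing, [(0x00, 0x03)]))  # [(0, 3, 2)]
```

A target equal to an instruction start is never reported, so ordinary branches in a correctly decoded function pass cleanly.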
