Anti-Disassembly · intermediate

Inline Data in Code Stream

Data bytes embedded mid-function are wrongly decoded as instructions by a linear sweep, desynchronising the listing and hiding the real code.

A linear-sweep disassembler assumes every byte inside an executable section is an instruction and decodes forward without pause. Malware breaks that assumption by dropping data — a string, a key, a lookup constant, or a deliberate junk byte — directly into the code stream, then branching over it at runtime. The bytes are never executed, but the sweep does not know that: it reaches the data, decodes it as an opcode, and consumes the bytes that follow as operands.

Because the data byte usually decodes to an instruction of the wrong length, the disassembler is pushed off the real instruction boundary and stays desynchronised — swallowing the genuine instruction that comes after the data into a phantom operand — until it happens to realign. The author needs only a single well-chosen byte: an opcode like E8 (call rel32) or B8 (mov eax, imm32) that declares a multi-byte operand will eat four following bytes, including the real opcode the malware wanted to hide.

How it works

A short jmp skips a planted data byte at runtime, but the linear sweep decodes the data as an opcode and is dragged past the next real instruction:

asm
; Bytes:  EB 01  E8  90 90 FF E0 ...
; ---- Linear-sweep view (decodes the E8 as code) ----
00: EB 01            jmp  short 0x03       ; jump over the data byte at 0x02
02: E8 90 90 FF E0   call 0xE0FF9097       ; <- E8 is DATA, but read as call rel32
                                            ;    swallows 90 90 FF E0 (the real code)
07: ...               ; sweep is back on a real boundary only here, three instructions too late

; ---- Real execution / recursive view (target = 0x03) ----
00: EB 01            jmp  short 0x03
03: 90               nop                   ; the 0x02 byte (E8) is never executed
04: 90               nop
05: FF E0            jmp  eax              ; real branch the sweep hid inside "call"

The jmp short 0x03 steps over the E8 at offset 0x02, so the CPU never runs it. The sweep, blind to the jump target, decodes E8 as call rel32 and consumes 90 90 FF E0 as its 32-bit displacement — devouring the real FF E0 (jmp eax) and the two nops into a single phantom call. The listing now shows a bogus call to 0xE0FF9097 and the genuine indirect jump vanishes.
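The two views can be reproduced with a small Python sketch. It assumes a toy decoder that knows only the four opcodes in this example and treats anything else as a one-byte unknown; the function names are hypothetical and a real tool would use a full x86 decoder.

```python
# Toy 32-bit x86 decoder: handles only EB, E8, 90 and FF E0 (an assumption
# made for this example; every other byte is reported as a 1-byte "db").
CODE = bytes.fromhex("EB01E89090FFE0")

def decode(code, off):
    """Return (length, mnemonic, branch_target_or_None, falls_through)."""
    op = code[off]
    if op == 0xEB:  # jmp rel8
        rel = int.from_bytes(code[off + 1:off + 2], "little", signed=True)
        return 2, "jmp short", off + 2 + rel, False
    if op == 0xE8:  # call rel32
        rel = int.from_bytes(code[off + 1:off + 5], "little", signed=True)
        return 5, "call", (off + 5 + rel) & 0xFFFFFFFF, True
    if op == 0x90:
        return 1, "nop", None, True
    if op == 0xFF and code[off + 1:off + 2] == b"\xE0":
        return 2, "jmp eax", None, False
    return 1, "db", None, True

def linear_sweep(code):
    """Decode every byte in order, never following branches."""
    off, listing = 0, []
    while off < len(code):
        length, text, _, _ = decode(code, off)
        listing.append((off, text))
        off += length
    return listing

def recursive_traversal(code, entry=0):
    """Decode only offsets reachable from the entry point."""
    work, seen = [entry], {}
    while work:
        off = work.pop()
        while 0 <= off < len(code) and off not in seen:
            length, text, target, falls_through = decode(code, off)
            seen[off] = text
            if target is not None:
                work.append(target)  # out-of-range targets are skipped above
            if not falls_through:
                break
            off += length
    return sorted(seen.items())

for name, listing in (("linear", linear_sweep(CODE)),
                      ("recursive", recursive_traversal(CODE))):
    print(name, [(hex(off), text) for off, text in listing])
```

The linear listing contains only the jump and the phantom call, while the recursive listing never visits offset 0x02 and recovers both nops and the jmp eax.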

Embedded strings produce the same desync without any obvious junk opcode:

asm
; Bytes:  EB 06  68 74 74 70 3A 00  58 ...
; ---- Linear-sweep view (decodes the string as code) ----
00: EB 06            jmp  short 0x08       ; skip the inline ASCII "http:" + NUL
02: 68 74 74 70 3A   push 0x3A707474       ; <- "http:" decoded as push imm32
07: 00 58 ..         add  [eax+disp8], bl  ; trailing NUL takes the real 58 as its ModRM byte

; ---- Real execution / recursive view (target = 0x08) ----
00: EB 06            jmp  short 0x08
08: 58               pop  eax              ; real code resumes here

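Here the literal "http:" (68 74 74 70 3A) is data that the surrounding code references, but the sweep reads the leading 68 as push imm32 and consumes the next four bytes of the string as its immediate. The terminating 00 at 0x07 then decodes as add r/m8, r8 and takes the real 58 (pop eax) at the jump target as its ModRM byte, so the first genuine instruction after the string disappears from the listing.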
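The operand arithmetic can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the printable-ASCII test is an assumed heuristic for recognising a C string, not part of any tool.

```python
import struct

# The six bytes skipped by "EB 06": the ASCII string "http:" plus its NUL.
SPAN = bytes.fromhex("687474703A00")

# What the sweep sees: opcode 68 followed by a little-endian 32-bit immediate.
imm32 = struct.unpack_from("<I", SPAN, 1)[0]

# What the bytes really are: a NUL-terminated ASCII string.
text = SPAN.split(b"\x00", 1)[0].decode("ascii")

def looks_like_c_string(span):
    """Heuristic (assumed): printable ASCII up to a terminating NUL."""
    body, sep, _ = span.partition(b"\x00")
    return bool(body) and sep == b"\x00" and all(0x20 <= b < 0x7F for b in body)

print(hex(imm32), repr(text), looks_like_c_string(SPAN))
```

The immediate is just the second to fifth characters of the string read as a little-endian integer, which is why the bogus push carries an operand made entirely of printable bytes.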

Detection & analysis

Static analysis:

  • A short jmp/jcc over a small forward span, where the skipped bytes decode into an instruction that runs past the branch target, is the core tell. In a linear listing the target lands mid-instruction; in a recursive listing the skipped bytes are left undefined or flagged as conflicting code.
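  • In IDA, undefine the skipped region (U), mark it as data (D), then force code (C) at the real branch target to resynchronise the listing. Ghidra binds the keys differently by default: Clear Code Bytes (C) on the skipped region, define it as data, then Disassemble (D) at the branch target.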
  • Diff a recursive-traversal decode (IDA default) against a pure linear sweep (objdump -d): the bytes they disagree on bracket the inline data span.
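The diff in the last bullet can be sketched in Python as follows. It assumes both listings have already been reduced to a mapping of instruction offset to length (how they are exported from each tool is left out), and it uses the offsets from the first example above.

```python
# Assumed input shape: {instruction_offset: instruction_length}.
linear = {0x00: 2, 0x02: 5}                        # pure linear sweep
recursive = {0x00: 2, 0x03: 1, 0x04: 1, 0x05: 2}   # recursive traversal

def covered(listing):
    """Every byte offset that falls inside some decoded instruction."""
    return {off + i for off, length in listing.items() for i in range(length)}

# Instruction starts that only one of the two decoders reports.
disputed_starts = sorted(set(linear) ^ set(recursive))

# Bytes the sweep called code but the traversal never reached.
data_candidates = sorted(covered(linear) - covered(recursive))

print("disputed starts:", [hex(o) for o in disputed_starts])
print("data candidates:", [hex(o) for o in data_candidates])
```

The disputed starts bracket the damaged region (0x02 to 0x05), while the coverage difference narrows it to the single planted byte at 0x02.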

Dynamic analysis:

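  • Single-step through the guarding branch; the CPU jumps straight past the data, so EIP/RIP lands on the true boundary and the real instruction decodes correctly.
  • A coverage trace (Intel PT, DynamoRIO) covers only code that actually executed; importing the executed addresses into the disassembler leaves the skipped span unreached and marks it as a data candidate. Unexecuted bytes can also be code on a path the run never took, so confirm each gap against the guarding branch.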
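A minimal sketch of the coverage step, assuming the trace has already been reduced to (offset, size) pairs for each executed block within one function. The function name and the input shape are hypothetical and do not reflect any tool's native log format.

```python
def uncovered_ranges(blocks, start, end):
    """Return half-open [lo, hi) gaps in [start, end) that no executed block covers."""
    gaps, cursor = [], start
    for off, size in sorted(blocks):
        if off > cursor:
            gaps.append((cursor, off))      # bytes nothing executed
        cursor = max(cursor, off + size)    # tolerate overlapping blocks
    if cursor < end:
        gaps.append((cursor, end))
    return gaps

# First example: the jmp at 0x00 (2 bytes), then nop/nop/jmp eax from 0x03 (4 bytes).
executed = [(0x03, 4), (0x00, 2)]
print(uncovered_ranges(executed, 0x00, 0x07))
```

Each gap is only a candidate: a byte range that never ran may still be code on an untaken path, so the gap is worth checking against the branch that skips it.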

Detection rule hint:

Flag any jmp/jcc whose target skips a short run of bytes that, when decoded forward, produces an instruction extending past the branch target — i.e. the "covered" region desyncs the stream. Combined with high-entropy or ASCII bytes in the skipped span, the jump-over-data pattern is essentially never compiler-emitted and is a high-confidence inline-data anti-disassembly indicator.
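The rule can be prototyped in a few lines of Python. This is a sketch under loud assumptions: the opcode-length table is deliberately incomplete (32-bit mode, no prefixes, a handful of opcodes), unknown bytes are treated as one-byte instructions, and all names are hypothetical. A production rule would sit on top of a full decoder.

```python
# Assumed, incomplete 32-bit opcode-length table (illustration only).
FIXED = {0x90: 1, 0x58: 1, 0x6A: 2, 0xEB: 2, 0x68: 5, 0xE8: 5, 0xE9: 5}
FIXED.update({op: 5 for op in range(0xB8, 0xC0)})  # mov r32, imm32
FIXED.update({op: 2 for op in range(0x70, 0x80)})  # jcc rel8
MODRM = {0x00, 0x01, 0x02, 0x03, 0x88, 0x89, 0x8A, 0x8B, 0xFF}  # opcode + ModRM

def insn_len(code, off):
    """Length of the instruction at off, per the toy table above."""
    op = code[off]
    if op in FIXED:
        return FIXED[op]
    if op in MODRM and off + 1 < len(code):
        mod, rm = code[off + 1] >> 6, code[off + 1] & 7
        n = 2
        if mod != 3 and rm == 4:
            n += 1  # SIB byte
            if mod == 0 and off + 2 < len(code) and (code[off + 2] & 7) == 5:
                n += 4  # SIB with no base register carries a disp32
        if mod == 1:
            n += 1  # disp8
        elif mod == 2 or (mod == 0 and rm == 5):
            n += 4  # disp32
        return n
    return 1  # unknown opcode: assume a single byte

def find_jump_over_data(code, max_skip=16):
    """Return (branch_offset, target) pairs whose skipped bytes overrun the target."""
    hits = []
    for off in range(len(code) - 1):
        op = code[off]
        if op != 0xEB and not 0x70 <= op <= 0x7F:
            continue
        rel = code[off + 1]
        target = off + 2 + rel
        if not 0 < rel <= max_skip or target > len(code):
            continue
        cursor = off + 2
        while cursor < target:
            cursor += insn_len(code, cursor)
        if cursor > target:  # an instruction straddles the branch target
            hits.append((off, target))
    return hits

print(find_jump_over_data(bytes.fromhex("EB01E89090FFE0")))
print(find_jump_over_data(bytes.fromhex("EB06687474703A005890")))
```

Scanning every byte offset rather than known instruction starts keeps the sketch short, but it can also match EB or 7x bytes that are really operands, so hits should be treated as leads to confirm by hand.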
