Instruction Substitution

Instruction substitution replaces a single operation with a longer sequence of instructions that compute the same result. A plain a + b might become a - (-b), or an integer constant load might be split into several arithmetic steps. The semantics are preserved, but the binary no longer contains the canonical instruction a signature or analyst expects.

This is one of the headline passes in Obfuscator-LLVM (O-LLVM), enabled with -mllvm -sub, and in its many forks. Because it operates on LLVM IR before code generation, it works for every target architecture. Because it can choose a different equivalent form for each occurrence, its output is polymorphic. It is frequently combined with control-flow flattening and bogus control flow.

How it works

The pass keeps a table of equivalence rules per IR operator and picks one (often randomly) for each matching instruction:

```c
// Original arithmetic
int r = a + b;

// Equivalent forms the pass may emit instead (one per occurrence):
int r1 = a - (-b);              // add -> negate + subtract
int r2 = -((-a) + (-b));        // add -> three negations
// for a - b:
int r3 = a + (~b) + 1;          // sub -> NOT + add + 1 (two's complement)
// for a ^ b:
int r4 = (a | b) & ~(a & b);    // xor -> OR/AND/NOT identity
// for a & b:
int r5 = ~(~a | ~b);            // and -> De Morgan
```
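The int forms above are only illustrative in C, where signed overflow (for example -b with b == INT_MIN) is undefined behaviour. LLVM IR add and sub instead wrap modulo 2^n unless marked nsw/nuw. Under wrap-around arithmetic each identity holds for every input, which a brute-force check over all 8-bit operand pairs can confirm. This is a minimal sketch that assumes unsigned 8-bit values as a stand-in for an LLVM i8; the function names are hypothetical:

```c
#include <stdint.h>

/* Each rewrite evaluated with 8-bit wrap-around semantics (like an LLVM i8).
   The cast back to uint8_t discards the bits added by C's integer promotion. */
static uint8_t add_neg_sub(uint8_t a, uint8_t b)    { return (uint8_t)(a - (-b)); }
static uint8_t add_triple_neg(uint8_t a, uint8_t b) { return (uint8_t)(-((-a) + (-b))); }
static uint8_t sub_not_add1(uint8_t a, uint8_t b)   { return (uint8_t)(a + (~b) + 1); }
static uint8_t xor_or_and(uint8_t a, uint8_t b)     { return (uint8_t)((a | b) & ~(a & b)); }
static uint8_t and_de_morgan(uint8_t a, uint8_t b)  { return (uint8_t)(~(~a | ~b)); }

/* Count input pairs (out of 65536) where a rewrite disagrees with its original operator. */
unsigned long count_mismatches(void) {
    unsigned long bad = 0;
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t x = (uint8_t)a, y = (uint8_t)b;
            bad += add_neg_sub(x, y)    != (uint8_t)(x + y);
            bad += add_triple_neg(x, y) != (uint8_t)(x + y);
            bad += sub_not_add1(x, y)   != (uint8_t)(x - y);
            bad += xor_or_and(x, y)     != (uint8_t)(x ^ y);
            bad += and_de_morgan(x, y)  != (uint8_t)(x & y);
        }
    }
    return bad;
}
```

count_mismatches() returns 0 when every rewrite agrees with its operator on all pairs. An exhaustive search is only practical at small widths; for 32-bit operands, random sampling or an SMT solver is the usual substitute.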

At the assembly level, a single add or sub expands into several instructions that a signature keyed on that one instruction will not match:

```asm
; original:  add eax, ebx        ; eax = a + b

; substituted (add  ->  a - (-b)):
neg   ebx                ; ebx = -b
sub   eax, ebx           ; eax = a - (-b) = a + b
neg   ebx                ; restore ebx (if reused)

; original:  sub eax, ebx        ; eax = a - b
; substituted (sub  ->  a + (~b) + 1), ebx saved on the stack as scratch:
push  ebx
not   ebx                ; ebx = ~b
lea   eax, [eax+ebx+1]   ; eax = a + (~b) + 1 = a - b
pop   ebx                ; restore ebx
```
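A C model of the two sequences makes the register arithmetic explicit. This is a sketch: the regs_t struct and the function names are hypothetical, the push/pop pair is modelled as a saved local, and EFLAGS are ignored. Note that the flags left by neg, sub and lea generally differ from those the original add or sub would leave.

```c
#include <stdint.h>

/* Hypothetical model of only the two 32-bit registers the snippets touch. */
typedef struct { uint32_t eax, ebx; } regs_t;

/* neg ebx; sub eax, ebx; neg ebx   ->  eax = a + b, ebx restored */
void run_neg_sub_neg(regs_t *r) {
    r->ebx = 0u - r->ebx;           /* neg ebx      */
    r->eax = r->eax - r->ebx;       /* sub eax, ebx */
    r->ebx = 0u - r->ebx;           /* neg ebx      */
}

/* push ebx; not ebx; lea eax, [eax+ebx+1]; pop ebx   ->  eax = a - b */
void run_not_lea(regs_t *r) {
    uint32_t saved = r->ebx;        /* push ebx               */
    r->ebx = ~r->ebx;               /* not ebx                */
    r->eax = r->eax + r->ebx + 1u;  /* lea eax, [eax+ebx+1]   */
    r->ebx = saved;                 /* pop ebx                */
}
```

Starting from eax = 7 and ebx = 5, the first sequence leaves eax = 12 and the second leaves eax = 2. Both sequences restore ebx to 5.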

Each occurrence can use a different rule, so the same source-level + produces varied machine code throughout the binary.
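The per-occurrence choice can be sketched as a table of textual templates for one operator, with a random pick for each rewrite. This is a toy source-to-source illustration, not O-LLVM's implementation, which rewrites LLVM IR instructions. ADD_RULES, substitute_add, and the xorshift32 generator standing in for the pass's random source are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical rule table for "add": each template is an equivalent
   way to write a + b, with the operands substituted in order. */
static const char *const ADD_RULES[] = {
    "%s - (-%s)",          /* negate + subtract                        */
    "-((-%s) + (-%s))",    /* three negations                          */
    "%s - (~%s + 1)",      /* subtract the two's-complement negation   */
};
#define N_ADD_RULES (sizeof ADD_RULES / sizeof ADD_RULES[0])

/* xorshift32 PRNG standing in for the pass's random source (an assumption).
   The state must be non-zero, or it stays zero forever. */
static uint32_t next_rand(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Rewrite one occurrence of `a + b` using a randomly chosen rule.
   Writes the text into buf and returns the index of the rule used. */
size_t substitute_add(const char *a, const char *b,
                      char *buf, size_t n, uint32_t *state) {
    size_t k = next_rand(state) % N_ADD_RULES;
    snprintf(buf, n, ADD_RULES[k], a, b);
    return k;
}
```

If substitute_add is called repeatedly on the same x + y, with one evolving state, it produces a mix of the three forms. That mix is the polymorphism described above.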

Detection & analysis

Static analysis:

A decompiler's optimiser (Ghidra's P-Code simplification, Hex-Rays) often folds the simpler identities, such as double negation or a - (-b), back into the original operator. Reading the decompiled output therefore frequently neutralises much of the pass.
Recognise O-LLVM fingerprints: De Morgan rewrites, double or triple negations, and not-then-add-1 two's-complement chains clustered around simple arithmetic.
Pattern libraries and capa-style rules can match the canonical substituted sequences for common operators.
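A decompiler's simplifier and a pattern rule both have to recognise these shapes and map them back to a single operator. The following minimal sketch works over a hypothetical toy expression tree (not Ghidra P-Code, Hex-Rays microcode, or capa's rule format). It matches the five rewrites from the rule table earlier in this section.

```c
#include <stddef.h>

/* Hypothetical toy expression tree. Leaves are variables or constants;
   unary nodes use only `l`. */
typedef enum { E_VAR, E_CONST, E_NEG, E_NOT, E_ADD, E_SUB, E_AND, E_OR, E_XOR } op_t;
typedef struct expr {
    op_t op;
    const struct expr *l, *r;
    char name;   /* used by E_VAR   */
    long val;    /* used by E_CONST */
} expr_t;

/* The operator a matched pattern encodes, with its two operands. */
typedef struct { op_t op; const expr_t *a, *b; } match_t;

/* Structural equality, so (a | b) & ~(a & b) can check both sides use a and b. */
static int same(const expr_t *x, const expr_t *y) {
    if (x == y) return 1;
    if (!x || !y || x->op != y->op) return 0;
    if (x->op == E_VAR) return x->name == y->name;
    if (x->op == E_CONST) return x->val == y->val;
    return same(x->l, y->l) && same(x->r, y->r);
}

/* Recognise a substituted shape and report the original operator.
   Returns 1 and fills *m on a match, 0 otherwise. */
int match_substitution(const expr_t *e, match_t *m) {
    if (e->op == E_SUB && e->r->op == E_NEG) {                  /* a - (-b)        */
        *m = (match_t){E_ADD, e->l, e->r->l};
        return 1;
    }
    if (e->op == E_NEG && e->l->op == E_ADD &&                  /* -((-a) + (-b))  */
        e->l->l->op == E_NEG && e->l->r->op == E_NEG) {
        *m = (match_t){E_ADD, e->l->l->l, e->l->r->l};
        return 1;
    }
    if (e->op == E_ADD && e->r->op == E_CONST && e->r->val == 1 &&
        e->l->op == E_ADD && e->l->r->op == E_NOT) {            /* (a + ~b) + 1    */
        *m = (match_t){E_SUB, e->l->l, e->l->r->l};
        return 1;
    }
    if (e->op == E_AND && e->l->op == E_OR && e->r->op == E_NOT &&
        e->r->l->op == E_AND && same(e->l->l, e->r->l->l) &&
        same(e->l->r, e->r->l->r)) {                            /* (a|b) & ~(a&b)  */
        *m = (match_t){E_XOR, e->l->l, e->l->r};
        return 1;
    }
    if (e->op == E_NOT && e->l->op == E_OR &&                   /* ~(~a | ~b)      */
        e->l->l->op == E_NOT && e->l->r->op == E_NOT) {
        *m = (match_t){E_AND, e->l->l->l, e->l->r->l};
        return 1;
    }
    return 0;
}
```

A practical simplifier would probably need to apply such matches bottom-up and repeatedly, because one substitution is often nested inside another.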

Dynamic analysis:

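Substitution does not change behaviour: execution produces identical results, so tracing reveals little that the original code would not. Effort is better spent on simplification.
A symbolic-execution engine (Triton, angr) can lift each block to a symbolic expression. An expression simplifier or an SMT equivalence check can then often reduce it back to the source operator, which is useful for batch de-obfuscation.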
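Black-box sampling is a cheaper alternative to a full symbolic engine, and a different technique from it. The idea is to run the block on random inputs and check which simple operator reproduces every output. This minimal sketch assumes the block can be called (or emulated) as a pure function of two 32-bit inputs. binop_fn, guess_operator, and the candidate list are hypothetical. Unlike an SMT equivalence proof, agreement on samples is only strong evidence.

```c
#include <stddef.h>
#include <stdint.h>

/* A lifted or emulated block, modelled as a pure function of two 32-bit inputs. */
typedef uint32_t (*binop_fn)(uint32_t, uint32_t);

static uint32_t op_add(uint32_t a, uint32_t b) { return a + b; }
static uint32_t op_sub(uint32_t a, uint32_t b) { return a - b; }
static uint32_t op_xor(uint32_t a, uint32_t b) { return a ^ b; }
static uint32_t op_and(uint32_t a, uint32_t b) { return a & b; }
static uint32_t op_or(uint32_t a, uint32_t b)  { return a | b; }

/* Candidate source operators, indexed in this order: add, sub, xor, and, or. */
static const binop_fn CANDIDATES[] = { op_add, op_sub, op_xor, op_and, op_or };

/* xorshift32 input generator; the state must be non-zero. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Return the index of the first candidate that agrees with `block` on every
   sampled input pair, or -1 if none does. */
int guess_operator(binop_fn block, int samples, uint32_t seed) {
    size_t n = sizeof CANDIDATES / sizeof CANDIDATES[0];
    for (size_t k = 0; k < n; k++) {
        uint32_t s = seed;              /* replay the same inputs for each candidate */
        int ok = 1;
        for (int i = 0; i < samples && ok; i++) {
            uint32_t a = xorshift32(&s), b = xorshift32(&s);
            ok = block(a, b) == CANDIDATES[k](a, b);
        }
        if (ok) return (int)k;
    }
    return -1;
}

/* Two substituted blocks from the rules above, used as test subjects. */
uint32_t obf_add(uint32_t a, uint32_t b) { return a - (0u - b); }  /* a - (-b)  */
uint32_t obf_and(uint32_t a, uint32_t b) { return ~(~a | ~b); }    /* De Morgan */
```

With a non-zero seed and a few dozen samples, obf_add should come back as add and obf_and as and. A hit found this way is still worth confirming symbolically before trusting it.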

Detection rule hint:

Flag short basic blocks that compute a single live value through a disproportionate number of neg/not instructions alongside an add/sub/lea. The signal is stronger when paired neg/neg or not...+1 patterns recur across many functions, because that recurrence fingerprints an automated substitution pass rather than hand-written code.
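The block-level part of this hint could be sketched as follows. The mnemonic-list input, looks_substituted, the 16-instruction cap, and the one-third ratio are all hypothetical, illustrative choices rather than tuned values. Counting how often flagged blocks recur across functions is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical heuristic over one basic block, given as its mnemonics.
   Flags short blocks that contain an add/sub/lea and either show a
   neg...neg pair, a not followed by add/sub/lea, or a high neg/not share. */
int looks_substituted(const char *const *mn, size_t n) {
    size_t negnot = 0, neg = 0, arith = 0;
    int not_seen = 0, not_then_arith = 0;
    if (n == 0 || n > 16) return 0;                  /* short blocks only */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(mn[i], "neg") == 0) {
            negnot++;
            neg++;
        } else if (strcmp(mn[i], "not") == 0) {
            negnot++;
            not_seen = 1;
        } else if (strcmp(mn[i], "add") == 0 || strcmp(mn[i], "sub") == 0 ||
                   strcmp(mn[i], "lea") == 0) {
            arith++;
            if (not_seen) not_then_arith = 1;        /* not ... +1 style chain */
        }
    }
    if (arith == 0) return 0;                        /* needs the arithmetic core */
    if (neg >= 2 || not_then_arith) return 1;        /* recurring paired shapes */
    return negnot * 3 >= n;                          /* neg/not >= 1/3 of block */
}
```

The neg/sub/neg and push/not/lea/pop sequences shown earlier are both flagged by this sketch, while a plain mov/add/ret block is not.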
