Instruction Substitution
An obfuscation pass replaces simple instructions with longer functionally-equivalent sequences — e.g. add becomes a push/pop/lea or sub/neg chain — to break signatures and obscure intent.
Instruction substitution replaces a single operation with a longer sequence of instructions that compute the same result. A plain a + b might become a - (-b), or an integer constant load might be split into several arithmetic steps. The semantics are preserved, but the binary no longer contains the canonical instruction a signature or analyst expects.
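One way to check that these rewrites preserve semantics is to test them exhaustively on a narrow type. The C sketch below tests every pair of 8-bit values against the add, sub, xor and and identities used in this section. It also shows one hypothetical way a 32-bit constant load could be split into arithmetic steps. The function names are illustrative and do not come from O-LLVM. Unsigned types are used so that wraparound matches two's-complement hardware and avoids signed-overflow undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every substituted form matches its original operator for
   all 8-bit inputs, 0 otherwise. Casting back to uint8_t models 2^8 wraparound. */
static int identities_hold_u8(void) {
    for (unsigned ia = 0; ia < 256; ia++) {
        for (unsigned ib = 0; ib < 256; ib++) {
            uint8_t a = (uint8_t)ia, b = (uint8_t)ib;
            uint8_t sum = (uint8_t)(a + b), diff = (uint8_t)(a - b);
            if ((uint8_t)(a - (uint8_t)-b) != sum) return 0;                 /* add -> a - (-b) */
            if ((uint8_t)-((uint8_t)-a + (uint8_t)-b) != sum) return 0;      /* add -> -((-a) + (-b)) */
            if ((uint8_t)(a + (uint8_t)~b + 1) != diff) return 0;            /* sub -> a + ~b + 1 */
            if ((uint8_t)((a | b) & ~(a & b)) != (uint8_t)(a ^ b)) return 0; /* xor identity */
            if ((uint8_t)~(~a | ~b) != (uint8_t)(a & b)) return 0;           /* and -> De Morgan */
        }
    }
    return 1;
}

/* Hypothetical constant split: 0xDEADBEEF is assembled from two 16-bit
   halves and passed through an XOR mask that is applied twice (cancels). */
static uint32_t split_constant(void) {
    uint32_t hi = 0xDEADu, lo = 0xBEEFu, mask = 0x5A5A5A5Au;
    uint32_t k = ((hi << 16) | lo) ^ mask;  /* obscured intermediate value */
    return k ^ mask;                         /* equals 0xDEADBEEF again */
}
```

An exhaustive 8-bit check covers every carry and borrow case for that width. These particular identities are bitwise or modular, so they hold at every width. A wider type would only need sampling.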
This is one of the headline passes in Obfuscator-LLVM (O-LLVM), enabled with -mllvm -sub, and in its many forks. Because the substitution operates on LLVM IR before code generation, it is portable across architectures and can apply a different equivalent form to each occurrence, yielding polymorphic output. It is frequently combined with control-flow flattening and bogus control flow.
How it works
The pass keeps a table of equivalence rules per IR operator and picks one (often randomly) for each instruction it rewrites:
// Original arithmetic
int r = a + b;
// Equivalent substitutions the pass may emit (one rule per occurrence):
r = a - (-b);            // add -> negate + subtract
r = -((-a) + (-b));      // add -> triple negate
// for a - b:
r = a + (~b) + 1;        // sub -> NOT + add + 1 (two's complement)
// for a ^ b:
r = (a | b) & ~(a & b);  // xor -> OR/AND/NOT identity
// for a & b:
r = ~(~a | ~b);          // and -> De Morgan
At the assembly level, a trivial add is expanded into several instructions that no add-based pattern will match:
; original: add eax, ebx ; eax = eax + ebx
; substituted (add -> a - (-b)):
neg ebx ; ebx = -b
sub eax, ebx ; eax = a - (-b) = a + b
neg ebx ; restore ebx (if reused)
; another form, for sub eax, ebx (a - b), using push/pop to preserve ebx:
push ebx
not ebx ; ~b
lea eax, [eax+ebx+1] ; a + (~b) + 1 == a - b (for subtraction)
pop ebx
Each occurrence can use a different rule, so the same source-level + produces varied machine code throughout the binary.
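The per-occurrence choice of rule can be mimicked in plain C. In this sketch the "rule table" is an array of function pointers that all compute a + b on uint32_t. rand() stands in for whatever random source a real pass uses. All names are hypothetical. An actual pass rewrites LLVM IR instructions at compile time rather than selecting functions at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three hypothetical equivalence rules for a + b (all wrap modulo 2^32). */
static uint32_t add_neg_sub(uint32_t a, uint32_t b)    { return a - (0u - b); }               /* a - (-b) */
static uint32_t add_triple_neg(uint32_t a, uint32_t b) { return 0u - ((0u - a) + (0u - b)); } /* -((-a) + (-b)) */
static uint32_t add_not_sub(uint32_t a, uint32_t b)    { return a - ~b - 1u; }                /* a - (~b) - 1 */

typedef uint32_t (*add_rule)(uint32_t, uint32_t);
static const add_rule ADD_RULES[] = { add_neg_sub, add_triple_neg, add_not_sub };
#define N_ADD_RULES (sizeof ADD_RULES / sizeof ADD_RULES[0])

/* Choose a rule for one occurrence; rand() stands in for the pass's PRNG. */
static add_rule pick_add_rule(void) {
    return ADD_RULES[(size_t)rand() % N_ADD_RULES];
}

/* Model of one substituted "a + b" site: each call may draw a different rule. */
static uint32_t substituted_add(uint32_t a, uint32_t b) {
    return pick_add_rule()(a, b);
}
```

Every entry is semantically identical, so the result does not depend on which rule is drawn. Only the emitted instruction sequence differs, which is what defeats a single fixed signature.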
Detection & analysis
Static analysis:
- A decompiler's optimiser (Ghidra's P-Code simplification, Hex-Rays) often folds the simpler identities, such as a - (-b) or double negation, back to the original operator, so reading the decompiled output usually neutralises much of the pass.
- Recognise O-LLVM fingerprints: De Morgan rewrites, double/triple neg, and not-then-add-1 two's-complement chains clustered around simple arithmetic.
- Pattern libraries and capa-style rules can match the canonical substituted sequences for common operators.
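A pattern-library rule for the two assembly forms shown earlier can be as simple as a sliding window over decoded instructions. The sketch below assumes a hypothetical pre-decoded record (mnemonic, destination, source as strings), not any particular disassembler's API. It matches only the exact register reuse shown in the listing. Real rules would need to tolerate interleaved instructions and different registers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified instruction record (not a real disassembler API). */
typedef struct { const char *mn, *dst, *src; } insn;

/* NULL-safe string equality. */
static int eq(const char *a, const char *b) { return a && b && strcmp(a, b) == 0; }

/* Count "neg R; sub X, R; neg R" windows: the a - (-b) form of add. */
static size_t count_neg_sub_neg(const insn *code, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i + 2 < n; i++) {
        const char *r = code[i].dst;
        if (eq(code[i].mn, "neg") && eq(code[i + 1].mn, "sub") && eq(code[i + 1].src, r)
            && eq(code[i + 2].mn, "neg") && eq(code[i + 2].dst, r))
            hits++;
    }
    return hits;
}

/* Count "not R; lea X, [..R..+1]" windows: the a + ~b + 1 form of sub. */
static size_t count_not_lea_plus1(const insn *code, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        const char *r = code[i].dst, *mem = code[i + 1].src;
        if (eq(code[i].mn, "not") && eq(code[i + 1].mn, "lea") && r && mem
            && strstr(mem, r) != NULL && strstr(mem, "+1]") != NULL)
            hits++;
    }
    return hits;
}
```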
Dynamic analysis:
- Tracing adds little: the substituted code produces exactly the same results as the original, so effort is better spent on simplification.
- A symbolic-execution engine (Triton, angr) can lift each block to an expression and, with suitable simplification, reduce it back to the source operator, which is useful for batch de-obfuscation.
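A lightweight stand-in for full symbolic simplification is black-box semantic matching. This is a different technique from what Triton or angr do internally. The obfuscated expression is evaluated on pseudo-random inputs and compared against a short list of candidate operators. Agreement on samples is strong evidence but not proof, so a symbolic or SMT check would be needed to confirm a match. All names, the sample count and the seed are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t (*binop)(uint32_t, uint32_t);

static uint32_t op_add(uint32_t a, uint32_t b) { return a + b; }
static uint32_t op_sub(uint32_t a, uint32_t b) { return a - b; }
static uint32_t op_xor(uint32_t a, uint32_t b) { return a ^ b; }
static uint32_t op_and(uint32_t a, uint32_t b) { return a & b; }
static uint32_t op_or(uint32_t a, uint32_t b)  { return a | b; }

static const struct { const char *name; binop f; } CANDIDATES[] = {
    {"add", op_add}, {"sub", op_sub}, {"xor", op_xor}, {"and", op_and}, {"or", op_or},
};

/* xorshift32 PRNG so every candidate is tested on the same input sequence. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    return *s = x;
}

/* Name of the first candidate agreeing with f on all samples, or NULL. */
static const char *guess_operator(binop f, int samples) {
    for (size_t c = 0; c < sizeof CANDIDATES / sizeof CANDIDATES[0]; c++) {
        uint32_t seed = 0x9E3779B9u;  /* arbitrary nonzero seed */
        int ok = 1;
        for (int i = 0; i < samples && ok; i++) {
            uint32_t a = xorshift32(&seed), b = xorshift32(&seed);
            ok = CANDIDATES[c].f(a, b) == f(a, b);
        }
        if (ok) return CANDIDATES[c].name;
    }
    return NULL;
}

static int guessed_is(binop f, const char *want) {
    const char *g = guess_operator(f, 64);
    return g != NULL && strcmp(g, want) == 0;
}

/* Obfuscated forms taken from the substitution rules above. */
static uint32_t obf_add(uint32_t a, uint32_t b) { return 0u - ((0u - a) + (0u - b)); }
static uint32_t obf_sub(uint32_t a, uint32_t b) { return a + ~b + 1u; }
static uint32_t obf_xor(uint32_t a, uint32_t b) { return (a | b) & ~(a & b); }
static uint32_t obf_and(uint32_t a, uint32_t b) { return ~(~a | ~b); }
```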
Detection rule hint:
Flag short basic blocks that compute a single live value through a disproportionate number of neg/not instructions plus an add/sub/lea, especially when paired neg/neg or not...+1 patterns recur across many functions — a strong fingerprint of an automated substitution pass rather than hand-written code.
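The per-block part of this heuristic can be sketched as a count over a block's mnemonics. The thresholds (MAX_BLOCK_LEN, MIN_NEGNOT) and the "more neg/not than arithmetic" test are assumptions chosen for illustration. The cross-function recurrence check that the hint also relies on is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical thresholds; tune against real samples. */
#define MAX_BLOCK_LEN 12
#define MIN_NEGNOT    2

static int is_mn(const char *m, const char *want) { return strcmp(m, want) == 0; }

/* Return 1 if a short block mixes several neg/not with an add/sub/lea and
   the neg/not count outweighs the arithmetic count ("disproportionate"). */
static int looks_substituted(const char *mn[], size_t n) {
    size_t negnot = 0, arith = 0;
    if (n == 0 || n > MAX_BLOCK_LEN) return 0;
    for (size_t i = 0; i < n; i++) {
        if (is_mn(mn[i], "neg") || is_mn(mn[i], "not")) negnot++;
        else if (is_mn(mn[i], "add") || is_mn(mn[i], "sub") || is_mn(mn[i], "lea")) arith++;
    }
    return arith > 0 && negnot >= MIN_NEGNOT && negnot > arith;
}
```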