Skip to content
Obfuscationadvanced

Mixed Boolean-Arithmetic (MBA)

Malware rewrites simple operations and constants as tangled mixes of arithmetic and bitwise operators that compute the same value, defeating constant propagation and symbolic expression analysis.

Mixed Boolean-Arithmetic (MBA) obfuscation replaces a simple expression such as x + y with a far longer expression that mixes arithmetic operators (+, -, *) with bitwise/Boolean operators (&, |, ^, ~) yet is provably equivalent over machine integers. The classic seed identity is x + y == (x ^ y) + 2 * (x & y), which can be recursively expanded into arbitrarily complex forms.

Because constant folding, range analysis, and naive symbolic execution treat arithmetic and bitwise domains separately, they cannot easily collapse an MBA expression back to its original form. This makes MBA a favourite for protecting magic constants, license checks, and decryption keys. The same technique is built into commercial protectors and into open frameworks like Tigress and several LLVM-based obfuscators.

How it works

A linear MBA expression keeps the result identical while exploding the instruction count:

c
// Original:
uint32_t add(uint32_t x, uint32_t y) {
    return x + y;
}

// MBA-obfuscated (all equivalent over uint32_t):
uint32_t add_mba(uint32_t x, uint32_t y) {
    // x + y  ==  (x ^ y) + 2 * (x & y)
    // further expanded with -1 == ~0 etc.
    return (x | y) + (x & y) - (~x & y) + (~(x | y));
    // collapses to x + y for all 32-bit inputs
}

A constant can be hidden the same way. The integer 42 becomes a data-dependent expression that only evaluates to 42:

c
// 42 == (a & b) + (a | b) - (a ^ b)  reduced, but obfuscators emit:
uint32_t magic(uint32_t a) {
    return ((a ^ 0x2A) ^ a) + ((a & 0) | (0x2A & ~0));  // == 0x2A
}

In disassembly the pattern is a dense run of and, or, xor, not, add, and shl with no obvious structure:

asm
; (x ^ y) + 2*(x & y)  computing x + y
mov   eax, edi        ; x
xor   eax, esi        ; x ^ y
mov   ecx, edi
and   ecx, esi        ; x & y
lea   eax, [eax+ecx*2] ; (x^y) + 2*(x&y)  == x + y

Detection & analysis

Static analysis:

  • Look for basic blocks dominated by mixed and/or/xor/not interleaved with add/sub/lea and no memory or call activity — a signature of arithmetic encoding.
  • Use an SMT solver (Z3) to prove equivalence: extract the expression as a bit-vector formula and ask the solver whether it is equal to a candidate simple form for all inputs.
  • Dedicated MBA simplifiers — msynth, SSPAM, GAMBA, and Arybo — apply algebraic rewrite rules and program synthesis to recover the original linear form.

Dynamic analysis:

  • Treat the obfuscated routine as a black box: sample many input/output pairs and fit a linear model, or feed pairs to an oracle-guided synthesizer to recover the function.
  • A symbolic-execution engine (angr, Triton) can lift the block to an expression tree; combine with Triton's built-in synthesize pass to fold the MBA.

Detection rule hint:

Flag functions where a single basic block contains a high ratio of bitwise-to-total integer instructions (e.g., more than 60% and/or/xor/not) with no loads, stores, or calls between them — legitimate code rarely chains that many Boolean ops on the same temporaries.

Votes

Comments(0)