x86-64 SIMD Registers (XMM / YMM / ZMM)
Overview of the x86-64 SIMD register file — XMM (128-bit, SSE), YMM (256-bit, AVX), and ZMM (512-bit, AVX-512) — and their aliasing relationship.
x86-64 includes a layered SIMD register file built up across three instruction-set extension families: SSE (Streaming SIMD Extensions), AVX (Advanced Vector Extensions), and AVX-512.
Register hierarchy
| Extension | Width | Register names | Count |
|---|---|---|---|
| SSE (baseline on x86-64) | 128 bits | xmm0–xmm15 | 16 |
| AVX | 256 bits | ymm0–ymm15 | 16 |
| AVX-512 | 512 bits | zmm0–zmm31 | 32 |
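Each wider register aliases the narrower ones: xmm3 is the lower 128 bits of both ymm3 and zmm3, and ymm3 is the lower 256 bits of zmm3. AVX-512 also makes xmm16–xmm31 and ymm16–ymm31 addressable as the low parts of zmm16–zmm31.
Writing xmm3 with a legacy SSE (non-VEX-encoded) instruction leaves the upper bits of ymm3/zmm3 unchanged. Writing it with a VEX- or EVEX-encoded instruction (for example `vaddps xmm3, xmm1, xmm2`) zeroes every bit above bit 127 of the full register. Having to preserve those stale upper bits in the legacy case is the source of the AVX–SSE transition penalties.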
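A minimal C sketch of the aliasing, assuming GCC or Clang on x86-64: the `target("avx")` attribute and `__builtin_cpu_supports` are compiler extensions, and the function name `split_ymm` is hypothetical. `_mm256_castps256_ps128` just reinterprets the low 128 bits of the YMM value as its XMM alias, while `_mm256_extractf128_ps` has to copy the upper half out. Intrinsics do not let you pick legacy versus VEX encoding, so the zeroing rule itself is not observable at this level.

```c
#include <assert.h>
#include <immintrin.h>

/* GCC/Clang extension: compile only this function for AVX, so the rest of
   the program can stay at the x86-64 SSE2 baseline. */
__attribute__((target("avx")))
static void split_ymm(const float in[8], float low[4], float high[4]) {
    __m256 y = _mm256_loadu_ps(in);                   /* 8 floats in one YMM register */
    __m128 x = _mm256_castps256_ps128(y);             /* reinterpret: the XMM alias = lanes 0-3 */
    _mm_storeu_ps(low, x);
    _mm_storeu_ps(high, _mm256_extractf128_ps(y, 1)); /* VEXTRACTF128 copies lanes 4-7 */
}
```

Callers should check `__builtin_cpu_supports("avx")` first, since nothing else in the program is assumed to be built with `-mavx`.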
Lane interpretation
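The register itself is untyped; each instruction decides how to split its bits into lanes. The C intrinsic types declared in `<immintrin.h>` name the common layouts:
| C type | Register | Lanes |
|---|---|---|
| `__m128i` | XMM | 16 × i8, 8 × i16, 4 × i32, or 2 × i64 |
| `__m128` | XMM | 4 × float (single precision) |
| `__m128d` | XMM | 2 × double (double precision) |
| `__m256` | YMM | 8 × float |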
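That lanes are an interpretation rather than a property of the register can be seen by adding the same two registers twice. This SSE2 sketch (the function name is hypothetical) applies `_mm_add_epi8` (PADDB) and `_mm_add_epi32` (PADDD) to identical bit patterns: in the i8 view the carry out of 0xFF + 1 is discarded inside the byte, while in the i32 view it propagates into bit 8.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: baseline on every x86-64 CPU */
#include <stdint.h>

/* Adds the same two registers as 16 x i8 (PADDB) and as 4 x i32 (PADDD). */
static void add_both_ways(uint32_t out_i8[4], uint32_t out_i32[4]) {
    const __m128i a = _mm_set1_epi32(0x000000FF); /* each 32-bit lane: bytes FF 00 00 00 */
    const __m128i b = _mm_set1_epi32(0x00000001); /* each 32-bit lane: bytes 01 00 00 00 */
    /* i8 lanes: 0xFF + 0x01 wraps to 0x00 within its byte; nothing carries out */
    _mm_storeu_si128((__m128i *)out_i8, _mm_add_epi8(a, b));
    /* i32 lanes: 0x000000FF + 1 carries into bit 8, giving 0x00000100 */
    _mm_storeu_si128((__m128i *)out_i32, _mm_add_epi32(a, b));
}
```

With these inputs every 32-bit word of `out_i8` ends up 0 and every word of `out_i32` ends up 0x100.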
Calling conventions
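- System V AMD64 (Linux, macOS, BSD): xmm0–xmm7 pass floating-point and vector arguments. xmm0 returns a `float` or `double`, and xmm0–xmm1 together return small SSE-class aggregates such as a two-`double` struct. All vector registers (xmm0–xmm15 and their YMM/ZMM extensions) are caller-saved.
- Microsoft x64: the first four arguments occupy positional slots, so a floating-point argument in position 1–4 is passed in xmm0–xmm3 respectively; `__m128` arguments are passed by reference unless the function uses `__vectorcall`. xmm0 returns floating-point values. xmm6–xmm15 are callee-saved (lower 128 bits only); xmm0–xmm5 and the upper halves of all YMM/ZMM registers are caller-saved.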
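To make the positional-slot difference concrete, the hypothetical function below takes one integer and two doubles; its comments record where each argument arrives under each convention, following the rules above. The C source is identical for both ABIs, and only the compiler's register assignment differs, which is why the same function disassembles differently on Windows and Linux.

```c
#include <assert.h>

/* Hypothetical function, used only to show where each argument lives. */
double scale_add(int n, double x, double y) {
    /* System V AMD64: n -> edi, x -> xmm0, y -> xmm1
       (FP arguments take the next free XMM register)
       Microsoft x64:  n -> ecx, x -> xmm1, y -> xmm2
       (each of the first four arguments owns one positional slot)
       Both ABIs return the double in xmm0. */
    return n * x + y;
}
```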
Reverse-engineering notes
- Auto-vectorised loops emit `MOVDQU`, `PCMPEQB`, `PAND`, `PADDW`, and similar instructions on XMM registers. Decompilers often fail to reconstruct the high-level loop, so reading the raw assembly is usually faster.
- `MOVAPS`/`MOVAPD` require 16-byte-aligned memory operands; `MOVUPS`/`MOVUPD` do not. An alignment fault in SSE code is almost always a `MOVAPS` to or from a stack slot when the stack is not 16-byte aligned, typically because some caller broke the ABI's stack-alignment rule.
- Both ABIs return `float` and `double` in xmm0, so a function that writes xmm0 just before `ret`, and whose callers read xmm0 afterwards, most likely has a floating-point (or vector) return type.
- `vzeroupper`/`vzeroall` appear before calls into non-AVX code (and before returning from functions that used YMM registers) to avoid AVX–SSE transition penalties; their presence signals that the surrounding code uses YMM or ZMM registers.
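A hand-written SSE2 analogue of the first two notes, assuming GCC or Clang on x86-64; the function names are hypothetical. `count_byte` has the shape an auto-vectorised byte-count loop tends to take (an unaligned `MOVDQU` load, a `PCMPEQB` compare, a reduction, and a scalar tail), but real compiler output differs in detail, for example in how matches are accumulated, so treat it as a reading aid rather than a reference. `sum4` illustrates the alignment rule by using the aligned load only when the pointer is 16-byte aligned.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 (also pulls in the SSE float intrinsics) */
#include <stddef.h>
#include <stdint.h>

/* Counts bytes equal to needle, 16 at a time. */
static size_t count_byte(const unsigned char *buf, size_t len, unsigned char needle) {
    const __m128i target = _mm_set1_epi8((char)needle);  /* broadcast needle to all 16 lanes */
    size_t count = 0, i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i)); /* MOVDQU: any address */
        __m128i eq = _mm_cmpeq_epi8(chunk, target);      /* PCMPEQB: 0xFF where bytes match */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq); /* PMOVMSKB: one bit per byte */
        for (; mask != 0; mask &= mask - 1)              /* clear lowest set bit per match */
            count++;
    }
    for (; i < len; i++)                                 /* scalar tail: last < 16 bytes */
        count += (buf[i] == needle);
    return count;
}

/* Sums four floats, using the aligned load only when p is 16-byte aligned. */
static float sum4(const float *p) {
    __m128 v = (((uintptr_t)p & 15u) == 0)
                   ? _mm_load_ps(p)    /* may compile to MOVAPS: faults if misaligned */
                   : _mm_loadu_ps(p);  /* MOVUPS: accepts any address */
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```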