SIMD Registers (XMM / YMM / ZMM)

Overview of the x86-64 SIMD register file — XMM (128-bit, SSE), YMM (256-bit, AVX), and ZMM (512-bit, AVX-512) — and their aliasing relationship.

x86-64 includes a layered SIMD register file introduced across three instruction-set extensions: SSE (Streaming SIMD Extensions), AVX, and AVX-512.

Register hierarchy

| Extension | Width | Register names | Count |
|---|---|---|---|
| SSE (baseline on x86-64) | 128 bits | xmm0–xmm15 | 16 |
| AVX | 256 bits | ymm0–ymm15 | 16 |
| AVX-512 | 512 bits | zmm0–zmm31 | 32 |

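Each narrower register is the low part of the next wider one: xmm3 is the low 128 bits of ymm3, and ymm3 is the low 256 bits of zmm3. With AVX-512 the same aliasing extends to xmm16–xmm31 and ymm16–ymm31, which are reachable only through EVEX-encoded instructions.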

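Writing xmm3 with a legacy (non-VEX-encoded) SSE instruction leaves bits 128–255 of ymm3 (and, with AVX-512, bits 256–511 of zmm3) unchanged. Writing it with a VEX- or EVEX-encoded 128-bit instruction zeroes every bit above bit 127, up to the widest register the CPU supports; likewise, a VEX-encoded 256-bit write to ymm3 zeroes bits 256–511 of zmm3.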
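
The two write rules can be modelled in portable C. This is a minimal sketch, not real register access: vreg_model, write_xmm_legacy, and write_xmm_vex are hypothetical names, and the model assumes a 512-bit (AVX-512) register stored little-endian, with xmmN as bytes 0–15 and ymmN as bytes 0–31.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one vector register: zmmN = all 64 bytes,
 * ymmN = low 32 bytes, xmmN = low 16 bytes (little-endian, as on x86). */
typedef struct {
    uint8_t bytes[64];
} vreg_model;

/* Legacy SSE (non-VEX) write to xmmN: only bits 0-127 change;
 * bits 128-511 keep their previous contents. */
void write_xmm_legacy(vreg_model *r, const uint8_t src[16]) {
    memcpy(r->bytes, src, 16);
}

/* VEX/EVEX-encoded 128-bit write to xmmN: bits 0-127 receive the result
 * and every bit above 127 is zeroed. */
void write_xmm_vex(vreg_model *r, const uint8_t src[16]) {
    memcpy(r->bytes, src, 16);
    memset(r->bytes + 16, 0, sizeof r->bytes - 16);
}
```

Because the legacy write keeps the old upper bits, the result depends on the register's previous upper state; that dependency is what makes mixing legacy SSE and AVX code costly on many microarchitectures.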

Lane interpretation

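A single XMM register can hold any of the following lane layouts; the __m256 row is the 256-bit YMM analogue, shown for comparison:

| Type | Register | Contents |
|---|---|---|
| __m128i | XMM | 16 × i8, 8 × i16, 4 × i32, or 2 × i64 |
| __m128 | XMM | 4 × float (single-precision) |
| __m128d | XMM | 2 × double (double-precision) |
| __m256 | YMM | 8 × float (single-precision) |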
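
The lane type is only an interpretation of the same 128 bits. As a minimal SSE2 sketch (assuming an x86-64 target, where SSE2 is baseline so no extra compiler flags are needed; lane_views is a hypothetical helper name), one __m128i can be stored out under three views without any conversion:

```c
#include <emmintrin.h>  /* SSE2 intrinsics; SSE2 is baseline on x86-64 */
#include <stdint.h>

/* Hypothetical helper: build one __m128i and store the same 16 bytes
 * under three lane interpretations. Each store writes the register's
 * raw bits to memory; nothing is converted. */
void lane_views(int32_t as_i32[4], uint8_t as_u8[16], int64_t as_i64[2]) {
    /* _mm_set_epi32 lists lanes from highest to lowest, so
     * lane 0 = 0, lane 1 = 1, lane 2 = 2, lane 3 = 3. */
    __m128i v = _mm_set_epi32(3, 2, 1, 0);

    _mm_storeu_si128((__m128i *)as_i32, v);  /* 4 x i32 view */
    _mm_storeu_si128((__m128i *)as_u8, v);   /* 16 x u8 view */
    _mm_storeu_si128((__m128i *)as_i64, v);  /* 2 x i64 view */
}
```

On little-endian x86, i32 lane 1 appears as byte 4 of the u8 view and as the high half of i64 lane 0, so as_i64[0] equals 1 << 32.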

Calling conventions

  • System V AMD64: xmm0–xmm7 pass floating-point and vector arguments; xmm0–xmm1 return floating-point and vector values. All of xmm0–xmm15 are caller-saved; the ABI has no callee-saved vector registers.
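  • Microsoft x64: xmm0–xmm3 carry the first four arguments when they are float or double, using the same positional slot as RCX, RDX, R8, and R9 (a double in second position goes in xmm1 even if the first argument is an integer). Under the default convention, __m128 arguments are passed by reference rather than in registers. xmm0 holds float, double, and __m128 return values. xmm6–xmm15 are callee-saved, but only their low 128 bits; the upper halves of ymm6–ymm15 are volatile.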
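
The difference between counting FP arguments separately (System V) and assigning positional slots (Microsoft x64) shows up as soon as an integer precedes a double. The function below is a hypothetical example; the register assignments in its comment follow the two ABIs' argument-passing rules and can be confirmed by compiling it to assembly with each toolchain.

```c
/* Hypothetical function used only to show argument placement. */
double scale_add(int n, double x, double y) {
    /*
     * System V AMD64: n -> edi, x -> xmm0, y -> xmm1
     *   (integer and FP arguments are counted separately).
     * Microsoft x64:  n -> ecx, x -> xmm1, y -> xmm2
     *   (each of the first four arguments owns one positional slot).
     * Both ABIs return the double result in xmm0.
     */
    return n * x + y;
}
```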

Reverse-engineering notes

  • Compilers turn auto-vectorised loops into MOVDQU, PCMPEQB, PAND, PADDW, and similar instructions on xmm registers. Decompilers often fail to reconstruct the original high-level loop from them, so reading the raw assembly is usually faster.
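  • MOVAPS / MOVAPD / MOVDQA require 16-byte-aligned addresses, as do most legacy SSE instructions with a 128-bit memory operand (e.g. ADDPS xmm, m128); MOVUPS / MOVUPD / MOVDQU do not. VEX-encoded forms relax this, except for explicitly aligned moves such as VMOVAPS. An alignment fault in SSE code (a #GP exception, delivered as SIGSEGV on Linux) almost always means one of these instructions hit an address that is not 16-byte aligned, typically on a stack that was misaligned because rsp was not 16-byte aligned at a call.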
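  • xmm0 carries float and double return values in both ABIs. If the caller reads xmm0 immediately after a call without having written it first, the callee most likely returns a floating-point or vector type; a write to xmm0 inside the callee is weaker evidence, since xmm0 is also an ordinary scratch register.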
  • vzeroupper (or, less often, vzeroall) appears before calls and returns in code that has dirtied the upper YMM/ZMM bits, to avoid AVX–SSE transition penalties in any legacy SSE code that runs next; its presence signals that the surrounding code uses YMM or ZMM registers.
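
As a sketch of the source-level shape behind the auto-vectorised pattern, the following counts occurrences of a byte twice: once as a plain loop and once with SSE2 intrinsics that map to MOVDQU, PCMPEQB, and PMOVMSKB. It assumes an x86-64 target and GCC or Clang (for __builtin_popcount); the function names are hypothetical, and a compiler vectorising the scalar loop may choose a different instruction mix.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; baseline on x86-64 */
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: count bytes equal to c. */
size_t count_byte_scalar(const uint8_t *p, size_t n, uint8_t c) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (p[i] == c);
    return count;
}

/* Hand-written SSE2 version, 16 bytes per iteration. */
size_t count_byte_sse2(const uint8_t *p, size_t n, uint8_t c) {
    const __m128i needle = _mm_set1_epi8((char)c);  /* c in all 16 lanes */
    size_t count = 0, i = 0;
    for (; i + 16 <= n; i += 16) {
        /* Unaligned load (MOVDQU): p needs no 16-byte alignment. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(p + i));
        /* PCMPEQB: 0xFF in each byte lane equal to c, else 0x00. */
        __m128i eq = _mm_cmpeq_epi8(chunk, needle);
        /* PMOVMSKB: top bit of each byte lane -> 16-bit mask. */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);
        count += (size_t)__builtin_popcount(mask);
    }
    /* Scalar tail for the remaining n % 16 bytes. */
    for (; i < n; i++)
        count += (p[i] == c);
    return count;
}
```

Using _mm_loadu_si128 rather than _mm_load_si128 sidesteps the MOVDQA alignment fault described above when the buffer's alignment is unknown.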