What is the PE file format?

PE (Portable Executable) is the file format Windows uses for .exe, .dll, .sys, and .ocx files. It describes how the loader should map code and data into memory, resolve imported functions, and start execution at the entry point.

What is the difference between an RVA and a file offset?

A file offset is a byte position on disk. An RVA (Relative Virtual Address) is an offset from the image base after the file is mapped into memory. Because sections are aligned differently on disk and in memory, the two values rarely match and must be translated through the section table.

Why do packers and malware modify the PE headers?

Packers compress or encrypt the real code and rewrite the entry point so a small stub runs first, unpacks the payload, and jumps to the original entry point (OEP). They also hide or strip the original Import Address Table so static analysis sees little, leaving the stub to rebuild it at runtime.

What is the Import Address Table (IAT)?

The IAT is an array of pointers the Windows loader fills in with the real addresses of functions imported from other DLLs. Reading it tells an analyst which APIs a binary calls.

PE File Format Explained: Portable Executable

The Portable Executable (PE) format is the container Windows uses for almost every piece of native code it runs: .exe, .dll, .sys, and .ocx files all share the same structure. If you do any reverse engineering on Windows, the PE file format is the map you read before anything else. This guide walks through every major part of that map and explains why malware authors and packers spend so much effort distorting it.

A bird's-eye view

A PE file is a sequence of headers followed by a table of sections. Each header points to the next, and the section table tells the loader where the actual code and data live. Here is the overall layout:

```text
+-----------------------------+  file offset 0
|  DOS header (MZ) + e_lfanew |
+-----------------------------+
|  DOS stub "This program..." |
+-----------------------------+  <- e_lfanew points here
|  PE signature  "PE\0\0"     |
+-----------------------------+
|  File header (COFF)         |
+-----------------------------+
|  Optional header            |
|   + data directories[16]    |
+-----------------------------+
|  Section table              |
|   .text  .rdata  .data ...  |
+-----------------------------+
|  Section bodies (raw data)  |
|   .text  | .rdata | .data   |
|   .rsrc  | .reloc | ...     |
+-----------------------------+
|  Overlay (optional, never   |
|  mapped by the loader)      |
+-----------------------------+
```

The genius of the format is that it is built to be mapped, not read sequentially. The loader copies sections into memory at aligned addresses and patches a few tables. Understanding that mapping is the difference between guessing and knowing.
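To make "mapped, not read" concrete, here is a minimal Python sketch of the copying step only. It assumes the section headers have already been parsed into a hypothetical `Section` tuple, and it deliberately ignores relocations, imports, and page protections, all of which a real loader also handles.

```python
from typing import NamedTuple

class Section(NamedTuple):
    # Hypothetical container for the four section-header fields this sketch needs.
    virtual_address: int  # RVA where the section starts in memory
    virtual_size: int     # bytes the section occupies in memory
    raw_ptr: int          # file offset of the section's data (PointerToRawData)
    raw_size: int         # bytes of data stored on disk (SizeOfRawData)

def map_image(data: bytes, size_of_headers: int, size_of_image: int,
              sections: list[Section]) -> bytearray:
    """Lay a PE file out as the loader would, so that image[rva] is the byte at that RVA."""
    image = bytearray(size_of_image)  # memory starts zero-filled
    headers = data[:min(size_of_headers, size_of_image)]
    image[:len(headers)] = headers    # the headers are mapped at RVA 0
    for s in sections:
        if s.virtual_address >= size_of_image:
            continue  # malformed section outside the image: skip it
        # Copy only bytes that exist on disk; the rest of the section stays zero.
        count = min(s.raw_size, s.virtual_size) if s.virtual_size else s.raw_size
        count = min(count, size_of_image - s.virtual_address)
        chunk = data[s.raw_ptr:s.raw_ptr + count]
        image[s.virtual_address:s.virtual_address + len(chunk)] = chunk
    return image
```

Once a file is laid out like this, an RVA is simply an index into `image`.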

The DOS header and stub

Every PE file still starts with a 64-byte MS-DOS header whose first two bytes are MZ (the initials of Mark Zbikowski). It exists purely for backward compatibility, but two fields matter:

e_magic — the MZ signature that lets tools recognize the file at a glance.
e_lfanew — a 4-byte offset, stored at 0x3C as the last field of the DOS header, pointing to the real PE headers.

After the DOS header comes the DOS stub, a tiny 16-bit program that prints "This program cannot be run in DOS mode" if someone runs the file on actual DOS. The Windows loader never executes the stub: it reads e_lfanew and goes straight to the NT headers. Malware often hides data in or after the stub because nothing validates it.
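The two DOS-header fields that matter can be read in a few lines. This is a minimal sketch using Python's `struct` module; the function names are illustrative, and `dos_stub` simply returns whatever sits between the 64-byte header and e_lfanew, which is exactly the region nothing validates.

```python
import struct

def read_e_lfanew(data: bytes) -> int:
    """Validate the MZ signature and return e_lfanew, the file offset of the NT headers."""
    if len(data) < 64:
        raise ValueError("file is smaller than the 64-byte DOS header")
    # e_magic is the first WORD of the DOS header.
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != 0x5A4D:  # b"MZ" read as a little-endian WORD
        raise ValueError("missing MZ signature")
    # e_lfanew is the last field of the DOS header: a 32-bit value at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew

def dos_stub(data: bytes) -> bytes:
    """Bytes between the DOS header and the NT headers: the stub plus anything hidden there."""
    return data[64:read_e_lfanew(data)]
```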

NT headers: signature, File header, Optional header

At e_lfanew sits the IMAGE_NT_HEADERS structure, which has three parts.

PE signature

Four bytes: PE\0\0. If they are missing, the loader rejects the file.

File header (COFF header)

A compact 20-byte structure describing the machine and overall shape:

Machine — target CPU (e.g. 0x8664 for x64).
NumberOfSections — how many entries the section table has.
Characteristics — flags such as "is a DLL" or "executable image."
SizeOfOptionalHeader — how big the next header is.
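A sketch of checking the signature and decoding the File header, taking an e_lfanew value as input. The struct layout follows IMAGE_FILE_HEADER's field order; the dictionary keys and the helper name are just illustrative choices.

```python
import struct

# IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
# NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes in total).
FILE_HEADER = struct.Struct("<HHIIIHH")
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_DLL = 0x2000

def read_file_header(data: bytes, e_lfanew: int) -> dict:
    """Check the PE signature at e_lfanew and decode the COFF File header that follows it."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE\\0\\0 signature")
    (machine, nsections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = FILE_HEADER.unpack_from(data, e_lfanew + 4)
    return {
        "Machine": machine,  # e.g. 0x8664 for x64, 0x14C for x86
        "NumberOfSections": nsections,
        "TimeDateStamp": timestamp,
        "SizeOfOptionalHeader": opt_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "is_executable": bool(characteristics & IMAGE_FILE_EXECUTABLE_IMAGE),
    }
```

The Optional header starts immediately afterwards, at e_lfanew + 24 (4 signature bytes plus the 20-byte File header), and the section table starts SizeOfOptionalHeader bytes after that.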

Optional header

Despite the name, the Optional header is mandatory for executables. It carries the fields the loader actually needs:

Magic — 0x10B for PE32, 0x20B for PE32+ (64-bit).
AddressOfEntryPoint — the RVA where execution begins.
ImageBase — the preferred load address.
SectionAlignment and FileAlignment — the two granularities that make RVAs and file offsets diverge.
DataDirectory[16] — an array of (RVA, size) pairs pointing to special tables.
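The Magic field matters because PE32 and PE32+ lay out the Optional header differently: PE32 has an extra BaseOfData field and a 4-byte ImageBase, while PE32+ widens ImageBase to 8 bytes. The sketch below reads only the fields listed above; `opt_offset` is assumed to be e_lfanew + 24.

```python
import struct

PE32, PE32_PLUS = 0x10B, 0x20B

def read_optional_header(data: bytes, opt_offset: int) -> dict:
    """Decode the Optional header fields the loader needs (opt_offset = e_lfanew + 24)."""
    (magic,) = struct.unpack_from("<H", data, opt_offset)
    if magic not in (PE32, PE32_PLUS):
        raise ValueError(f"unknown Optional header magic {magic:#x}")
    # AddressOfEntryPoint sits at offset 16 in both variants.
    (entry_rva,) = struct.unpack_from("<I", data, opt_offset + 16)
    if magic == PE32:
        # PE32: 4-byte BaseOfData at 24, then a 4-byte ImageBase at 28.
        (image_base,) = struct.unpack_from("<I", data, opt_offset + 28)
    else:
        # PE32+: no BaseOfData; ImageBase is 8 bytes wide at offset 24.
        (image_base,) = struct.unpack_from("<Q", data, opt_offset + 24)
    # SectionAlignment and FileAlignment are at 32 and 36 in both variants.
    section_align, file_align = struct.unpack_from("<II", data, opt_offset + 32)
    return {
        "Magic": magic,
        "AddressOfEntryPoint": entry_rva,
        "ImageBase": image_base,
        "SectionAlignment": section_align,
        "FileAlignment": file_align,
        # The DataDirectory array starts at offset 96 (PE32) or 112 (PE32+).
        "DataDirectoryOffset": opt_offset + (96 if magic == PE32 else 112),
    }
```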

Data directories

The 16 data directory entries are shortcuts to important tables scattered across the sections. The ones a reverse engineer reaches for most are:

Import Directory — DLLs and functions the binary imports.
Export Directory — functions a DLL makes available to others.
Resource Directory — icons, dialogs, manifests, and arbitrary blobs (see PE resource payloads).
Base Relocation Directory — fixups applied when the image cannot load at its preferred base.
TLS Directory — thread-local storage and, importantly, TLS callbacks that run before the entry point.
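Each data directory entry is just an 8-byte (RVA, size) pair at a fixed index, so listing them is a short loop. In this sketch `dir_offset` is the file offset of the DataDirectory array, and `count` would normally come from the Optional header's NumberOfRvaAndSizes field (16 in practice); the short names are illustrative labels for the standard indexes.

```python
import struct

# Index of each entry in the DataDirectory array, in specification order.
DIRECTORY_NAMES = [
    "Export", "Import", "Resource", "Exception", "Certificate", "BaseReloc",
    "Debug", "Architecture", "GlobalPtr", "TLS", "LoadConfig", "BoundImport",
    "IAT", "DelayImport", "CLRRuntime", "Reserved",
]

def read_data_directories(data: bytes, dir_offset: int, count: int = 16) -> dict:
    """Return {name: (rva, size)} for every non-empty data directory entry."""
    dirs = {}
    for index in range(min(count, len(DIRECTORY_NAMES))):
        # Each IMAGE_DATA_DIRECTORY is a 4-byte RVA followed by a 4-byte size.
        # (Exception to the rule: the Certificate entry holds a file offset, not an RVA.)
        rva, size = struct.unpack_from("<II", data, dir_offset + 8 * index)
        if rva or size:
            dirs[DIRECTORY_NAMES[index]] = (rva, size)
    return dirs
```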

The section table and common sections

After the Optional header comes the section table: one 40-byte IMAGE_SECTION_HEADER per section. Each entry maps a chunk of the file to a region of memory and sets its permissions. The names are conventions, not rules:

.text — executable code. Read + execute.
.rdata — read-only data: constants, the import tables, debug info.
.data — initialized, writable global variables.
.rsrc — resources described by the resource directory.
.reloc — base relocation fixups.

Each header carries VirtualAddress and VirtualSize (where it lands in memory) plus PointerToRawData and SizeOfRawData (where it lives on disk). Those two pairs are almost never equal, which brings us to the single most important concept in PE analysis.
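A sketch of reading the section table follows. It unpacks the 40-byte IMAGE_SECTION_HEADER layout, keeps the four placement fields plus Characteristics, and renders the memory-permission bits as an "rwx" string. The `SectionHeader` tuple and its `perms` helper are illustrative, not part of any library.

```python
import struct
from typing import NamedTuple

# IMAGE_SECTION_HEADER (40 bytes): 8-byte name, six DWORDs, two WORDs, one DWORD.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

class SectionHeader(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int
    characteristics: int

    def perms(self) -> str:
        """Render the memory permission flags as an 'rwx'-style string."""
        c = self.characteristics
        return (("r" if c & IMAGE_SCN_MEM_READ else "-")
                + ("w" if c & IMAGE_SCN_MEM_WRITE else "-")
                + ("x" if c & IMAGE_SCN_MEM_EXECUTE else "-"))

def read_section_table(data: bytes, table_offset: int, count: int) -> list[SectionHeader]:
    """table_offset = Optional header offset + SizeOfOptionalHeader; count = NumberOfSections."""
    sections = []
    for i in range(count):
        (raw_name, vsize, va, raw_size, raw_ptr,
         _relocs, _lines, _nrelocs, _nlines, chars) = SECTION_HEADER.unpack_from(
            data, table_offset + i * SECTION_HEADER.size)
        # Names are NUL-padded; an 8-character name has no terminator at all.
        name = raw_name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
        sections.append(SectionHeader(name, vsize, va, raw_size, raw_ptr, chars))
    return sections
```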

RVA versus file offset

A file offset is a byte position on disk. An RVA (Relative Virtual Address) is an offset from the address where the loader actually maps the image, which is ImageBase unless the image had to be relocated. Because sections are aligned to FileAlignment (often 512 bytes) on disk but to SectionAlignment (often 4096 bytes) in memory, the same byte has two different addresses.
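A small worked example shows why the gap between the two addresses is not constant. This sketch assumes a hypothetical file with 0x400 bytes of headers and two sections of 0x1234 and 0x300 bytes, packed back to back with the common alignments quoted above; real binaries take both alignments from the Optional header.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two in valid PE files)."""
    return (value + alignment - 1) & ~(alignment - 1)

def layout(header_size: int, sizes: list[int],
           file_alignment: int = 0x200, section_alignment: int = 0x1000) -> list[tuple[int, int]]:
    """Return (PointerToRawData, VirtualAddress) for each section, packed back to back."""
    raw = align_up(header_size, file_alignment)
    rva = align_up(header_size, section_alignment)
    result = []
    for size in sizes:
        result.append((raw, rva))
        raw += align_up(size, file_alignment)     # on-disk footprint
        rva += align_up(size, section_alignment)  # in-memory footprint
    return result

for raw, rva in layout(0x400, [0x1234, 0x300]):
    print(f"raw {raw:#07x}  rva {rva:#07x}  delta {rva - raw:#x}")
```

The first section lands at file offset 0x400 but RVA 0x1000 (delta 0xc00); the second at 0x1800 versus 0x3000 (delta 0x1800). Because the delta changes per section, no single subtraction converts RVAs to file offsets.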

To translate an RVA to a file offset you find the section that contains the RVA, then apply:

```text
file_offset = RVA - section.VirtualAddress + section.PointerToRawData
```

Every tool that lets you click a virtual address and see the bytes on disk is doing exactly this. Get it wrong and you read garbage — which is precisely the confusion some packers engineer on purpose.
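Here is the formula above as a function, with the two checks a careful implementation needs: the RVA must fall inside some section, and it must fall within the part of that section actually stored on disk. Bytes past SizeOfRawData exist only in memory (zero-filled), so there is no file offset to return. The `Section` tuple is a hypothetical container, and RVAs inside the headers are not handled here.

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int
    virtual_size: int
    pointer_to_raw_data: int
    size_of_raw_data: int

def rva_to_offset(rva: int, sections: list[Section]) -> Optional[int]:
    """Translate an RVA to a file offset, or return None if no on-disk byte backs it."""
    for s in sections:
        # Use the larger size so a VirtualSize of 0 (seen in some linkers' output) still matches.
        span = max(s.virtual_size, s.size_of_raw_data)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if delta >= s.size_of_raw_data:
                return None  # inside the section, but in its zero-filled, memory-only tail
            return s.pointer_to_raw_data + delta
    return None
```

Reading a four-byte field at an RVA then becomes `data[off:off + 4]` once `off` is known not to be `None`.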

Imports, the IAT, and exports

When a program calls CreateFileW, it does not know that function's address at compile time. The linker instead records, in the import directory, that it needs CreateFileW from kernel32.dll. Two parallel arrays describe this: the Import Name Table (names/ordinals) and the Import Address Table (IAT).

At load time the Windows loader walks the imports, loads each DLL, resolves every function, and writes the real addresses into the IAT. Code then calls indirectly through the IAT. For an analyst this table is a confession: it lists almost everything the binary can do. That is exactly why attackers target it with import table obfuscation, resolving APIs dynamically so the import directory looks empty.
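The walk the loader performs can be sketched in Python, minus the part where it writes real addresses. This version assumes `image` is the module already laid out in memory (for example a memory dump), so an RVA is a plain index; on a raw file each RVA would first go through the section table. It reads the Import Name Table when present, because in a loaded image the IAT itself has been overwritten with addresses.

```python
import struct

def read_cstring(image: bytes, rva: int) -> str:
    """Read a NUL-terminated ASCII string starting at rva."""
    end = image.index(b"\x00", rva)
    return image[rva:end].decode("ascii", errors="replace")

def walk_imports(image: bytes, import_dir_rva: int, is_64bit: bool) -> dict:
    """Return {dll_name: [function names or '#ordinal']} from the import directory."""
    thunk_fmt, ordinal_flag = ("<Q", 1 << 63) if is_64bit else ("<I", 1 << 31)
    thunk_size = struct.calcsize(thunk_fmt)
    imports = {}
    rva = import_dir_rva
    while True:
        # IMAGE_IMPORT_DESCRIPTOR: OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
        oft, _ts, _fwd, name_rva, first_thunk = struct.unpack_from("<5I", image, rva)
        if oft == 0 and name_rva == 0 and first_thunk == 0:
            break  # an all-zero descriptor ends the array
        # Prefer the Import Name Table; some linkers leave OriginalFirstThunk at 0.
        thunk_rva = oft or first_thunk
        funcs = []
        while True:
            (thunk,) = struct.unpack_from(thunk_fmt, image, thunk_rva)
            if thunk == 0:
                break  # each thunk array is zero-terminated
            if thunk & ordinal_flag:
                funcs.append(f"#{thunk & 0xFFFF}")  # imported by ordinal
            else:
                # Low 31 bits: RVA of a 2-byte hint followed by the function name.
                funcs.append(read_cstring(image, (thunk & 0x7FFFFFFF) + 2))
            thunk_rva += thunk_size
        imports[read_cstring(image, name_rva)] = funcs
        rva += 20  # size of one import descriptor
    return imports
```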

The export directory is the mirror image. A DLL publishes named or ordinal-numbered functions so other modules can import them; each exported name maps to an ordinal, and each ordinal to an RVA inside the module.
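The name-to-ordinal-to-RVA chain is easiest to see in code. This sketch makes the same assumption as any in-memory walk (RVAs index directly into a mapped `image`), and it does not special-case forwarded exports, whose "RVA" points back inside the export directory at a string such as a DLL.function name rather than at code.

```python
import struct

def read_cstring(image: bytes, rva: int) -> str:
    """Read a NUL-terminated ASCII string starting at rva."""
    end = image.index(b"\x00", rva)
    return image[rva:end].decode("ascii", errors="replace")

def walk_exports(image: bytes, export_dir_rva: int) -> dict:
    """Return {name: (ordinal, function RVA)} for every named export."""
    # IMAGE_EXPORT_DIRECTORY is 40 bytes; unused fields are prefixed with "_".
    (_chars, _ts, _major, _minor, _name, base, _n_funcs, n_names,
     funcs_rva, names_rva, ordinals_rva) = struct.unpack_from("<IIHHIIIIIII", image, export_dir_rva)
    exports = {}
    for i in range(n_names):
        (name_rva,) = struct.unpack_from("<I", image, names_rva + 4 * i)
        # AddressOfNameOrdinals holds 16-bit indexes into AddressOfFunctions.
        (index,) = struct.unpack_from("<H", image, ordinals_rva + 2 * i)
        (func_rva,) = struct.unpack_from("<I", image, funcs_rva + 4 * index)
        # The public ordinal is the index plus the directory's Base.
        exports[read_cstring(image, name_rva)] = (base + index, func_rva)
    return exports
```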

The entry point and the OEP

AddressOfEntryPoint is the RVA the loader jumps to after mapping and import resolution finish (and after any TLS callbacks). In a normal program it points into .text at the C runtime startup code.
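Because TLS callbacks run before that jump, it is worth listing them before setting a breakpoint at the entry point. A sketch, assuming a mapped `image` where RVAs are indexes and assuming no relocation has been applied, so the absolute VAs stored in the TLS directory are relative to the header's ImageBase.

```python
import struct

def tls_callbacks(image: bytes, tls_dir_rva: int, image_base: int, is_64bit: bool) -> list[int]:
    """Return the RVAs of TLS callbacks, which run before AddressOfEntryPoint."""
    fmt = "<Q" if is_64bit else "<I"
    ptr = struct.calcsize(fmt)
    # AddressOfCallBacks is the 4th pointer-sized field of IMAGE_TLS_DIRECTORY.
    (callbacks_va,) = struct.unpack_from(fmt, image, tls_dir_rva + 3 * ptr)
    if callbacks_va == 0:
        return []
    # Unlike most PE fields these are absolute VAs, so subtract ImageBase to get RVAs.
    rva = callbacks_va - image_base
    result = []
    while True:
        (cb_va,) = struct.unpack_from(fmt, image, rva)
        if cb_va == 0:
            break  # the callback array is NULL-terminated
        result.append(cb_va - image_base)
        rva += ptr
    return result
```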

In a packed program it points instead into a small unpacking stub. That stub decompresses or decrypts the real code, rebuilds the IAT, and finally jumps to the Original Entry Point (OEP) — the address the program would have started at before packing. Finding the OEP is the central puzzle of unpacking. The broader family of these tricks is covered under packing techniques.
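A cheap first triage step is to check which section the entry point lands in. The sketch below encodes a few common heuristics (entry point outside every section, in a non-executable section, in a writable section, or not in the first section). They are hints that a stub may be running first, not proof of packing, and the hypothetical `Section` tuple carries only the fields the checks need.

```python
from typing import NamedTuple, Optional

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000

class Section(NamedTuple):
    name: str
    virtual_address: int
    virtual_size: int
    characteristics: int

def entry_point_flags(entry_rva: int, sections: list[Section]) -> list[str]:
    """Heuristic red flags about where AddressOfEntryPoint lands; not proof of packing."""
    if entry_rva == 0:
        return []  # allowed for DLLs that have no entry point
    owner: Optional[Section] = next(
        (s for s in sections
         if s.virtual_address <= entry_rva < s.virtual_address + s.virtual_size), None)
    if owner is None:
        return ["entry point is outside every section"]
    flags = []
    if not owner.characteristics & IMAGE_SCN_MEM_EXECUTE:
        flags.append(f"{owner.name} is not marked executable")
    if owner.characteristics & IMAGE_SCN_MEM_WRITE:
        flags.append(f"{owner.name} is writable, common for self-modifying unpacking stubs")
    if owner is not sections[0]:
        flags.append(f"entry point is in {owner.name}, not the first section")
    return flags
```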

Why malware and packers manipulate all of this

Almost every defensive distortion you will meet is a deliberate edit to one of the structures above:

Rewriting the entry point to run a stub first, then jump to the OEP.
Shrinking or faking the IAT so static tools can't see which APIs are used.
Abusing TLS callbacks to execute code before any debugger breakpoint at the entry point fires.
Hiding payloads in resources or appending an overlay — data tacked on after the last section that the loader never maps but the program reads itself (see PE overlay payloads).
Lying about section sizes and alignment to break naive RVA-to-offset math.

Once you can read the headers cleanly, these tricks stop being magic and become a checklist.
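As one example of turning a trick into a check, overlay detection needs only the section table and the file size: anything past the furthest end of any section's raw data is never mapped by the loader. One caveat: Authenticode signatures are also stored after the last section, so a signed binary will report an "overlay" that is legitimate.

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    pointer_to_raw_data: int
    size_of_raw_data: int

def overlay_range(file_size: int, sections: list[Section]) -> Optional[tuple[int, int]]:
    """Return (start, length) of data appended after the last section, or None."""
    # The mapped part of the file ends where the furthest section's raw data ends.
    end_of_image = max((s.pointer_to_raw_data + s.size_of_raw_data
                        for s in sections if s.size_of_raw_data), default=0)
    if file_size > end_of_image:
        return end_of_image, file_size - end_of_image
    return None
```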

Where to go next

The PE format rewards hands-on practice. Open a small executable in a PE viewer and trace each field from the DOS header down to the IAT. Then look at how those same fields get weaponized in the full techniques library, and keep the reverse-engineering glossary open for the acronyms.
