Command & Control (intermediate)

Domain Generation Algorithm (DGA)

Malware algorithmically generates large numbers of candidate C2 domains from a seed so that takedowns cannot keep pace, while only a handful are ever registered by the operator.

Rather than hard-coding one or two command-and-control (C2) domains that a defender can simply sinkhole, malware uses a Domain Generation Algorithm to produce hundreds or thousands of pseudo-random candidate domains per day. The bot tries each in turn; the operator only needs to register a tiny subset ahead of time. Because the algorithm is deterministic, both bot and operator arrive at the same list independently, with no shared infrastructure that could be seized.

DGAs are usually seeded with a value both sides can derive: most commonly the current date, sometimes combined with a hard-coded constant, or an external source such as a trending Twitter topic or a currency exchange rate (a non-deterministic, seed-from-the-wild DGA, whose domains cannot be pre-computed until the external value is published). Because the seed changes over time, the domain list rotates, defeating static blocklists.
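As a minimal sketch of the seeding idea, the functions below derive a 32-bit seed from the date and a constant, and optionally fold in an external string. The function names, the `family_const` parameter, and the use of FNV-1a to hash the external value are assumptions chosen for illustration, not taken from any particular family.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit FNV-1a hash; used here only to fold an external string into the
 * seed. Real families use whatever mixing function their author chose. */
uint32_t fnv1a32(const char *s)
{
    assert(s != 0);
    uint32_t h = 2166136261u;              /* FNV offset basis */
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}

/* Date-only seed: YYYYMMDD as an integer, XORed with a hard-coded
 * (hypothetical) per-family constant. Changes every day. */
uint32_t seed_from_date(int year, int month, int day, uint32_t family_const)
{
    uint32_t ymd = (uint32_t)year * 10000u + (uint32_t)month * 100u
                 + (uint32_t)day;
    return ymd ^ family_const;
}

/* Seed that also depends on an external value (e.g. a headline or an
 * exchange rate rendered as text). Nobody can compute it before the
 * external value is known. */
uint32_t seed_from_date_and_feed(int year, int month, int day,
                                 uint32_t family_const, const char *external)
{
    return seed_from_date(year, month, day, family_const) ^ fnv1a32(external);
}
```

Both bot and operator call the same function with the same inputs and obtain the same seed. A blocklist built from one day's seed says nothing about the next day's.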

For an analyst the practical task is twofold: recognise that a sample contains a DGA, and reverse the algorithm well enough to pre-compute (and pre-emptively sinkhole) tomorrow's domains.

How it works

A typical date-seeded DGA hashes the date, expands it into a pseudo-random stream, and maps that stream onto a fixed alphabet and TLD set:

c
// Illustrative date-seeded DGA, for analysts who must reverse one.
// Real families vary the PRNG and the label-to-domain mapping.
#include <stdint.h>
#include <stdio.h>

void generate_domains(int year, int month, int day, char out[][32], int count)
{
    // seed = YYYYMMDD as an integer, XORed with a hard-coded constant
    uint32_t seed = ((uint32_t)year * 10000u + (uint32_t)month * 100u
                     + (uint32_t)day) ^ 0xABCD1234u;
    const char *alphabet = "abcdefghijklmnopqrstuvwxyz";
    const char *tlds[] = { ".com", ".net", ".org", ".info" };

    for (int d = 0; d < count; d++) {
        // simple LCG step per domain
        seed = seed * 1103515245u + 12345u;
        uint32_t r = seed;
        int len = 12 + (int)(r % 8);         // labels 12–19 chars
        char label[32];
        int i;
        for (i = 0; i < len; i++) {
            r = r * 1103515245u + 12345u;    // one LCG step per character
            label[i] = alphabet[r % 26];
        }
        label[i] = '\0';
        // concatenate label + TLD chosen from the rotating stream;
        // worst case is 19 + 5 + NUL = 25 bytes, which fits in out[d]
        snprintf(out[d], sizeof out[d], "%s%s", label, tlds[(r >> 8) % 4]);
    }
}

The hallmark for a reverser is a function that consumes a date (often via GetSystemTime/time()) and emits a loop of gethostbyname/getaddrinfo/DnsQuery calls against high-entropy names. Many families embed the seed constant and TLD list as the only "strings" worth recovering.

Detection & analysis

Static analysis:

  • Look for a tight loop combining a small PRNG (LCG, XorShift, or a hash like MD5/SHA-1 truncated to a few bytes) with character-table indexing, terminated by a DNS-resolution API. The presence of a hard-coded alphabet and a short TLD array next to date-formatting calls is a strong tell.
  • Once the algorithm is recovered, re-implement it and generate the domain set for a date range. Cross-reference against passive DNS to find which generated domains were ever registered — those are the live or historical C2.
  • Resources such as DGArchive maintain reversed implementations and generated domain lists for many known families; matching your candidate output against them can quickly identify the family.
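The re-implement-and-sweep step can be sketched as follows. It repeats the illustrative generator from "How it works", walks a date range, and checks each candidate against a list of names known to have been registered. The names `dga_day`, `is_registered` and `sweep_range` are hypothetical, and the in-memory `known` array is only a stand-in for a real passive-DNS lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define DOMAIN_MAX  32   /* 19-char label + ".info" + NUL fits easily */
#define PER_DAY_MAX 64   /* cap on candidates per date in this sketch */

/* Re-implementation of the illustrative date-seeded DGA. */
void dga_day(int year, int month, int day, char out[][DOMAIN_MAX], int count)
{
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    static const char *const tlds[] = { ".com", ".net", ".org", ".info" };
    uint32_t seed = ((uint32_t)year * 10000u + (uint32_t)month * 100u
                     + (uint32_t)day) ^ 0xABCD1234u;

    for (int d = 0; d < count; d++) {
        seed = seed * 1103515245u + 12345u;
        uint32_t r = seed;
        int len = 12 + (int)(r % 8);
        char label[DOMAIN_MAX];
        for (int i = 0; i < len; i++) {
            r = r * 1103515245u + 12345u;
            label[i] = alphabet[r % 26];
        }
        label[len] = '\0';
        snprintf(out[d], DOMAIN_MAX, "%s%s", label, tlds[(r >> 8) % 4]);
    }
}

/* Stand-in for a passive-DNS query: linear search of a known list. */
int is_registered(const char *name, const char *const *known, size_t n_known)
{
    for (size_t i = 0; i < n_known; i++)
        if (strcmp(name, known[i]) == 0)
            return 1;
    return 0;
}

/* Generates per_day candidates for each of `days` consecutive dates and
 * prints those found in `known`. Returns the number of hits. */
int sweep_range(int year, int month, int day, int days, int per_day,
                const char *const *known, size_t n_known)
{
    assert(per_day > 0 && per_day <= PER_DAY_MAX);
    int hits = 0;
    for (int i = 0; i < days; i++) {
        struct tm cur = {0};
        cur.tm_year = year - 1900;
        cur.tm_mon = month - 1;
        cur.tm_mday = day + i;   /* may run past month end; mktime normalises */
        cur.tm_hour = 12;        /* midday keeps clear of DST transitions */
        cur.tm_isdst = -1;
        if (mktime(&cur) == (time_t)-1)
            break;

        char doms[PER_DAY_MAX][DOMAIN_MAX];
        dga_day(cur.tm_year + 1900, cur.tm_mon + 1, cur.tm_mday, doms, per_day);
        for (int d = 0; d < per_day; d++) {
            if (is_registered(doms[d], known, n_known)) {
                printf("%04d-%02d-%02d %s\n", cur.tm_year + 1900,
                       cur.tm_mon + 1, cur.tm_mday, doms[d]);
                hits++;
            }
        }
    }
    return hits;
}
```

Calling `sweep_range(2024, 1, 1, 31, 50, known, n_known)` would print every January 2024 candidate (first 50 per day) that appears in `known`. In practice the lookup would be replaced by a query to whatever passive-DNS source is available.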

Dynamic analysis:

  • Run the sample with the host clock fixed to several dates and capture DNS traffic. A DGA reveals itself as a burst of NXDOMAIN responses — the bot walks dozens of unregistered names before (or instead of) hitting a live one.
  • The ratio of failed (NXDOMAIN) to successful resolutions, and the high character entropy of the queried labels, distinguishes algorithmic names from human-chosen ones.
  • Pivot on any name that does resolve: that registered domain, plus its passive-DNS history and registrant data, maps the active C2.
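The two signals from the capture, failure ratio and label entropy, can be computed with a few lines of C. This is a rough sketch: the thresholds `MIN_ENTROPY_BITS` and `MIN_NX_RATIO` are illustrative assumptions rather than tuned values, and `log2_pos` is a small stand-in for `log2` from `<math.h>` so the example builds without linking libm.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_ENTROPY_BITS 3.0   /* assumed cut-off, bits per character */
#define MIN_NX_RATIO     0.8   /* assumed cut-off, failed / total lookups */

/* Base-2 logarithm for x > 0: normalise x into [1, 2), then extract
 * fractional bits by repeated squaring. */
double log2_pos(double x)
{
    assert(x > 0.0);
    double result = 0.0;
    while (x >= 2.0) { x /= 2.0; result += 1.0; }
    while (x < 1.0)  { x *= 2.0; result -= 1.0; }
    double bit = 0.5;
    for (int i = 0; i < 40; i++) {
        x *= x;
        if (x >= 2.0) { x /= 2.0; result += bit; }
        bit /= 2.0;
    }
    return result;
}

/* Shannon entropy of the characters in a label, in bits per character. */
double label_entropy(const char *label)
{
    assert(label != NULL);
    size_t counts[256] = {0};
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)label; *p; p++) {
        counts[*p]++;
        n++;
    }
    if (n == 0)
        return 0.0;

    double h = 0.0;
    for (int c = 0; c < 256; c++) {
        if (counts[c] == 0)
            continue;
        double prob = (double)counts[c] / (double)n;
        h -= prob * log2_pos(prob);
    }
    return h;
}

/* Fraction of a host's lookups that ended in NXDOMAIN. */
double nxdomain_ratio(unsigned failed, unsigned total)
{
    return total == 0 ? 0.0 : (double)failed / (double)total;
}

/* Flags a queried label when both signals exceed the assumed cut-offs. */
int looks_algorithmic(const char *label, unsigned failed, unsigned total)
{
    return label_entropy(label) >= MIN_ENTROPY_BITS
        && nxdomain_ratio(failed, total) >= MIN_NX_RATIO;
}
```

A label of 16 distinct characters scores 4 bits per character, while a short dictionary-like label such as "google" scores below 2. Entropy alone misfires on short or hex-like benign names, which is why the sketch pairs it with the failure ratio.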

Detection rule hint:

Hunt for hosts producing a high count of NXDOMAIN responses in a short window (e.g. >50 distinct unresolved second-level domains in 10 minutes) where the queried labels exhibit high Shannon entropy and low n-gram likelihood against English/registered-domain corpora. This combination is the canonical DGA signature and is rare in benign browsing, although known noisy sources (for example, security products that encode lookups in DNS names) should be allowlisted before alerting.
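The windowed part of the rule might look like the sketch below, which takes one host's DNS events sorted by time and counts distinct unresolved second-level domains inside any window. The `dns_event` layout and function names are assumptions for illustration. The `long_label` predicate is a deliberately crude placeholder for the entropy and n-gram test, which a real rule would plug in through the same `label_pred` hook.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dns_event {
    long ts;           /* seconds; events must be sorted ascending */
    const char *sld;   /* queried second-level domain, e.g. "abc.com" */
    int nxdomain;      /* 1 if the response was NXDOMAIN */
};

/* Returns non-zero if the name should count toward the rule. */
typedef int (*label_pred)(const char *sld);

/* Placeholder predicate: label (text before the first dot) of at least
 * 12 characters. Swap in an entropy / n-gram test for real use. */
int long_label(const char *sld)
{
    assert(sld != NULL);
    return strcspn(sld, ".") >= 12;
}

/* Largest number of distinct NXDOMAIN names (accepted by `keep`, or all
 * of them if keep is NULL) seen within any window of window_secs. */
int max_distinct_nx(const struct dns_event *ev, size_t n,
                    long window_secs, label_pred keep)
{
    int best = 0;
    for (size_t i = 0; i < n; i++) {
        int distinct = 0;
        for (size_t j = i; j < n && ev[j].ts - ev[i].ts < window_secs; j++) {
            if (!ev[j].nxdomain || (keep && !keep(ev[j].sld)))
                continue;
            int seen = 0;   /* already counted earlier in this window? */
            for (size_t k = i; k < j && !seen; k++)
                if (ev[k].nxdomain && strcmp(ev[k].sld, ev[j].sld) == 0)
                    seen = 1;
            if (!seen)
                distinct++;
        }
        if (distinct > best)
            best = distinct;
    }
    return best;
}

/* Fires when the count strictly exceeds the threshold, matching the
 * ">50 distinct names in 10 minutes" wording of the rule. */
int dga_alert(const struct dns_event *ev, size_t n, long window_secs,
              int threshold, label_pred keep)
{
    return max_distinct_nx(ev, n, window_secs, keep) > threshold;
}
```

With the numbers from the rule, the call would be `dga_alert(events, n, 600, 50, pred)`. The nested loops are cubic in the worst case, which is acceptable for a per-host sketch but would need a hash set and a sliding pointer at production log volumes.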
