The Formidable FormBook Form Grabber

by ASERT Team on

More and more we’ve been seeing references to a malware family known as FormBook. Per its advertisements it is an infostealer that steals form data from various web browsers and other applications. It is also a keylogger and can take screenshots. The malware code is complicated, busy, and fairly obfuscated--there are no Windows API calls or obvious strings. This post will start to explore some of these obfuscations to get a better understanding of how FormBook works.


The main sample used for this analysis is available on the forum or on VirusTotal. It is version 2.9 of FormBook. Two other samples are referenced as well:


Building Blocks

Let us start with three building blocks that will be used in later sections. First, most of FormBook’s data is stored encrypted in various locations--we’ll call these “encbufs”. Encbufs vary in size and are referenced with functions similar to this: This is a shellcoding technique to determine which address a piece of code is running at. In this example, the function will return 0x3380FAC. The encbuf will start two bytes after the returned address—jumping over the pop and retn instructions. Every encbuf starts with what looks like a normal x86 function prologue—push ebp; mov ebp, esp—but this is trickery. If you continue to disassemble this code, it quickly becomes gibberish: The second and third building blocks are decryption functions—decrypt_func1 and decrypt_func2 respectively. Decrypt_func1 iterates through the encrypted data and depending on the byte value copies a certain amount of data from certain offsets of the encrypted data to the plaintext data. Note: the length value passed to this function is the length of the plaintext. The length of the encrypted data isn’t explicitly stated. The other decryption function, decrypt_func2, can be broken up into three rounds: subtractions, RC4, and then additions. We’ve implemented both of these functions in a Python class, which can be found on GitHub.

Windows API Calls

All calls to the Windows API are done at run time via function name hashing. The hashing algorithm is CRC32, though it is not the CRC32B version as implemented in Python’s binascii.crc32 function. We used the Crc32Bzip2 function from the Python module crccheck to generate the correct values. Function names are converted to lowercase before hashing. For some of the API calls, the hashes are hardcoded into the code. An example would be 0xf5d02cd8, which resolves to ExitProcess. A listing of a bunch of Windows API function names and their corresponding hashes is available on GitHub. For other calls, the API hash is fetched from an encbuf using a convoluted mechanism that can be separated into two steps. First, the encbuf containing the hashes is decrypted. This requires two other encbufs, the decryption functions from above, and some SHA1 hashing:

The second step is specifying an index into the decrypted encbuf and decrypting the hash:

A listing of indexes, hashes, and their corresponding functions is available on GitHub. There are six additional API calls where the hashes are stored in a separate encbuf. We’ll point this encbuf out in another section, but they map to the following network related functions:

  • socket (0x46a9bfc5)
  • htons (0xe9cef9bb)
  • WSAStartup (0xa83c6f74)
  • send (0x6e3cd283)
  • connect (0x8c9cf4aa)
  • closesocket (0x4194fdf)

Strings Strings are obfuscated in two ways. Some of them are built a DWORD at a time on the stack:

The rest are stored in an encbuf. The strings encbuf is first decrypted using decrypt_func1. The decrypted encbuf contains an array of variable length encrypted strings, which can be represented like:

struct {
    BYTE size;
    BYTE *encrypted_string[size];

A particular string is referenced by an index number and is decrypted using decrypt_func2: A listing of the decrypted strings and their indexes are available on GitHub. Command and Control (C2) URLs The C2 URLs are stored in a “config” encbuf and the mechanism to get at them are convoluted and spread out over multiple functions. The first step of the mechanism is to figure out what process the injected FormBook code is running in. Depending on the injected process, a C2 index is saved and a 20-byte key is extracted from an encbuf and decrypted. Here is the snippet of code for when FormBook is running in explorer.exe:

Next, the config encbuf goes through a series of decryption steps:

Note: in the “Windows API Calls” section above we mentioned that six of the hashes were stored in a separate encbuf. In the screenshot above, we’ve made a comment where this encbuf comes from. The “decrypt_s205” function contains more decryption:

At this point, the config encbuf is decrypted, but the C2s are still obfuscated. Note that up to six C2s can be referenced depending on which process FormBook is running in. The final step is one more round of decryption using the process specific key from the first step:

Iterating through the possible C2 offsets and keys for the analyzed sample we get:

Initially we thought there would be up to six different C2s encrypted with different keys, but it’s just the same C2. This was the case for all the other samples we spot-checked as well. Decoy C2 URLs? While reviewing a sandbox run of a FormBook 3.0 sample, we noticed phone home traffic to multiple C2s: But when we extracted the C2s from its config encbuf we got a completely different C2:

Digging further into this, we noticed that starting in this version there were additional encrypted strings. These can be seen in this listing on GitHub. In particular starting at index 78 there are 64 seemly random domains (including the ones seen in our sandbox run). Tracing these strings in the code we see that 15 of these are randomly selected into an array and then one of them is randomly replaced with the C2 from the config encbuf. At a quick glance there doesn’t seem to be any overlap of these domains from sample to sample. They all seem to be registered, but by different entities. Some of them show up in search engines with benign looking data, but most return HTTP “page not found”s. For now, our theory is that these are randomly chosen decoy C2s. C2 Communications FormBook uses HTTP—both GETs and POSTs—for C2. An example of the initial phone home looks like this:

The query parameter “id” contains data encrypted with the following process: Unlike other parts of FormBook, the generation of the “comms_key” is easy—it boils down to the SHA1 digest of the C2. Using a Python snippet, the communications key for the analyzed sample can be generated like this: The “transform_b64_chars” function does the following character transformation:

  • + -> -
  • / -> _
  • = -> .

The encrypted data from above looks like this once decrypted:

FBNG:DDE857B32.9:Microsoft Windows XP x86:QWRtaW4=

It is mostly “:” delimited and consists of the following fields:

  • Message type (FBNG)
  • Bot ID and bot version (missing “:”) (DDE857B3 and 2.9)
  • Windows version
  • Base64 encoded username

Based on the leaked C2 panel code (see this forum thread) there are a few types of phone home messages:

  • KNOCK_POST – the initial phone home as shown above
  • RESULT_POST – results of a task
  • IMAGE_POST – screenshot
  • KEY_POST – key logger logs
  • Form logger logs

An example of an IMAGE_POST from another sample (version 2.6) looks like this:

There are three POST parameters:

  • dat – encrypted data
  • un – base64 encoded username
  • br – web browser identifier

The encrypted data is decrypted as above (using www[.]wowtracking[.]info/list/hx47/ as the C2 key component) and the first 100 decrypted bytes look like this:

Here we can see:

  • Message type (FBIMG)
  • Bot ID (DDE857B3)
  • And the start of a JPEG file

The JPEG file shows a screenshot of one of ASERT’s finest sandboxen:


FormBook is an infostealing malware that we’ve been seeing more and more of recently. This post has been an analysis of some of the obfuscations used in the FormBook malware to start getting an understanding of how it works. Based on samples in our malware zoo and search engine results, it seems to have gotten its start sometime in early 2016. With a cheap price tag (a few hundred dollars), general availability (for sale on Hack Forums), and a supposed release of a “cracked builder” there are quite a few FormBook samples and campaigns in the wild and we only expect to see more.

Posted In
  • Analysis
  • Botnets
  • Encryption
  • Interesting Research
  • Malware
  • Reverse Engineering
  • threat analysis