On the road of hiding… PEB, PE format handling and DLL loading homemade APIs – part 1

For some strange reasons, I decided to start my road into the malware/reversing world by rewriting the four holy APIs used in importing functions from DLLs, i.e. GetModuleHandle, GetProcAddress, the mighty LoadLibrary and finally FreeLibrary. If any of you don’t know/remember what and how useful they are, here’s a quick lesson/refresh:

  • LoadLibrary: loads a new DLL/EXE module and returns the address it has been loaded
  • GetModuleHandle: retrieves the address where a DLL or an EXE file has already been loaded
  • GetProcAddress: returns the address of an exported function/variable from an already loaded DLL
  • FreeLibrary: the reverse of LoadLibrary, i.e. frees a loaded library from the memory

After knowing this and that every time an application calls a static linked function imported from DLL, this one is listed into the imported function table of the application itself, it is quite clear how useful re-implementing them is for a malware program. Having suspicious function imported (see CreateRemoteThread and other ones) might be an signal alarm for an anti-virus or for any other application monitoring the system; another point would be hiding the load DLL from the publicly visible list of the imported modules for the application.

My search started by the most “simple”-to-be-rewritten of the three functions: GetModuleHandle. After a few studies on this, I discovered that there is an interesting structure held by the operating system and associated to each process called Process Environment Block, shortened as PEB. It’s not the first time I meet this structure, but I don’t think I ever analyzed that. On MSDN it is poorly documented and it is stated that “this structure may be altered in future versions of Windows”: I really hope this is not going to happen because, if it does, my implementation (and many others applications which relies on this structure) will be totally broken. Anyway, the structure is defined as follows for 32-bit systems:

typedef struct _PEB
    BYTE Reserved1[2];
    BYTE BeingDebugged;
    BYTE Reserved2[1];
    PVOID Reserved3[2];
    BYTE Reserved4[104];
    PVOID Reserved5[52];
    BYTE Reserved6[128];
    PVOID Reserved7[1];
    ULONG SessionId;

On 64-bit systems it is defined as:

typedef struct _PEB {
    BYTE Reserved1[2];
    BYTE BeingDebugged;
    BYTE Reserved2[21];
    PPEB_LDR_DATA LoaderData;
    BYTE Reserved3[520];
    BYTE Reserved4[136];
    ULONG SessionId;
} PEB;

A lot of the fields (the most of ’em) are declared as reserved because they are “for internal use by the operating system”. Anyway, about the remaining one we have a pretty good documentation (official and/or reversed). It is possible to have a complete documentation of it at NTinternals:

  • BeingDebugged: a flag set to true when the related process is being debugged
  • Ldr: the interesing one for me, it is a pointer to a PPEB_LDR_DATA structure containing information about the modules loaded in a process
  • ProcessParameters: it contains information about the command line and the path of the executable which the process is related to. It is a pointer to a PRTL_USER_PROCESS_PARAMETERS structure and the main fields in this structure are the unicode strings ImagePathName containing the path of the image file and CommandLine containing all the parameters passed to the process
  • PostProcessInitRoutine: a soon-to-be deprecated field kept only for compatibility with Windows 2000, as it was used to initialize the application compatibility layer. The callback is called between the DLL initialization routines and the executable control transfer; using this callback function should be definitely avoided because it can cause problems
  • SessionId: the Terminal Services session identifier associated.

As already said, the interesting field is only the Ldr one. The PPEB_LDR_DATA is defined as following:

typedef struct _PEB_LDR_DATA
    BYTE Reserved1[8];
    PVOID Reserved2[3];
    LIST_ENTRY InMemoryOrderModuleList;

The only field documented in MSDN is the last one, InMemoryOrderModuleList, a double-linked list containing all the modules loaded into a process. The LIST_ENTRY structure is defined in the following way:

typedef struct _LIST_ENTRY
    struct _LIST_ENTRY *Flink;
    struct _LIST_ENTRY *Blink;

So, when I first saw this structure, I told to myself “OK, OK, there must be an error. Where are the interesting fields? Just two pointers to the next and the previous element of the list, which is anyway not carrying any kind of information?“. Luckily, it’s not like this, even if this implementation is pretty unusual, at least for me. In fact, this structure is “just” part of a bigger structure called LDR_MODULE: using the pointers and knowing how the structure is made (thanks again NTinternals), I can browse through all the information I need.

typedef struct _LDR_MODULE
    LIST_ENTRY InLoadOrderModuleList;
    LIST_ENTRY InMemoryOrderModuleList;
    LIST_ENTRY InInitializationOrderModuleList;
    PVOID BaseAddress;
    PVOID EntryPoint;
    ULONG SizeOfImage;
    ULONG Flags;
    SHORT LoadCount;
    SHORT TlsIndex;
    LIST_ENTRY HashTableEntry;
    ULONG TimeDateStamp;

So I had a pointer to the second entry of this structure: having a pointer to the LDR_MODULE structure would just require subtracting the size of two LIST_ENTRYs. There’s a big thing to keep in mind about the LIST_ENTRY organization: as it can be easily seen, it’s en antry of a circular double linked list and the head of the list can be accessed by storing the Flink pointer of InMemoryOrderModuleList. Once had this pointer, it’s possible to loop (and obtain each time the LDR_MODULE structure) until the head of the list is reached again.

This structure is a briefcase full of information:

  • BaseAddress: the HMODULE value that should retrieved with the GetModuleHandle
  • EntryPoint: the name says it all
  • SizeOfImage: the size of memory allocated to store all the section, rounded to the next 4Kb multiple
  • FullDllName / BaseDllName: the full path / name-of-the-file of the loaded module

Problem: how do I retrieve the PEB structure? Pretty simple, there are two ways to do this:

  • Using the exported function from ntdll.dll called NtQueryInformationProcess, which has to be imported at runtime because it is not exported. This method can be easily monitored by the operating system, so it is usually avoided
  • Using a little bit of assembly code

The NtQueryInformationProcess function has the following prototype (from MSDN):

NTSTATUS WINAPI NtQueryInformationProcess(
  __in       HANDLE ProcessHandle,
  __in       PROCESSINFOCLASS ProcessInformationClass,
  __out      PVOID ProcessInformation,
  __in       ULONG ProcessInformationLength,
  __out_opt  PULONG ReturnLength

Before calling this function I have to open the process before with a call to the OpenProcess function asking for read rights. Passing the ProcessBasicInformation enum value as an argument for the ProcessInformationClass parameter, I get the PEB address by looking into the PROCESS_BASIC_INFORMATION structure returned by the function (it has a field called PebBaseAddress). Reading the PEB can be easily done with a call to the ReadProcessMemory API.

The other method relies on the Thread Information Block, a pretty complicated most-undocumented structure viewable on Wikipedia (maybe I’ll post something about it). Anyway at FS:[0x30] there is the PEB. Getting this is possible using assembly:

    mov ebx, fs:[0x30]
    mov peb, ebx

or, for x64 systems:

    mov rbx, gs:[0x60]
    mov peb, rbx

With this information, is pretty simple to retrieve the base address of a loaded module with a so simple algorithm:

  1. Obtain the PEB
  2. Get the InMemoryOrderModuleList.Flink and store it as a head entry
  3. Get the head of the LDR_MODULE by subtracting a couple of LIST_ENTRYs
  4. Compare each time the BaseDllName field with the name of the module I am looking for
  5. If it’s the same, return the BaseAddress value, if not, get the next entry by accessing to Flink and go to the step #2

What if the module isn’t found? If I was just writing a custom implementation of the GetModuleHandle, this would mean that the module hasn’t already been loaded; as I am writing a custom LoadLibrary too (I really hope, as I’m getting some problems in doing that, keep the faith bro), which won’t store the loaded modules there, I’ll have to look in an internal DB storing the modules I load each time. This can be easily done with a map or a list.

In my implementation, the GetModuleHandle doesn’t require the name of the module as an argument, but its hash: in this way I don’t store any string (as an trivial anti-debugging measure), so, everytime I have to compare this value with the BaseDllName, I have to compute the hash of this field too.

I have to break here the article, as I decided to split it into three (or maybe four) parts. I almost finished rewriting the LoadLibrary, but, as I am working only during the evenings on this, it could take more than expected. I get some problems with Windows 7 DLL loading and there are a lot of bugs yet-to-be-fixed. Hope you liked it and, in case I missed something, don’t exitate to comment the post and I’ll fix it.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s