x86 Assembly 101: when the basis crumble and you find yourself in the middle of nowhere…

Just a few days ago I decided to take another look at the disassembly of the good old Cascade virus (does anybody still remember it? Back when they were just called “viruses”, or “virii” pour les élitaires). Man, oh man, what a piece of malware that was (it fit in 1704 bytes)… Letters falling in the DOS shell, COM file infection and INT hooks, all written directly in ASM: I guess that writing it in C would have resulted in a waste of bytes (prologues, epilogues and some other stuff).

Then I decided to take a look at some more “modern” ASM code, i.e. code generated from C, and I realized I was totally out of training (how the hell did the stack frame work?): I studied some of this stuff at university and some by myself a long time ago and, y’know, when you don’t practise for a while, it flies away from you like pieces of paper in the wind. So, as the botnet project is a little bit stuck (sadly, due to a fundamental issue in the reference paper itself), I decided I could take some notes about what I already know about x86 ASM and what I’m studying now, and put them here as a quick reference for myself and for whoever could profit from them.

So, back in the days when 16-bit computers were around, the processors had 8 general-purpose registers: AX (accumulator), BX (base), CX (counter), DX (data), SI (source index), DI (destination index), BP (base pointer) and SP (stack pointer). The names (accumulator, base, counter, etc…) are mostly suggestions made by Intel to the developers: apart from a few instructions that implicitly use a specific register, the registers don’t necessarily have to be used in this way.

In addition there were the FLAGS register (a special register holding information about the result of the executed instructions, plus other stuff that will be seen later) and the IP (instruction pointer) register, holding the address of the next instruction to be executed (it can’t be written directly, for example with a MOV: it only changes through jumps, calls and returns).

Anyway, of course, all of these registers were 16 bits wide: it was possible to access the higher 8 bits of AX (for example) by using the AH register and the lowest 8 bits by using the AL register (this is valid only for AX, BX, CX and DX).

When 32 bits arrived with the 80386, all those registers were extended to 32 bits and an “E” (Extended) was prepended to their names: EAX, EBX, ECX, …, EIP, EFLAGS. In 32-bit code it is still possible to use the old names to access the lowest 16 bits of each register (AX for EAX, SI for ESI and so on), while the 8-bit names such as AH and AL remain available only for EAX, EBX, ECX and EDX.
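To fix the idea of the register "slices", here is a minimal C sketch that treats a plain 32-bit integer as if it were EAX and extracts AX, AH and AL with masks and shifts. The function names (reg_ax, reg_ah, reg_al, set_al) are made up for this note; they are not part of any real API.

```c
#include <stdint.h>

/* Hypothetical helpers: "eax" is just a 32-bit value standing in for the register. */

/* AX: the lowest 16 bits of EAX */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH: bits 8..15 of EAX (the high byte of AX) */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL: the lowest 8 bits of EAX */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing AL replaces only the lowest byte and leaves the other 24 bits alone */
uint32_t set_al(uint32_t eax, uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}
```

With EAX = 0x12345678, these helpers give AX = 0x5678, AH = 0x56 and AL = 0x78.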

x86 ASM instructions are usually made of an opcode and one or two operands (some instructions take none): the opcode, written as a mnemonic such as MOV, identifies the instruction, and the operands are the parameters passed to it. An operand can be:

  • A register name (one of the general-purpose registers seen before)
  • An immediate (a constant value)
  • A memory address expressed in r/m32 form (seen later)

MOV instruction

According to the statistics, the most common instruction in x86 ASM is the MOV instruction (35% of occurrences… not bad!): it takes two operands, called destination and source, and just copies data from the source to the destination. The copy can be from register to register, from memory to register and vice-versa, from immediate to register and from immediate to memory. Please note that only one (and not both) of the operands can be a memory location: this means that it’s impossible to copy directly from memory to memory. Below there’s the “syntax” of the MOV instruction:

MOV [Destination], [Source]

A simple example of this could be:

MOV EAX, 4

which moves the value 4 into the register EAX, or the following:

MOV EAX, EBX

that moves the content of the EBX register into EAX.

Simple, huh?

The stack

Before talking about the following instruction, it’s necessary to introduce the stack. The stack is a temporary storage area in RAM assigned to every process when it’s initialized by the OS. It’s designed as a LIFO data structure, in which it’s possible to push data onto the top and to pop them off the top. The address of the current top of the stack is stored in the ESP register and, by convention, the stack grows toward lower memory addresses: this means that every time a new item is pushed onto the stack, ESP moves to a lower value (obviously, when an item is popped off the stack, ESP moves back toward higher values).

While ESP points at the top of the stack, the EBP register tells us where the stack frame begins: the stack frame is a portion of the stack that is dedicated to a particular function. Every time a new function is called, a new stack frame is created, and, when the function returns, the frame is destroyed (different ways to do this will be analyzed later). The stack is usually used to store the local variables and to pass arguments to functions to be called.

PUSH instruction (and her sister POP)

The second most popular instruction is the PUSH instruction (10% of occurrences), which is used to push a new 32-bit element (DWORD) onto the stack: as a side effect, ESP is decreased by 4 (this is done before the value is written, so the new element ends up at the address ESP points to afterwards). The operand can be a register, a memory location or, as usual, an immediate. The syntax is the following:

PUSH [Operand]

For example, let’s say this is the current situation in the stack:

EBP:  0x00105000
ESP:  0x00104FFC
EAX:  0x0000000F

                       ADDRESS
STACK:   ^   EBP ->   0x00105000:  0x0000FF00
         |   ESP ->   0x00104FFC:  0x004312CD
         |            0x00104FF8:  undefined
         |            0x00104FF4:  undefined
                      ...

and that the instruction to be executed is a PUSH EAX. ESP is first decreased by 4 and then the value of EAX is written to the address it now points to, so after the execution the resulting situation will be:

EBP:  0x00105000
ESP:  0x00104FF8
EAX:  0x0000000F

                       ADDRESS
STACK:   ^   EBP ->   0x00105000:  0x0000FF00
         |            0x00104FFC:  0x004312CD
         |   ESP ->   0x00104FF8:  0x0000000F
         |            0x00104FF4:  undefined
                      ...

I know this can look pretty messy 😦 and I hope I wrote everything in the correct way.

The complementary instruction is POP (which accounts for “only” 4% of occurrences): if it’s the complementary one, shouldn’t it have the same number of occurrences? Actually this is not the case, as the PUSH instruction is also used to pass arguments to functions and those are not removed from the stack with a POP. This instruction has only one operand, which can be a register or a memory location, where the popped item is going to be stored. The side effect, of course, is the increment of ESP by 4 (after the value has been read). The element is not actually “removed” from the memory of the stack, but its content must be considered undefined from now on and its location is available for new items.
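Just to make the ESP bookkeeping concrete, below is a small sketch in C that models the stack as an array of DWORDs and ESP as a plain integer. Everything in it (the Stack type, STACK_BASE, cell_at, push32, pop32) is hypothetical and only meant to mirror what PUSH and POP do, not how a real CPU or OS implements them.

```c
#include <stdint.h>

#define STACK_BASE  0x00104F00u  /* lowest modeled address (an assumption of this sketch) */
#define STACK_CELLS 0x50u        /* 80 DWORDs, i.e. addresses up to 0x0010503F */

typedef struct {
    uint32_t cell[STACK_CELLS];  /* cell[i] models the DWORD at STACK_BASE + 4*i */
    uint32_t esp;                /* address of the current top of the stack */
} Stack;

/* Map a modeled address to the array slot that holds its DWORD */
uint32_t *cell_at(Stack *s, uint32_t addr) {
    return &s->cell[(addr - STACK_BASE) / 4u];
}

/* PUSH: decrease ESP by 4 first, then store the value where ESP now points */
void push32(Stack *s, uint32_t value) {
    s->esp -= 4u;
    *cell_at(s, s->esp) = value;
}

/* POP: read the value ESP points to, then increase ESP by 4.
   The old cell is not erased, it is simply no longer part of the stack. */
uint32_t pop32(Stack *s) {
    uint32_t value = *cell_at(s, s->esp);
    s->esp += 4u;
    return value;
}
```

Calling push32 and then pop32 leaves ESP exactly where it started, and the popped value is the one that was pushed last: that’s the LIFO behaviour described above.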

x86 calling conventions

Then again, before talking about the next instruction, it’s important to understand how functions can be called in x86 ASM. There are actually a lot of calling conventions in x86, but the most common ones are cdecl and stdcall: the main difference between the two is who is responsible for cleaning the stack.

The cdecl convention is usually the default one for C compilers: the arguments are pushed in right-to-left order and the caller is responsible for the stack cleaning. What the caller does in order to call a function is:

  • Save the caller-saved registers it still needs
  • Push the parameters on the stack, from the last one to the first
  • Call the function
  • Clean up the stack by removing the parameters

Saving the old stack frame pointer and creating the new one is instead up to the called function, as described later.

The result of the function is stored in EAX or, for 64-bit values, in EDX:EAX.

The stdcall convention also pushes the parameters right-to-left, but lets the callee clean the stack (typically with a RET that takes an immediate): this is the convention used by the Microsoft Win32 APIs. Again, the result is stored in EAX or EDX:EAX.

CALL instruction (and the complementary RET)

The next most common instruction is CALL, with 6% of occurrences. This one transfers control to a different subroutine but, before that, it pushes the address of the instruction that follows the CALL onto the stack, so that it’s possible to resume the execution from the point where the subroutine was called. Transferring control to the chosen subroutine is easy for the CALL instruction: it just changes the value of the EIP register so that it points to the beginning of the specified subroutine.

The destination address can be specified in an absolute way or in a relative way (the count starts from the end of the CALL instruction). The syntax is the following:

CALL [Address]

Once the subroutine has finished its task and it’s time to get back to the original flow, the RET instruction is used. This instruction accepts an optional operand, an immediate that is added to ESP after the return address has been popped (this is how a stdcall callee removes its own arguments). Anyway, what the RET instruction does is pop the top of the stack into EIP (the value that was pushed by the CALL):

RET

If an immediate is specified, as said, it is added to the ESP register:

RET [Immediate]
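The CALL/RET pair can be sketched in a few lines of C. This is only a toy model under my own assumptions: the Cpu struct and the cpu_* functions are invented for this note, the stack is a tiny array addressed from 0, and call_length is the size in bytes of the CALL instruction itself (5 for the common near relative form).

```c
#include <stdint.h>

typedef struct {
    uint32_t eip;        /* address of the instruction being executed */
    uint32_t esp;        /* address of the top of the stack */
    uint32_t stack[64];  /* stack[i] models the DWORD at address 4*i */
} Cpu;

void cpu_push(Cpu *c, uint32_t value) {
    c->esp -= 4u;
    c->stack[c->esp / 4u] = value;
}

uint32_t cpu_pop(Cpu *c) {
    uint32_t value = c->stack[c->esp / 4u];
    c->esp += 4u;
    return value;
}

/* CALL: push the address of the instruction after the CALL, then jump */
void cpu_call(Cpu *c, uint32_t target, uint32_t call_length) {
    cpu_push(c, c->eip + call_length);
    c->eip = target;
}

/* RET imm: pop the return address into EIP, then drop imm bytes of arguments.
   A plain RET is the same thing with imm = 0. */
void cpu_ret(Cpu *c, uint32_t imm) {
    c->eip = cpu_pop(c);
    c->esp += imm;
}
```

For instance, pushing one 4-byte argument, calling a routine and returning with cpu_ret(&c, 4) brings ESP back to the value it had before the argument was pushed, which is the stdcall way of cleaning up; with cpu_ret(&c, 0) the caller would still have to add 4 to ESP itself, as in cdecl.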

When performing a call to a subroutine, the caller should save the contents of the registers EAX, ECX and EDX, which are defined as caller-saved registers, as the subroutine might change them. Then, of course, the caller should push the needed parameters onto the stack according to the convention used and then perform the CALL instruction.

After the routine returns, if the cdecl convention is on, the caller removes the parameters from the stack increasing the value of ESP accordingly; in addition the caller-saved registers are restored.

On the callee side, the first thing to do is push the value of EBP onto the stack and copy the value of ESP into EBP: in this way the base of the old stack frame is saved and a new stack frame is created. Next, the space needed for the local variables must be reserved on the stack (this can be skipped if the function uses no local variables). The last step is saving the value of the so-called callee-saved registers that the function is going to modify: EBX, ESI and EDI (EBP has just been saved, and ESP will be recovered from EBP in the epilogue). This phase of the function is also called prologue.

At the end of the routine:

  • the return value must be stored in EAX
  • the callee-saved registers that were pushed in the prologue must be restored
  • the local variables must be deallocated
  • right before returning, the base of the old stack frame must be restored by popping the value of EBP from the stack
  • at last, the RET instruction is issued

This step is also called epilogue.
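Prologue and epilogue are mirror images of each other, and a tiny C model makes this easy to check. As before, this is just a sketch: the Regs struct, the miniature stack addressed from 0 and the function names are all assumptions of mine, and callee-saved registers are left out to keep it short.

```c
#include <stdint.h>

typedef struct {
    uint32_t ebp;        /* base of the current stack frame */
    uint32_t esp;        /* top of the stack */
    uint32_t stack[64];  /* stack[i] models the DWORD at address 4*i */
} Regs;

/* Prologue: push ebp / mov ebp,esp / sub esp,local_bytes */
void prologue(Regs *r, uint32_t local_bytes) {
    r->esp -= 4u;
    r->stack[r->esp / 4u] = r->ebp;  /* save the caller's frame base */
    r->ebp = r->esp;                 /* the new frame starts here */
    r->esp -= local_bytes;           /* reserve room for the locals */
}

/* Epilogue: mov esp,ebp / pop ebp (the RET would follow) */
void epilogue(Regs *r) {
    r->esp = r->ebp;                 /* deallocate the locals in one shot */
    r->ebp = r->stack[r->esp / 4u];  /* restore the caller's frame base */
    r->esp += 4u;
}
```

Whatever the amount of local storage, running prologue and then epilogue leaves both EBP and ESP with the values they had on entry, so the return address is again on top of the stack when RET executes.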


To check all of these things together, it’s possible to use a very simple application like the following one:

int simple_function()
{
    return 1;
}

int main(void)
{
    return simple_function();
}

The first time I tried to compile this simple application with Microsoft Visual Studio, I got a lot of messy code with indirect calls and stack integrity checking. Man, what a mess. The solution was to turn off the runtime checks and the other extras that the Microsoft toolchain enables by default (they are not really optimizations, rather safety and debugging aids):

  • Basic Runtime Checks set to “Default”
  • Security Check set to “Disable Security Check (/GS-)”
  • Enable Incremental Linking set to “No (/INCREMENTAL:NO)”

Definitely better now. The x86 ASM code I obtained for that piece of code was:

_simple_function:
00401000  push        ebp
00401001  mov         ebp,esp
00401003  mov         eax,1
00401008  pop         ebp
00401009  ret
...
_main:
00401010  push        ebp
00401011  mov         ebp,esp
00401013  call        00401000
00401018  pop         ebp
00401019  ret

And this code is exactly what I was expecting. The main function is the first of the two to run (it is called by the C runtime startup code, which in turn is started by the OS): the first thing it does, of course, is create a new stack frame by saving the EBP register and copying the ESP content into EBP. Right before the “CALL 0x00401000” instruction, the stack and the registers look like the following:

EBP:  0x0012FF30
ESP:  0x0012FF30

                        ADDRESS
STACK:   ^ EBP, ESP ->  0x0012FF30:  0x0012FF80
         |              0x0012FF2C:  undefined
         |              0x0012FF28:  undefined
         |              0x0012FF24:  undefined
                        ...

Of course, right after the call execution, the situation will change into:

EBP:  0x0012FF30
ESP:  0x0012FF2C
EIP:  0x00401000

                        ADDRESS
STACK:   ^      EBP ->  0x0012FF30:  0x0012FF80
         |      ESP ->  0x0012FF2C:  0x00401018
         |              0x0012FF28:  undefined
         |              0x0012FF24:  undefined
                        ...

In fact, the address of the next instruction to be executed (the “POP EBP” at 0x00401018) has been pushed onto the stack and the EIP value points now at the beginning of simple_function. This one now creates a new stack frame, and the situation in the stack evolves into the following:

EBP:  0x0012FF28
ESP:  0x0012FF28

                        ADDRESS
STACK:   ^              0x0012FF30:  0x0012FF80
         |              0x0012FF2C:  0x00401018
         | EBP, ESP ->  0x0012FF28:  0x0012FF30
         |              0x0012FF24:  undefined
...

Next, the return value 1 is moved into EAX and the old stack frame base pointer is restored in EBP with a “POP EBP”. Right before the “RET” execution, this is the stack situation:

EBP:  0x0012FF30
ESP:  0x0012FF2C

                        ADDRESS
STACK:   ^      EBP ->  0x0012FF30:  0x0012FF80
         |      ESP ->  0x0012FF2C:  0x00401018
         |              0x0012FF28:  undefined
         |              0x0012FF24:  undefined
...

Then again, the “RET” instruction will retrieve the return address from the stack by popping it, and so on, until the end of the application. This was of course an extremely simple example, without arguments to be passed to the function or local variables.


The little piece of code can be modified in order to use local variables and arguments, obtaining the following:

int simple_function(int arg1)
{
    int local_var = arg1;
    return local_var;
}

int main(void)
{
    int ret_value = 1;
    return simple_function(ret_value);
}

and the following Assembly code is generated:

_simple_function:
00401000  push        ebp
00401001  mov         ebp,esp
00401003  push        ecx
00401004  mov         eax,dword ptr [ebp+8]
00401007  mov         dword ptr [ebp-4],eax
0040100A  mov         eax,dword ptr [ebp-4]
0040100D  mov         esp,ebp
0040100F  pop         ebp
00401010  ret
...
_main:
00401020  push        ebp
00401021  mov         ebp,esp
00401023  push        ecx
00401024  mov         dword ptr [ebp-4],1
0040102B  mov         eax,dword ptr [ebp-4]
0040102E  push        eax
0040102F  call        00401000
00401034  add         esp,4
00401037  mov         esp,ebp
00401039  pop         ebp
0040103A  ret

A few changes are immediately visible: some new “push” instructions and those strange brackets here and there. The brackets are used for the aforementioned r/m32 form and they work, more or less, like the pointers in C/C++: the value to be used is the one located at the memory address specified inside the brackets. The instruction

MOV EAX, DWORD PTR [EBP-4]

will read 4 bytes (the DWORD parameter specified) from the memory location EBP-4 and put the result into EAX. An alternative way of writing the previous instruction is

MOV EAX, [EBP-4]

This is still valid because the number of bytes to be read is taken from the size of the first operand: 32 bits. If the first operand were just AX, then only 2 bytes would be read from the memory location.

The r/m32 form is also used to write to a given memory location, as in the instruction

MOV DWORD PTR [EBP-4], 1

where an immediate is written to the memory address EBP-4 (here DWORD PTR can’t be dropped, as the immediate alone doesn’t tell how many bytes have to be written).

As can be seen, simple operations are possible inside the brackets (like using an immediate as an offset). The possible forms of an r/m32 operand are the following:

  • MOV EAX, base
  • MOV EAX, [base]
  • MOV EAX, [base+displacement]
  • MOV EAX, [base+index*scale]
  • MOV EAX, [base+index*scale+displacement]

where base can be any of the general-purpose registers, index is one of the EAX, EBX, ECX, EDX, EBP, ESI or EDI registers (ESP can’t be used as an index), scale can be 1, 2, 4 or 8 and displacement is a signed 8-bit or 32-bit constant.
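The address computed inside the brackets is plain integer arithmetic, so it can be written down as a one-line C function. The name effective_address and its parameter list are my own; the only real assumption is that the sum wraps around modulo 2^32, as 32-bit address arithmetic does.

```c
#include <stdint.h>

/* Address used by an operand of the form [base + index*scale + displacement].
   scale is expected to be 1, 2, 4 or 8; the displacement is signed.
   Unsigned arithmetic wraps modulo 2^32, so a negative displacement
   simply ends up subtracting from the base. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t displacement) {
    return base + index * scale + (uint32_t)displacement;
}
```

For example, with EBP = 0x0012FF30 the operand [EBP-4] corresponds to effective_address(0x0012FF30, 0, 1, -4), that is 0x0012FF2C.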

All the memory accesses in the example program are performed with offsets relative to the EBP register. As EBP holds the base address of the current stack frame and the stack grows towards lower memory addresses, it is possible to state the following general rule: whenever the memory access is of the form [EBP+…], it’s about a parameter of the function, while a local variable is accessed as [EBP-…].

So, what about the “PUSH ECX” at 00401023? Well, I don’t know (yet) why the ECX register has been chosen (I’d say any register would fit), but it’s just a way to create some space on the stack for the local variable ret_value in the main function, most likely because a one-byte PUSH is shorter than a SUB ESP, 4: in fact there’s no “POP ECX” around, as we are actually not interested in the content of the ECX register. This is confirmed by the MOV instruction at 00401024, which moves the immediate 1 into the stack cell created by the PUSH ECX instruction. The situation in the stack, right after the execution of the MOV instruction, is:

EBP:  0x0012FF30
ESP:  0x0012FF2C

                        ADDRESS
STACK:   ^      EBP ->  0x0012FF30:  0x0012FF80
         |      ESP ->  0x0012FF2C:  0x00000001
         |              0x0012FF28:  undefined
         |              0x0012FF24:  undefined

At 0x0040102B the MOV instruction will load what is at [EBP-4] into the EAX register and, as EBP-4 = 0x0012FF2C, EAX will hold the value 1. Then, EAX will be pushed as an argument onto the stack and, after the call to simple_function and the creation of its own stack frame, the stack will look like:

EBP:  0x0012FF20
ESP:  0x0012FF20

                        ADDRESS
STACK:   ^              0x0012FF30:  0x0012FF80
         |              0x0012FF2C:  0x00000001
         |              0x0012FF28:  0x00000001
         |              0x0012FF24:  0x00401034
         | EBP, ESP ->  0x0012FF20:  0x0012FF30

Now it’s time for simple_function to retrieve the arguments that were passed on the stack and create the local variables. Again, a “PUSH ECX” is executed in order to create a 4-byte slot on the stack for the local_var variable: the MOV instruction at 0x00401004 loads the argument of the function (stored at EBP+8 = 0x0012FF28) into EAX and the second MOV instruction at 0x00401007 stores EAX at EBP-4 (i.e. in the local_var variable).

An optimization, I’d say, would have removed the third MOV instruction at 0x0040100A, as it just loads the value of local_var back into EAX (where it already is) for returning purposes. Anyway, simple_function then performs the epilogue and returns. Another new thing is the ADD instruction (this one will be examined in the next article) at 0x00401034, which cleans up the stack by adding 4 to ESP, discarding the parameter passed to the function. Then main goes on performing its own epilogue and it ends.
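To double check the stack pictures above, the whole listing can be replayed by hand in C, one statement per instruction. This is only a sketch of mine: the X86 struct, MEM_BASE, dword_at and the run_* functions are invented names, and the model only knows about the few registers and the small slice of stack memory used by the example.

```c
#include <stdint.h>

#define MEM_BASE 0x0012FF00u  /* lowest modeled address (assumption of this sketch) */

typedef struct {
    uint32_t eax, ecx, ebp, esp, eip;
    uint32_t mem[64];         /* mem[i] models the DWORD at MEM_BASE + 4*i */
} X86;

uint32_t *dword_at(X86 *m, uint32_t addr) { return &m->mem[(addr - MEM_BASE) / 4u]; }
void push(X86 *m, uint32_t v) { m->esp -= 4u; *dword_at(m, m->esp) = v; }
uint32_t pop(X86 *m) { uint32_t v = *dword_at(m, m->esp); m->esp += 4u; return v; }

void run_simple_function(X86 *m) {
    push(m, m->ebp);                     /* 00401000  push ebp                   */
    m->ebp = m->esp;                     /* 00401001  mov  ebp,esp               */
    push(m, m->ecx);                     /* 00401003  push ecx (slot for local)  */
    m->eax = *dword_at(m, m->ebp + 8u);  /* 00401004  mov  eax,[ebp+8]  (arg1)   */
    *dword_at(m, m->ebp - 4u) = m->eax;  /* 00401007  mov  [ebp-4],eax           */
    m->eax = *dword_at(m, m->ebp - 4u);  /* 0040100A  mov  eax,[ebp-4]           */
    m->esp = m->ebp;                     /* 0040100D  mov  esp,ebp               */
    m->ebp = pop(m);                     /* 0040100F  pop  ebp                   */
    m->eip = pop(m);                     /* 00401010  ret                        */
}

void run_main(X86 *m) {
    push(m, m->ebp);                     /* 00401020  push ebp                   */
    m->ebp = m->esp;                     /* 00401021  mov  ebp,esp               */
    push(m, m->ecx);                     /* 00401023  push ecx (slot for local)  */
    *dword_at(m, m->ebp - 4u) = 1u;      /* 00401024  mov  [ebp-4],1             */
    m->eax = *dword_at(m, m->ebp - 4u);  /* 0040102B  mov  eax,[ebp-4]           */
    push(m, m->eax);                     /* 0040102E  push eax (the argument)    */
    push(m, 0x00401034u);                /* 0040102F  call: push return address  */
    m->eip = 0x00401000u;                /*           ...and jump                */
    run_simple_function(m);
    m->esp += 4u;                        /* 00401034  add  esp,4 (cdecl cleanup) */
    m->esp = m->ebp;                     /* 00401037  mov  esp,ebp               */
    m->ebp = pop(m);                     /* 00401039  pop  ebp                   */
    m->eip = pop(m);                     /* 0040103A  ret                        */
}
```

Starting run_main with EBP = 0x0012FF80 and ESP = 0x0012FF34 (the latter is my guess: it is the value that makes the first PUSH EBP land at 0x0012FF30), the model ends with EAX = 1 and with 0x00000001, 0x00401034 and 0x0012FF30 left behind at 0x0012FF28, 0x0012FF24 and 0x0012FF20.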

I hope I explained quite clearly how the stack works in x86 Assembly. In the next article, I’ll report some new common x86 ASM instructions (such as the easy-to-understand-for-now aforementioned ADD).

See you later.

7 thoughts on “x86 Assembly 101: when the basis crumble and you find yourself in the middle of nowhere…

  1. please could you tell me the name of the book this text is from I know i saw it but cant remember which, please it is driving me nuts

  2. long time ago I find online book about reverse engineering and assembly and it was very similar to your tutorial, it is great but I cant find it again 😦
    It talks about most used assembly instructions and mentioned same statistical link but talked about his own list of top 20 instructions, than about Ida pro software with example about tracking function call and changing app behavior
