Note
You can view my writeups on buffer overflow:
- Local link w/i vault: B&E II Classic Overflow
- TrshPuppy Portfolio site
- GitHub
In general, the buffer overflows covered here target buffers which live in a function's stack frame. The goal is to write past the end of the buffer in order to reach memory the buffer was never meant to touch.
Buffer overflows are possible when a program is able to write more data to the buffer space of a data structure/ subroutine than was intended. Writing past the bounds of the buffer causes corruption of adjacent data on the call stack. This will often cause the program to crash.
Additionally, stack memory contains all the return addresses for active function calls, so overflowing it is more effective at derailing an application than overflowing heap memory.
Since the stack stores all the return addresses for functions on the call stack, the goal of most intentional buffer-overflows is to overwrite the return address of a function w/ a pointer to malicious code instead (which is also usually on the stack).
#include <string.h>
void foo(char *bar)
{
char c[12];
strcpy(c, bar); // no bounds checking
}
int main(int argc, char **argv)
{
foo(argv[1]);
return 0;
}
This C code takes a command line argument and copies it to a variable called c without checking how much space it will take up. On execution, 12 bytes of space are allocated on the stack for the c variable. However, in the C programming language strings are terminated with a null byte, so any string longer than 11 bytes will overflow the 12-byte buffer.
On the call stack in this example, just past the space allocated for c is the space allocated for *bar, and past that is the return address for the function foo(). These are all slots inside foo()'s own stack frame rather than separate frames.
An attacker can overwrite data all the way to the return address and then replace the data there with a pointer to their own code. Usually this pointer is the address of the original variable (in this case c), since that is where the attacker's data landed. When foo() returns, the CPU pops the return address off the stack and jumps to that address to continue executing code.
In this picture, the string of A's represents the attacker-supplied data used to overflow c. Once an attacker knows the number of bytes it takes to reach and overwrite the return address, they can replace part of their overflow string with shellcode. The CPU will return to the attacker-supplied address and begin executing the shellcode that's been placed there.
Note
This is derived from my work through TsarSec's course on Binary Exploitation. Go check it out!
To understand the buffer overflow, we need code which compiles into a binary that is vulnerable to one:
#include <stdio.h>
#include <unistd.h>
#include <string.h>
void overflow() {
char option[0x2];
char name[0x100];
int MAGIC = 0xe4ff;
while(1) {
memset(name, 0x00, 0x100);
puts("Hey, whats your name!?\n");
read(STDIN_FILENO, name, 4096);
puts("Welcome ");
puts(name);
puts("is this name correct? (y/n)?");
read(STDIN_FILENO, option, 2);
if(option[0] == 'y' && option[1] == '\n') {
break;
}
}
}
int main() {
overflow();
return 0;
}
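In this code, the user enters their name, which is saved in the name buffer. The program then asks whether the entered name is correct and, if the answer is y, breaks out of the loop and returns. Note that read() accepts up to 4096 bytes while name is only 0x100 (256) bytes long.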
We can compile the code into a binary using gcc. Then (on Linux) we can use gdb to inspect the binary and step through its execution. (P.S. there are additional steps which are necessary that I've purposefully left out. Go check out TsarSec's course, it's free at the time of writing this :) ).
To load the binary into gdb, all we have to do is use the command gdb ./vuln (or whatever you named the binary). This brings us to the opening prompt. There are a lot of options here, but we can simply start by setting a breakpoint at main() and then using run:
┌──(hakcypuppy㉿kali)-[~/tsar]
└─$ gdb ./vuln_1 # <-------------
GNU gdb (Debian 13.2-1) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation>
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./vuln_1...
(No debugging symbols found in ./vuln_1)
(gdb) break main # <-------------
Breakpoint 1 at 0x4011fd
(gdb) run # <-------------
Starting program: /home/hakcypuppy/tsar/vuln_1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, 0x00000000004011fd in main ()
(gdb)
We can resume execution by using c at the prompt (for continue). gdb will print the output from the binary as it runs.
(gdb) c
Continuing.
Hey, whats your name!?
tiddies
Welcome
tiddies
is this name correct? (y/n)?
To see all the areas of memory our binary will be using we can use the info proc mappings command. This will show us where the program is stored in memory, where the heap and stack are mapped to, where all the attached libraries are, etc:
(gdb) info proc mappings
process 244655
Mapped address spaces:
Start Addr End Addr Size Offset Perms objfile
0x400000 0x401000 0x1000 0x0 r--p /home/hakcypuppy/tsar/vuln_1
0x7ffff7dca000 0x7ffff7df0000 0x26000 0x0 r--p /usr/lib/x86_64-linux-gnu/libc.so.6
...
0x7ffff7f9f000 0x7ffff7fac000 0xd000 0x0 rw-p
0x7ffff7fc3000 0x7ffff7fc5000 0x2000 0x0 rw-p
0x7ffff7fc5000 0x7ffff7fc9000 0x4000 0x0 r--p [vvar]
0x7ffff7fc9000 0x7ffff7fcb000 0x2000 0x0 r-xp [vdso]
...
0x7ffffffde000 0x7ffffffff000 0x21000 0x0 rwxp [stack]
--Type <RET> for more, q to quit, c to continue without paging--
This is an edited list w/ some mappings removed, but you can see that the program maps itself to 0x400000, the stack mapping starts at 0x7ffffffde000 (its lowest address), and the libc.so.6 library is at 0x7ffff7dca000.
Since we know the name variable in the overflow() function is vulnerable, that's gonna be our target. Every active function call has a saved instruction pointer (saved rip) on the stack, which tells the CPU where to resume after the function has returned.
If we look at the assembly output for overflow() we can get an idea of how the stack will look. To see this, we just use disassemble overflow in gdb:
Looking at this we can see a few places where the call instruction is used in reference to functions in our C code. call does exactly what it sounds like: it calls the named function so it can be executed.
In the disassembled code for overflow(), memset() is the first subroutine being called. Before we dive into what happens when a function is called, let's remember some things about the stack itself:
- The stack spans a range of memory addresses in RAM.
- The stack grows downward, i.e. frames are added starting at the highest memory address, towards lower addresses.
- The "top of the stack" refers to the lowest address in the stack.
- The "bottom of the stack" refers to the highest address in the stack.
- The first frame added to the stack is the last frame to be removed from it (first in, last out or 'F.I.L.O.').
- When data is written to a frame on the stack (i.e. to a variable in the function belonging to that frame) the data is written towards higher memory addresses (the opposite direction that frames are added).
Knowing these guidelines, let's work through the stack manipulation which happens when main() is called.
This is the assembly code for main(). We can see that the first call instruction is for overflow(), but some things are happening before that which relate to the stack.
This is the function prologue, a common sequence seen at the start of functions in assembly. In this sequence, the frame pointer stored in the %rbp register is pushed to the stack, then the stack pointer (held in the %rsp register) is copied into the %rbp register.
This sequence is important for flow control because activation frames (the frames which represent function calls in the code) vary in size from one function to the next.
The CPU needs to know where to resume execution after a frame finishes, and since a frame can be of varying length, the address where the CPU needs to resume is not a fixed distance from its last position.
The stack pointer should always be pointed at the top of the stack (or the address right after the end address of the last frame, in this case '5').
Before calling overflow() (and pushing its frame onto the stack), we need to push the current frame pointer (the address in the %rbp register) onto the stack. When that happens the stack will look something like this:
Now that we've added a frame, we need to change the frame pointer to where the stack pointer is currently pointing. So we copy what's in the %rsp register into the %rbp register:
Now the frame pointer and the stack pointer are pointing at the same place in memory. That's okay for now. We're going to call overflow(), its frame will be pushed onto the stack, and overflow() will take care of moving the stack pointer to its new position.
Before we update our stack picture, we have to understand what the call instruction does. In general, it calls the function being referenced, but there are 2 important parts which make that up:
- call pushes the return address onto the stack (the address of the instruction immediately following the call).
- It changes rip (the instruction pointer, which points at the next assembly instruction to execute) to the address of the function being called. This ensures that the next instruction to be read by the CPU is the first instruction in the function being called (which is overflow()).
So let's update our drawing so it more closely matches what's happening in our overflow code:
Compared to our last picture, I've updated the addresses (they will be slightly different on different computers). We now have an address saved in the %rip register (updated by the call instruction). And our stack and frame pointers are still pointing at the same address on the stack.
TIP: In order to get these values I put a breakpoint at overflow in gdb, then used the info registers command (i r works too) to see what was stored in all the registers:
We've finally made it to the overflow code! Now, can we finally update the stack pointer so it's pointing where it should be? In fact, the frame pointer should probably be updated too.
Let's look at the first few instructions of overflow():
(gdb) disassemble overflow
Dump of assembler code for function overflow:
0x0000000000401146 <+0>: push %rbp
0x0000000000401147 <+1>: mov %rsp,%rbp
=> 0x000000000040114a <+4>: sub $0x110,%rsp
...
The first two instructions look super familiar. That's because the same sequence that we saw earlier w/ the frame and stack pointers is happening again: every function runs this prologue to save its caller's frame pointer and set up a frame of its own.
Next is the sub $0x110,%rsp. The sub instruction stands for subtract, so this instruction is subtracting 0x110 from the stack pointer. Since this is in hex, 0x110 is 272 (256 + 16), and we're subtracting 272 from the value currently held in the %rsp register.
Since the stack grows downward, subtracting from the stack pointer moves it to a lower address, which extends the top of the stack. So the new address for the stack pointer should be 0x7fffffffdcc0. We can now update our picture:
After this, a few more instructions are executed. For now, suffice it to say that the variables local to overflow() are created, allocated space, etc. They all exist inside the frame for overflow() on the stack and we can expand the overflow frame to show that:
And our stack now looks like this:
Now that we have an idea of how the stack is manipulated during a program's runtime, we want to hijack the control flow in order to exploit the program.
With all of the mitigations turned off (flags we can give to gcc during compilation) the stack is readable, writable, and executable (rwx). We can check this by using the info proc mappings command in gdb.
(gdb) info proc mappings
... # ----+
0x7ffffffde000 0x7ffffffff000 0x21000 0x0 rwxp [stack]
This means that data we write to the stack can be executed by the CPU. So how do we know where to put our exploit so the CPU will execute it?
Our target will be the saved rip on the stack. We know
[!Resources]
[!Related notes]
- PNPT: buffer-overflow-basics, spiking, fuzzing, memory
- Pattern Create
- My writeups: stack manipulation, basic buffer overflow, NOP sleds