A fundamental primer on exploit development on both Windows and Linux based OS’s.
The classical classes of vulnerablilities:
- buffer overflow
- stack overflow
- heap overflow
- use after free
- out of bounds read
Integer Overflow and NetBSD
Considered concrete example in the NetBSD kernel, based on an incorrect coding style that is exposed to integer overflow during input validation.
static int
set_cursor(struct tfb_softc *sc, struct wsdisplay_cursor *p)
{
#define cc (&sc->sc_cursor)
u_int v, index = 0, count = 0, icount = 0;
uint8_t r[2], g[2], b[2], image[512], mask[512];
int error, s;
v = p->which;
if (v & WSDISPLAY_CURSOR_DOCMAP) {
index = p->cmap.index;
count = p->cmap.count;
if (index >= 2 || (index + count) > 2)
+++ integer overflow
return (EINVAL);
error = copyin(p->cmap.red, &r[index], count);
if (error)
return error;
error = copyin(p->cmap.green, &g[index], count);
if (error)
return error;
error = copyin(p->cmap.blue, &b[index], count);
if (error)
return error;
Note the overflow, about 1/2 way down. Just imagine if index
was a really large value that overflowed 32 bits. A more robust way to code the validation check, can be seen in the OpenBSD code:
if (index >= 256 || count > 256 - index)
return (EINVAL);
While a bit strange at first glance, is rather elegant. If index
is big, it bails. If index
is big it also bails.
Static Analysis with Clang
Is the process of doing analysis of the program using techniques like data flow analysis, point-to analysis, common mis-uses of API’s, taint analysis and more. Clang, the LLVM compiler frontend for C/C++, has an incredible static analyzer, and can be invoked when kicking off make
, like so:
scan-build make
TODO: run sample, and review HTML report.
Fuzzing
Fuzz testing (fuzzing) is a testing technique used to discover bugs in all types of software. It involves inputting massive amounts of random data, called fuzz, to the test subject in an attempt to make it crash.
Common types:
- Dumb fuzz just randomly mutate input data, and measure crashes.
- Generative fuzzing domain specific knowledge of target protocol or format (e.g. SSH handshake or PDF headers), fuzz specific fields (negative, max, all 9’s, …) and combinations of them together in the protocol, measuring crashes along the way.
- Grammar based fuzzing involves building up a context-free grammar (CFG) using the Sequitur (or Nevill-Manning algorithm), based on test cases of a programs input. This could be an ELF binary representation made by
readelf
for example. - Smart fuzzing uses symbolic execution, to guide the control flow through a program intelligently so that all possible branches and permutations are exercised. This gives the fuzzer the ability to measure code coverage at runtime, and to focus in on certain branches that are less exposed.
- Instrumented fuzzing places probes into the target binary through custom compilation. Using probe measurements allow for guided branch execution. Very effective in practice. The american fuzzy lop or
afl
takes this approach, and in practice can almost guarentee crash a program that has never been fuzzed.
To take afl
for a spin, first install, on my archlinux box a quick pacman -Sy afl-utils afl
did the job. The afl-gcc
GCC wrapper instruments the produced binary and can optionally add a number of protection mechanisms such as stack cookies and source fortification. The actually fuzz workload is done with afl-fuzz
, which given a seed file, brute fuzzes a variety of inputs. Take the following program:
#include <stdio.h>
#include <string.h>
int
main(int argc, char *argv[])
{
char q[20];
char s[1000];
fgets(s, sizeof(s), stdin);
printf("%s\n", s);
if (s[0] == 'c') {
strcpy(q, &s[1]);
printf("%s\n", q);
}
}
Hmm there’s a buffer overflow vulnerability, only if a certain branch of logic is exercised. Let’s see if afl
can find it. First compile:
$ afl-gcc buggy.c -o buggy -D_FORTIFY_SOURCE=2 -fstack-protector-strong
afl-cc 2.52b by <lcamtuf@google.com>
[+] Instrumented 5 locations (64-bit, non-hardened mode, ratio 100%).
Now we have an instrumented binary, its fuzz time. afl
works best with a seed file/s, it will use this as a basis to get up and running, and build upon it:
$ mkdir in && echo "123" > in/fuzztest_input
$ afl-fuzz -i in -o out ./buggy
afl
will throw thousands of variations of input at the target binary:
american fuzzy lop 2.52b (buggy)
┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
│ run time : 0 days, 0 hrs, 5 min, 40 sec │ cycles done : 1580 │
│ last new path : 0 days, 0 hrs, 5 min, 39 sec │ total paths : 2 │
│ last uniq crash : 0 days, 0 hrs, 5 min, 39 sec │ uniq crashes : 1 │
│ last uniq hang : none seen yet │ uniq hangs : 0 │
├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
│ now processing : 0 (0.00%) │ map density : 0.00% / 0.01% │
│ paths timed out : 0 (0.00%) │ count coverage : 1.00 bits/tuple │
├─ stage progress ────────────────────┼─ findings in depth ────────────────────┤
│ now trying : havoc │ favored paths : 2 (100.00%) │
│ stage execs : 139/256 (54.30%) │ new edges on : 2 (100.00%) │
│ total execs : 2.33M │ total crashes : 4213 (1 unique) │
│ exec speed : 6965/sec │ total tmouts : 0 (0 unique) │
├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
│ bit flips : 0/64, 0/62, 0/58 │ levels : 2 │
│ byte flips : 0/8, 0/6, 0/2 │ pending : 0 │
│ arithmetics : 0/448, 0/138, 0/35 │ pend fav : 0 │
│ known ints : 0/33, 0/134, 0/71 │ own finds : 1 │
│ dictionary : 0/0, 0/0, 0/0 │ imported : n/a │
│ havoc : 2/811k, 0/1.51M │ stability : 100.00% │
│ trim : n/a, 0.00% ├────────────────────────┘
└─────────────────────────────────────────────────────┘ [cpu000: 52%]
In the output dir:
.
├── crashes
│ ├── id:000000,sig:06,src:000001,op:havoc,rep:16
│ └── README.txt
├── fuzz_bitmap
├── fuzzer_stats
├── hangs
├── plot_data
└── queue
├── id:000000,orig:fuzztest_input
└── id:000001,src:000000,op:havoc,rep:32,+cov
We can see at least one crash has been detected:
# cat crashes/id\:000000\,sig\:06\,src\:000001\,op\:havoc\,rep\:16
c��������p��������
��������p�
It starts with a c
character, which we can see from the source code, gets into the branch of execution where the buffer overflow vuln lives. Then the other random 20+ bytes worth of data blow out the 20-byte buffer. The actual bytes themselves dont matter, and the output is better viewed with a hex viewer like xxd
:
# xxd -b crashes/id\:000000\,sig\:06\,src\:000001\,op\:havoc\,rep\:16
00000000: 01100011 11110000 11110000 11110000 11100001 11110000 c.....
00000006: 11110000 00010000 11110000 11011101 01110000 11110000 ....p.
0000000c: 00000111 11110001 11110000 11110000 11110000 11110000 ......
00000012: 11110000 11110000 00001100 11110000 11110000 11110000 ......
00000018: 11110000 11110000 11110000 11110000 11011100 01110000 .....p
0000001e: 11110000 .
Running the target program with this input, should crash it:
# ./buggy < ./out/crashes/id\:000000\,sig\:06\,src\:000001\,op\:havoc\,rep\:16
c��������p��������
��������p�
*** buffer overflow detected ***: ./buggy terminated
Aborted
Nice!
x86
Given a CPU architecture of your liking, from which there are many (e.g. MIPS, ARM, PowerPC, x64, etc), the key thing to remember is that they are all bounded by the laws of Turing, and more specifically the Von Neumann architecture. These first principles (e.g. instructions are obtained from memory or in otherwords software, hierarchy of memories and registers, an ALU for arithmetic based on binary representations, an instruction pointer that defines the instruction to be executed and so on) underpin all modern computing that takes place. If you like to go deeper, you can’t go past the nand2tetris 12 week program, in which you design and build a simple but fully featured 16-bit microprocessor literally starting from NAND gates.
Given this power, it is essential we work at the chip level using assembly. Higher level abstractions and constructs simply muddy the waters, not giving the fine grained instruction level control required. In other words, grab the assembly manual for the architecture of chip you plan to work with.
Assembly
Lots of resources out there, but these one pagers keep it simple:
- NASM Intel x86 Assembly Language Cheat Sheet
- Intel Code Table is a consise one page on the transfer, arithmetic, logic, jumps and misc instructions provided by the 80186 instruction set and higher.
- GCC x86 Assembly Quick Reference is another, but shows off the AT&T style assembly sytnax, which
gdb
uses.
Reverse string in EBX
section .text
global _start
_start:
mov eax,ebx
jmp endcount
loop:
inc ebx
endcount:
cmp byte[ebx],0
jne loop
reverse:
cmp eax,ebx
jg finish
mov cl,[eax]
mov ch,[ebx]
mov [eax],ch
mov [ebx],cl
dec ebx
inc eax
jmp reverse
finish:
Debugging with GDB
The GNU debugger, the gold standard when it comes to debugging on Linux. Make sure the peda extension is installed, which add some insanely useful feature to GDB, like better colorisation and visual (still CLI!) display of memory and instructions, and commands like context
, context_stack
and jmpall
to name but a few.
git clone https://github.com/longld/peda.git ~/peda
echo "source ~/peda/peda.py" >> ~/.gdbinit
Don’t forget help peda
has always got your back.
GDB Cheatsheet
GDB is great, and supercharged with the peda module.
break foo
set breakpoint of functionfoo
break *0xf7cd6ef8
set breakpoint on specific instructionbreak main:66
set break on instruction that relates to source liner
run program (once all breaks in place)c
continuestepi
step next instructionnexti
info reg
dump out all registersprint "%d\n", 0xf7cd6edc
printf style output of memory or registersx/i foo
to examine each instruction offoo
functionx/s $esp
printesp
as stringx/xw $esp
printesp
as 32-bit (word) in hex
Exploitation Concepts
Stack mechanics
x86 for example.
NOP sled
ROP chain
Glossary
Term | Meaning |
---|---|
Buffer | A contiguous block of memory |
DEP | Data Execution Prevention, enforces a non-executable stack, putting a stop to most primitive buffer overflow attacks |
NOP | no op |
ROP | Return Oriented Programming, a clever technique for chaining existing assembly instructions together |
shellcode | code thats only goal in life is to facilitate providing an interactive shell |
TFP0 | Task for PID0, an exploit that focusing on comprimising the init system |