0xNinjaCyclone Blog

Penetration tester and Red teamer


[Exploit development] 03- Heap Internals and Allocator Design (Windows & Linux)

Intro

Hello everyone, hope you’re all well. In this article, we’re going to talk about heap memory. We’ll look at what it is, how it works, and why software uses it. I’ll explain the heap’s special functions (APIs) and what happens in the background when software uses these functions.

Before diving deeper, it’s important to understand where the heap sits inside a running process. A typical process memory layout includes:

  • Text segment (code)
  • Data segment (initialized globals)
  • BSS (uninitialized globals)
  • Heap
  • Memory-mapped region (shared libraries, mmap)
  • Stack

The heap usually grows upward in virtual memory, while the stack grows downward. This physical positioning plays an important role in how memory behaves during runtime. Let’s get started and learn more about heap memory!

What is Heap?

The heap is a contiguous large region of memory that is subdivided into chunks to be dynamically allocated and deallocated at run time as needed. Each chunk in the heap contains not only the data for which memory has been allocated, but also includes additional metadata that guides the operating system in handling it.

The heap exists because not all memory requirements can be known at compile time. For example:

  • A linked list where nodes are created dynamically
  • A buffer sized based on user input
  • A server handling unpredictable request sizes

Stack memory is limited and follows strict LIFO rules. The heap removes these constraints by allowing memory to be requested and released in arbitrary order during execution.

Also note: the heap is part of the program’s virtual address space. The operating system manages memory in pages (typically 4KB), while the allocator manages smaller chunks inside those pages.

Heap’s APIs

When dealing with heap memory, developers use specific functions provided by libraries like libc and MSVC. These functions, or APIs, are tailored to interact seamlessly with the underlying operating system, ensuring optimal memory management. Each operating system has its own implementation details, hence the need for different APIs.

It is important to distinguish between:

  • The Heap → the memory region
  • The Allocator → the algorithm that manages that region

Functions like malloc and HeapAlloc do not directly manipulate raw OS memory every time. Instead, they interact with an allocator that manages chunks inside the heap efficiently, requesting more memory from the OS only when needed.

malloc

This function allows the developers to ask the OS for some memory.

int nSize = 1024;
void *ptr = malloc( nSize );

This function orders the OS to find an unused memory, and when the OS finds it will return an initialized memory address, if the OS fails, NULL will be returned and this is called Out Of Memory case. In reality, malloc will call more low-level APIs to do this task, such as HeapAlloc, LocalAlloc on Windows or sbrk, mmap on Linux.

In practice, malloc does not necessarily call the OS every time. It often:

  1. Searches its internal free lists (bins or trees)
  2. Reuses an already freed chunk if possible
  3. Requests more memory from the OS only if needed

On Linux, small allocations usually come from the brk-based heap, while large allocations are often served via mmap. This separation helps reduce fragmentation and improve performance.

calloc

This function is exactly like malloc but it zeroes the allocated memory and allows developers to specify the number of items to be allocated as an array.

int nSize = 1024;
int nNumberOfItems = 1;
void *ptr = calloc( nNumberOfItems, nSize );

realloc

This function allows developers to re-allocate memory under their control if the current memory they have is insufficient.

int nNewSize = 4096;
void *ptr = realloc( ptr, nNewSize );

The OS will expand that chunk if there is unused space next to it, and if not, the OS will look for other unused memory sufficient for the new size, and the OS will automatically move the data from the old memory to the new, Then releases the old to be available for another request. Or NULL will be returned if there is no more space. And that’s why we are reassigning ptr, because the function may return a different address in the case we discussed. This function calls more low-level APIs such as HeapReAlloc, LocalReAlloc on Windows, or mremap on Linux.

free

This function releases back the memory that has been allocated via any API that we have explained above.

free( ptr );

Any memory allocation must have a corresponding deallocation. This function calls either HeapFree, LocalFree on Windows, or munmap on Linux. Deallocating the chunk does not mean clearing its payload, Only telling the OS to mark the chunk as unused, to be available for other requests. The payload does not necessary to be deleted, because the next owner will overwrite it.

Freeing memory does not always return it to the operating system. In most cases:

  • The chunk is marked as free
  • It is inserted into a tracking structure
  • It remains available for reuse by the same process

Only under certain conditions (for example, when the top chunk becomes large enough) may memory be returned to the OS.

This behavior explains why applications can grow in memory usage but rarely shrink dramatically.

other methods

C++ provides developers with keywords such as new, delete, to dynamically allocate memory, and they follow the same approach we have explained.

Allocation process

In the allocation process, speed is key. Windows and Linux approach this with different data structures: Windows uses Trees data structures, which are efficient in sorting and quickly locating free memory chunks.Linux, on the other hand, opts for singly/doubly linked lists, offering simplicity and flexibility. These choices reflect different prioritizations in memory management strategies between the two operating systems.

All allocators must balance:

  • Speed of allocation
  • Speed of deallocation
  • Memory overhead
  • Fragmentation control

There are several allocation strategies such as:

  • First-fit
  • Best-fit
  • Segregated lists
  • Size-based bins

Each strategy represents a trade-off between performance and memory efficiency.

Deallocation process

In general, freeing memory does not actually return it to the operating system for other applications to use. Deallocating a chunk means marking this chunk of memory as ‘free to be reused’ by the application, but it still belongs to the application, OS will never give it to other applications. However, if the top chunk in a heap becomes large enough, some of that memory may be returned to the operating system. Also, freed chunks will be tracked by the OS, to help the OS respond quickly for the new allocation requests.

When a chunk is freed, adjacent free chunks may be merged together in a process called coalescing. This helps reduce external fragmentation.

Fragmentation comes in two forms:

  • Internal fragmentation → wasted space inside allocated chunks due to alignment
  • External fragmentation → free memory exists but is split into unusable pieces

Modern allocators implement coalescing and binning techniques to mitigate these problems.

What is a Chunk?

A chunk is a piece of the heap, it contains not only the payload but also other metadata that helps the OS to easily manage and maintain the memory, and its structure is differs when it is in use and when it is not.

Chunks are aligned in memory, typically to 8 or 16 bytes depending on architecture. This alignment:

  • Ensures correct CPU memory access
  • Improves performance
  • Avoids misaligned memory faults

If you request 13 bytes, the allocator may actually reserve 16 or more bytes to satisfy alignment requirements.

In-use Chunk (Windows)

  • (Sizes) - Encoded substructure that encapsulates important size and state information
    • (MemoryCost) - Used in free VS blocks. Indicates the number of pages occupied by the current chunk apart from the page where the chunk’s vs header lies.
    • (UnsafeSize) - The size of the chunk divided by 16, This value will be used during the FreeChunkTree allocation to compare different nodes and find a suitable chunk according to the requested size.
    • (UnsafePrevSize) - The size of the previous chunk divided by 16.
    • (Allocated) - Indicates whether the chunk is allocated/used or not.
    • (KeyUShort) - Used in free chunks.
    • (KeyULong) - Used in free chunks.
  • (EncodedSegmentPageOffset) - Encoded offset of the chunk from the start of the VS subsegment. in pages.
  • (UnusedBytes) - Indicates whether the chunk has unused bytes.

In-use Chunk (Linux)

  • (A) Allocated Arena - the main arena uses the application’s heap. Other arenas use mmap’d heaps. If this bit is 0, the chunk comes from the main arena and the main heap. If this bit is 1, the chunk comes from mmap’d memory and the location of the heap can be computed from the chunk’s address.
  • (M) MMap’d chunk - indicates that this chunk was allocated with a single call to mmap and is not part of a heap at all.
  • (P) Previous chunk is in use? - indicates whether the previous chunk is still being used by the application, or not, if set thus the prev_size field is invalid.

Free Chunk (Windows)

As you see the first part is the same, but the last part looks a bit different, Node.Left and Node.Right and Node.Parent have been added which are pointers that will be used to access the appropriate chunk in FreeChunkTree. depending on its size field.

  • (Node.Left) - A pointer to the left chunk.
  • (Node.Right) - A pointer to the right chunk.
  • (Node.Parent) - A pointer to the root chunk in their subtree.

Free Chunk (Linux)

As you see the first part is the same, but the last part looks a bit different, fwd and bck have been added which are pointers that will be used to access the appropriate bin. As well as fd_nextsize and bk_nextsize will be added for large chunks only which are pointers that may be NULL, and may point to its own chunk forming a loop of one, in addition to larger loops. Also the size of the chunk will be set at the end.

What is FreeChunkTree? (Windows)

Windows OS tracks the freed chunks using Trees data structures as we said before. FreeChunkTree is the main structure that is used to store free chunks during deallocation so that the allocator can quickly find the suitable chunk.

What is Lookaside? (Windows)

Lookaside are singly-linked lists of free chunks in each size. It’s used to track small chunks. The idea behind the lookaside list is to enable speed and rapid lookup time. It allows no more than 3 free chunks per list entry. If a new chunk gets deallocated, and there are already 3 entries for that particular chunk size, then it is freed to the FreeChunkTree.

What are Bins? (Linux)

The allocation process should be done very fast, As we know the chunks are either in use by the application or they’re not, the chunks in use are not tracked, but the free chunks are tracked via storing them in various lists based on size and other information so that the allocators such as malloc can quickly find the suitable chunks. Those tracking lists are called bins. There are various kinds of bins.

Bins are essentially categorized storage areas for freed chunks. Instead of scanning the entire heap for a suitable block, the allocator directly checks the bin that matches the requested size class. This dramatically improves allocation speed and scalability.

Fast bins

The fast bins contain small chunks which are stored in size-specific bins. The chunks in the fastbins may be moved to other bins as needed, and they are stored in an array of singly-linked lists, since they’re all the same size and chunks in the middle of the list need never be accessed.

Unsorted bin

There is only one of these. When small and large chunks are deallocated, they end up in this bin to make allocations and deallocations faster. They’re sorted later, in malloc, in order to give them one chance to be quickly re-used.

Small bins

The small bins are like fast bins, each bin contains chunks of the same size. But the small bins are doubly-linked and when allocation or deallocation is required, the first chunk pushed onto the list is the first to be popped off, as well as can be popped off from the middle. Additionally, the fwd and bck pointers as we saw before are to point to the next and previous chunks in the bin.

Large bins

The large bins are unlike the previous bins that we have talked about, each large bin can contain more than one size. The chunks are ordered in decreasing order of size, Which means inserting or deleting a chunk can occur at any point in the list.

Conclusion

In conclusion, understanding heap memory is vital for effective exploit development and software design. As we’ve seen, its dynamic nature offers both challenges and opportunities. To fully master heap memory, you should also understand:

  • Virtual memory and paging
  • Process memory layout
  • Data structures (linked lists, trees)
  • Memory alignment and fragmentation theory

Heap memory is not just dynamic storage — it is a carefully engineered system balancing performance, flexibility, and efficiency. Thank you for joining me on this exploration.