CVE-2018-8611 Exploiting Windows KTM Part 5/5 – Vulnerability detection and a better read/write primitive

25 May 2020

TL;DR
Safe vulnerability test
BlueHat 2019 Shanghai presentation review
Follow up analysis
Conclusions
- Future of KTM
- Shoutouts

TL;DR

As we explained in previous parts of our blog series, we originally exploited CVE-2018-8611 without having access to any deep details about how the exploit worked since we only used the Kaspersky blog published in December 2018. However, after we built our exploit and wrote our blog series, we (admittedly very late) became aware of the BlueHat 2019 Shanghai presentation delivered by Kaspersky in May 2019. This talk detailed more about how the in-the-wild exploit actually worked.

In the end, we found this presentation extremely interesting, because it shows where our methodology deviated from that of the original exploit’s authors. We are also happy we found it much later, because it led us to come up with tricks that are still useful in spite of other situationally more powerful approaches used by the original 0day exploit.

For someone new to the joys of exploit development, seeing a comparative analysis of approaches hopefully elucidates how this type of development process progresses. It is often driven by intuition during the exploit development process, rather than by a standard approach to all vulnerability exploitation problems.

We now endeavor to review and analyze Kaspersky’s explanations in light of our newfound understanding, and provide insights and corrections where applicable. We will also analyze the PreviousMode overwrite approach used by the 0day exploit. We will see why it is a very powerful trick on 64-bit and why it does not work on 32-bit hosts, and therefore why the read/write primitives we discovered and described in part 4 of this series come to the rescue.

Safe vulnerability test

Before diving into Kaspersky’s details, we will detail how to safely detect if the vulnerability is present with zero risk of crashing the machine. Due to the way Microsoft chose to patch this vulnerability it is possible to do so. We do this by triggering code in the TmRecoverResourceManager() loop but without triggering any race condition. Instead of trying to free an enlistment, we simply put the transaction manager into the offline state.

We set the transaction manager offline by closing the only handle we have opened to it from userland using CloseHandle(), which triggers the TmpCloseTransactionManager() function in the kernel:

void __fastcall TmpCloseTransactionManager(__int64 a1, _KTM *pKTM, __int64 a3, __int64 a4)
{
  void *hRMHandle; // rsi MAPDST
  struct _KRESOURCEMANAGER *pRM; // rax MAPDST

  if ( a4 == 1 )
  {
    hRMHandle = 0i64;
    pRM = 0i64;
    TmpTmOffline(pKTM);
    [...]
}

What is important for us is that TmpTmOffline() sets the _KTM.State to KKtmOffline:

__int64 __fastcall TmpTmOffline(_KTM *pKTM)
{
  ...
    pKTM->State = KKtmOffline;

Let’s recall that the patch (among other changes) added the following code in the TmRecoverResourceManager() loop:

          Tm_ = pResMgr->Tm;
          if ( !Tm_ || Tm_->State != KKtmOnline )
          {
            ret = STATUS_TRANSACTIONMANAGER_NOT_ONLINE;
            goto b_release_mutex;
          }
          pEnlistment_shifted = EnlistmentHead_addr->Flink;

After sending a notification, if TmRecoverResourceManager() detects the transaction manager has gone offline it will exit early with the STATUS_TRANSACTIONMANAGER_NOT_ONLINE error. In a vulnerable system the function will continue and RecoverResourceManager() will eventually exit without error.

This is significantly easier to trigger than the actual race condition vulnerability, as we only need to suspend the recovery thread at any point in the TmRecoverResourceManager() loop when it is handling one of what can be many thousands of enlistments. We do not care which enlistment it is touching. We also do not care where the recovery thread is suspended in the loop as we are not interested in winning any race condition. We only care whether or not the particular STATUS_TRANSACTIONMANAGER_NOT_ONLINE error code is returned from calling NtRecoverResourceManager():

enum MACRO_STATUS
{
  ...
  STATUS_TRANSACTIONMANAGER_NOT_ONLINE = 0xC0190052,

From a defense perspective this type of detection is useful for testing. Conversely, from an attack perspective, you could abuse this detection mechanism before downloading a full-blown exploit that you do not want detected, or to confirm that a known vulnerability is exploitable rather than burning another 0day.
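
The decision described above can be sketched as a small helper. This is only an illustration: `classify_recover_status` and the `ktm_check_result` enum are hypothetical names, not part of any Windows API, and the sketch assumes the caller already holds the NTSTATUS returned by NtRecoverResourceManager() after the transaction manager was taken offline mid-recovery. Treating any other status as inconclusive is our own assumption.

```c
#include <stdint.h>

/* Value taken from the enum excerpt above. */
#define STATUS_TRANSACTIONMANAGER_NOT_ONLINE 0xC0190052u
#define STATUS_SUCCESS                       0x00000000u

/* Hypothetical result type used only by this sketch. */
typedef enum {
    KTM_CHECK_PATCHED,           /* the patched loop bailed out early          */
    KTM_CHECK_LIKELY_VULNERABLE, /* the loop ran to completion without error   */
    KTM_CHECK_INCONCLUSIVE       /* any other status: do not draw conclusions  */
} ktm_check_result;

/* Maps the status returned by the recovery call to a verdict. */
ktm_check_result classify_recover_status(uint32_t status)
{
    if (status == STATUS_TRANSACTIONMANAGER_NOT_ONLINE)
        return KTM_CHECK_PATCHED;
    if (status == STATUS_SUCCESS)
        return KTM_CHECK_LIKELY_VULNERABLE;
    return KTM_CHECK_INCONCLUSIVE;
}
```

Keeping a separate inconclusive verdict avoids reporting a system as vulnerable when the recovery call failed for an unrelated reason.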

BlueHat 2019 Shanghai presentation review

Approach comparison

The BlueHat presentation was written by Boris Larin and Anton Ivanov from Kaspersky. They discuss CVE-2018-8611 as the third case study of in-the-wild 0day exploits they discovered. Their analysis is covered in the PDF in pages 50-73.

Page 50 notes the in-the-wild exploit worked on Windows 7 through Windows 10 1803. They don’t mention whether this applies to both 64-bit and 32-bit; however, the examples are 64-bit, and based on our subsequent analysis we suspect that this version of the exploit perhaps does not support 32-bit at all.

That said, through our own research we have independently confirmed that all vulnerable versions of Windows from Vista through Windows 10 1809 on x86 and x64 are exploitable for the purposes of elevating privileges.

If you have read the previous parts of this blog series, then the missing KTM and triggering details from slide 50 to 56 should be fairly clear. Note that, as discussed in part 3, it is likely the in-the-wild exploit approach to detecting the race condition was different from ours. We unfortunately are not able to tell for sure from the slides.

On pages 56-57, Kaspersky discuss the race condition vulnerability. The description of the actual vulnerability is, in our opinion, slightly incorrect. Detection of the race condition could not be done by a successful execution of NtRecoverResourceManager() since we need the recovery thread to be stuck in the TmRecoverResourceManager() loop to trigger the write primitives. This is clearly used by the in-the-wild exploit based on slides 59+. Successful execution of NtRecoverResourceManager() simply indicates that exploitation of the bug has completed. The vulnerability also does not technically have anything to do with a resource manager going offline, but rather with an enlistment becoming finalized. As we demonstrated in our earlier posts, it is possible to finalize an enlistment by committing it after it is in the necessary state.

On page 58, they note the patch’s changes. The second bullet point states that a check was added to see if the resource manager is online. In our patch analysis in part 2, we showed that this check corresponds to the transaction manager, and not the resource manager. We also believe that it is not directly indicative of the vulnerability itself, as the main culprit is the KENLISTMENT_FINALIZED flag changes. The check for the transaction manager being offline is likely just an optimization change, which allows early exit from the recovery loop. (Of course it is also possible it fixes the vulnerability in a way we don’t understand). As discussed earlier in this post, we know that this new check for the transaction manager being offline allows us to benignly detect a vulnerable system prior to an exploitation attempt.

On page 60, Kaspersky note that there are a limited number of abusable functions inside the TmRecoverResourceManager() loop. This matches our own analysis. Now things get quite interesting, because the approach of the in-the-wild exploit deviates from our approach!

Next, they discuss how the 0day exploit apparently triggers a 0 overwrite primitive (instead of an increment primitive that we used) and targets the _KTHREAD PreviousMode field to build a powerful arbitrary kernel read/write primitive. This is particularly interesting so we will explain this a bit more in the next section.

The 0 value overwrite primitive

From slides 60 to 70, they explain that the 0day exploit crafts a fake userland dispatcher object of type EventNotificationObject that goes into a wait state when KeWaitForSingleObject() is called. It allows them to modify the object before the following KeReleaseMutex() call. The modification results in some code at the end of KiTryUnwaitThread() being reached. This code triggers a 0 value overwrite (32-bit size on 32-bit and 64-bit size on 64-bit):

  _InterlockedAnd64(&OwnerThread->ThreadLock, 0i64);
  ++WaitBlock->BlockState;
  return result;
}

On page 65, Kaspersky explain that introducing a fake userland EventNotificationObject causes the TmRecoverResourceManager() thread to become stuck in a wait state. While the thread is waiting, they modify the dispatcher object to become a SemaphoreObject.

The presentation does not mention how to detect from userland when the thread is blocked, but it appears the exploit uses a similar trick to our own check for the change in the KENLISTMENT_IS_NOTIFIABLE flags. It is also possible that they check from userland for the modification of a lock value in the dispatcher object, or another value in some other object.

On page 67, they note that GetThreadContext() is used to wake up the blocked thread, which likely explains their use of the EventNotificationObject prior to modifying the dispatcher header.

As detailed in part 3 of our series, it is interesting to note that due to the absence of SMAP, the 0day exploit also detects the race win and coaxes the vulnerable function to start parsing controlled structures in userland. This is quite similar to our own approach, even if we used a different write primitive.

On page 71, Kaspersky finish explaining how the 0 value overwrite is used to overwrite the PreviousMode in the leaked _KTHREAD structure. This trick has some subtleties and quirks.

Instead of purposefully hitting the logic for OwnerThread->State == Suspended in KiTryUnwaitThread() and entering the corresponding code like we did, they skip that branch and fall straight through to the end of the function. The trick here is that they can point the controlled _KMUTANT.OwnerThread pointer anywhere. They choose to do so in a way that overlaps the OwnerThread.ThreadLock field with the PreviousMode field of the _KTHREAD structure associated with the recovery thread, with the ultimate goal of setting PreviousMode to 0.

The code they abuse is here:

  char __fastcall KiTryUnwaitThread(struct _KPRCB *CurrentPrcb, PKWAIT_BLOCK WaitBlock, PVOID WaitStatus, _QWORD *pOutputVar)
  {
    //...
  
    OwnerThread = WaitBlock->Thread;
    result = 0;
[1] if ( _interlockedbittestandset64(&OwnerThread->ThreadLock, 0i64) )
    {
      i = 0;
      do
      {
        if ( ++i & HvlLongSpinCountMask || !(HvlEnlightenments & 0x40) )
          _mm_pause();
        else
          HvlNotifyLongSpinWait(i);
      }
      while ( OwnerThread->ThreadLock || _interlockedbittestandset64(&OwnerThread->ThreadLock, 0i64) );
    }
[2] if ( OwnerThread->State == Suspended )
    {
      //...
          _InterlockedAdd(&ThreadQueue->CurrentCount, 1u);// our increment primitive
    }
[3] _InterlockedAnd64(&OwnerThread->ThreadLock, 0i64);    // their zero value overwrite primitive
    ++WaitBlock->BlockState;
    return result;
  }

There are three important points that happen in this order in the code: the initial locking of the ThreadLock at [1], the State test at [2], and the unlocking of the ThreadLock at [3]. We will analyze these slightly out of order.

The Vista problem

The 0 value overwrite approach we are about to describe in more detail for both 32-bit and 64-bit doesn’t work on 64-bit Vista because of some code differences.

There are other 0 value overwrite primitives that appear to exist in the code. It is unclear to us if the author of the 0day exploit actually used such different 0 value overwrite primitives to support Vista 64-bit.

From our perspective, we didn’t investigate them because we already had our increment primitive. It was easier for us to simply wrap the PreviousMode value to 0 using our increment primitive, to enable the more powerful and convenient arbitrary read/write primitive based on PreviousMode being 0.

Potential state test problem

First, we discuss the State test at [2]. Since they are providing an OwnerThread pointer that overlaps with an existing _KTHREAD structure in the kernel, they run the risk of the _KTHREAD.State byte field holding a value of 5. This corresponds to the Suspended value:

.text:0000000140032E9E    movzx   eax, byte ptr [rbx+164h] ; _KTHREAD.State
.text:0000000140032EA5    cmp     al, 5    ; Suspended

This could result in entering the if condition at [2] that they don’t want to go into. In practice their OwnerThread pointer appears to overlap with a pointer in the target _KTHREAD, so the chances are reasonably low this would happen. We’ve successfully tested writing PreviousMode with this method on Windows 7 through Windows 10 1809 with no issues.

We know from the Kaspersky analysis that they are pointing the fake thread to a legitimate _KTHREAD base address and adding 0x1eb (more on why later). We see from the assembly above that the State field is being pulled from offset 0x164. This tells us that the State variable will be tested from a 0x1eb + 0x164 = 0x34f offset from the _KTHREAD base. On Windows 10 1809 x64 this falls into an array of _KLOCK_ENTRY structures.

    struct _KLOCK_ENTRY LockEntries[6];                                     //0x320
    struct _SINGLE_LIST_ENTRY PropagateBoostsEntry;                         //0x560

Since we know this array starts at 0x320, and the above State test offset is 0x34f, we check what is at offset 0x2f inside of the _KLOCK_ENTRY structure.

    union
    {
        struct _KLOCK_ENTRY_LOCK_STATE LockState;                           //0x20
        VOID* volatile LockUnsafe;                                          //0x20
        struct
        {
            volatile UCHAR CrossThreadReleasableAndBusyByte;                //0x20
            UCHAR Reserved[6];                                              //0x21
            volatile UCHAR InTreeByte;                                      //0x27
            union
            {
                VOID* SessionState;                                         //0x28
                struct
                {
                    ULONG SessionId;                                        //0x28
                    ULONG SessionPad;                                       //0x2c
                };
            };
        };
    };

This is a union with quite a few fields, so it’s contextually hard to say if any of the values could contain the unwanted value 5 at a glance. Of course what it overlaps can also differ across Windows versions, so there is still the chance this type of blind overlapping could cause problems on some systems. In practice we’ve never seen the State == 5 case actually get hit during exploitation.
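
The offset arithmetic in this section can be checked mechanically. The sketch below only restates the numbers quoted above; the function names are hypothetical, and the size of one _KLOCK_ENTRY is derived from the array bounds (0x320 to 0x560, six entries) under the assumption that nothing pads the end of the array.

```c
#include <stdint.h>

enum {
    FAKE_THREAD_DELTA   = 0x1eb, /* added to the _KTHREAD base, per the slides       */
    STATE_OFFSET        = 0x164, /* State offset seen in the disassembly above       */
    LOCK_ENTRIES_OFFSET = 0x320, /* _KTHREAD.LockEntries on Windows 10 1809 x64      */
    NEXT_FIELD_OFFSET   = 0x560, /* PropagateBoostsEntry, the field after the array  */
    LOCK_ENTRY_COUNT    = 6
};

/* Offset from the real _KTHREAD base that the State test ends up reading. */
uint32_t state_test_offset(void)
{
    return FAKE_THREAD_DELTA + STATE_OFFSET;
}

/* Size of one _KLOCK_ENTRY, derived from the array bounds (assumption: no padding). */
uint32_t lock_entry_size(void)
{
    return (NEXT_FIELD_OFFSET - LOCK_ENTRIES_OFFSET) / LOCK_ENTRY_COUNT;
}

/* Which array element the byte falls into. */
uint32_t lock_entry_index(void)
{
    return (state_test_offset() - LOCK_ENTRIES_OFFSET) / lock_entry_size();
}

/* Offset of the byte inside that element. */
uint32_t offset_in_lock_entry(void)
{
    return (state_test_offset() - LOCK_ENTRIES_OFFSET) % lock_entry_size();
}
```

With these inputs the tested byte is at 0x34f, which is offset 0x2f of the first array element, the last byte of SessionPad in the listing above.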

Locking

Next, let’s discuss the actual locking at [1]. Wherever _KTHREAD.ThreadLock happens to lie in memory will be locked using _interlockedbittestandset64(&OwnerThread->ThreadLock, 0i64). The underlying assembly for this particular locking macro is very important for this trick to work. The test on Windows 7 x64 looks like the following:

.text:0000000140032E88    lock bts qword ptr [rbx+40h], 0     ; _KTHREAD.ThreadLock

This bts instruction tests and sets only bit 0 (the second operand) of the qword (the first operand).

Had the instruction at [1] been a comparison over a larger bit-field width, it could have included the PreviousMode field that they are attempting to overwrite at [3]. If that were the case, the lock at [1] would never have been obtained in the first place, so the 0 overwrite at [3] would never be reached.

As Kaspersky allude to but do not actually explain, this low bit at rbx+40h, which must be 0, is why the exploit used the target _KTHREAD address plus the odd-looking offset 0x1eb as the address for OwnerThread. Let’s try to understand why.

Based on the offset they used being 0x1eb, we use Windows 10 1809 offsets as an example. In _KTHREAD, the PreviousMode field is at 0x232. Remember, 0x40 is being added to the OwnerThread pointer in order to lock OwnerThread. This gives us 0x1eb + 0x40 = 0x22b. 0x22b corresponds to one of the bytes of the UserAffinityFill array, which immediately precedes the PreviousMode field they want to overwrite:

//0x5f0 bytes (sizeof)
struct _KTHREAD
{
    ...
    union
    {
        struct _GROUP_AFFINITY UserAffinity;                                //0x228
        struct
        {
            UCHAR UserAffinityFill[10];                                     //0x228
            CHAR PreviousMode;                                              //0x232
            CHAR BasePriority;                                              //0x233
            union
            {
                CHAR PriorityDecrement;                                     //0x234
                struct
                {
                    UCHAR ForegroundBoost:4;                                //0x234
                    UCHAR UnusualBoost:4;                                   //0x234
                };
            };
            UCHAR Preempted;                                                //0x235
            UCHAR AdjustReason;                                             //0x236
            CHAR AdjustIncrement;                                           //0x237
        };

This is presumably done because this particular offset in UserAffinityFill is always 0, and this allows them to obtain the lock at [1].

Unlocking

Now let’s discuss the unlocking instruction at [3] triggering the 0 value overwrite: _InterlockedAnd64(&OwnerThread->ThreadLock, 0i64). The corresponding assembly for this function is interesting:

.text:0000000140032F18    lock and qword ptr [rbx+40h], 0  ; _KTHREAD.ThreadLock

Because it unlocks the lock, it does not operate on a single bit, but rather sets the entire 64-bit value (qword) to 0. In this case, the eight bytes from 0x22b through 0x232 (0x22b + 0x8 - 1) will be zeroed. PreviousMode at 0x232 is the last byte of that range, which is consistent with the choice of 0x1eb; BasePriority at 0x233 lies just outside it.
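
To make the byte ranges concrete, the two instructions can be modelled in plain userland C on a byte-for-byte stand-in for a _KTHREAD. This is a minimal sketch under stated assumptions: the offsets are the Windows 10 1809 x64 values quoted above, `simulate_zero_overwrite` is a hypothetical name, and the model touches only an ordinary buffer, never kernel memory.

```c
#include <stdint.h>
#include <string.h>

enum {
    FAKE_THREAD_DELTA   = 0x1eb, /* added to the _KTHREAD base, per the slides */
    THREADLOCK_OFFSET   = 0x40,  /* _KTHREAD.ThreadLock                        */
    PREVIOUSMODE_OFFSET = 0x232, /* Windows 10 1809 x64                        */
    BASEPRIORITY_OFFSET = 0x233
};

/*
 * Models steps [1] and [3] on a copy of a _KTHREAD held in a plain buffer.
 * Returns 1 if the lock could be taken and the qword was zeroed,
 * 0 if bit 0 was already set (the real code would spin instead).
 */
int simulate_zero_overwrite(uint8_t *kthread_copy)
{
    uint8_t *lock = kthread_copy + FAKE_THREAD_DELTA + THREADLOCK_OFFSET;

    /* [1] lock bts qword ptr [lock], 0: only bit 0 matters. x86 is
       little-endian, so bit 0 of the qword lives in the first byte. */
    if (lock[0] & 1)
        return 0;
    lock[0] |= 1;

    /* [3] lock and qword ptr [lock], 0: all eight bytes become zero. */
    memset(lock, 0, 8);
    return 1;
}
```

Running this on a buffer whose byte at 0x22b has a clear low bit zeroes bytes 0x22b through 0x232, so the byte at PREVIOUSMODE_OFFSET becomes 0 while the byte at BASEPRIORITY_OFFSET keeps its value.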

From this point, the chain of functions is exited without many additional constraints (compared to our increment primitive), and we assume the TmRecoverResourceManager() loop is broken out of the same way we broke out of it by abusing the leaked _KRESOURCEMANAGER address.

PreviousMode abuse

On pages 72-73, Kaspersky explain that the 0day exploit abused PreviousMode being 0 to read and write arbitrary kernel memory, as most sanity checks in kernel space are predicated on PreviousMode being set to 1 for userland processes.

This is very interesting, because the PreviousMode persisting at 0 after an overwrite directly contradicts the official documentation from Microsoft which states that the trap handler sets it for every system call when it originates from userland. This is quoted in the Kaspersky presentation.

Follow up analysis

Now, we will look at why the general 0 value overwrite trick abused in KiTryUnwaitThread() does not actually work on all 32-bit versions. Then, we will return to the abuse of PreviousMode and show it only works easily on 64-bit, and how that contrasts to 32-bit where it actually works the way the Microsoft documentation suggests.

0 value overwrite primitive on 32-bit?

Our exploit uses the increment primitive to exploit 32-bit systems, so we knew we could exploit the vulnerability to elevate privileges on 32-bit.

However, we were curious if the same 0 value overwrite primitive used by the in-the-wild exploit would also work on 32-bit. We discovered that it does not, so let’s take a look at why. The code examples are from Windows 7 32-bit.

We already know we can enter the KiTryUnwaitThread() function, as this is what our exploit did, so let’s start there:

char KiTryUnwaitThread(_KWAIT_BLOCK *WaitBlock, struct _KPRCB *a2, int a3, _KTHREAD **a4)
{
  //...

  i = 0;
  OwnerThread = WaitBlock->Thread;
  v12 = 0;
  OwnerThreadLock = (volatile signed __int32 *)&OwnerThread->ThreadLock;
  while ( _InterlockedExchange(OwnerThreadLock, 1) )
  {
    do
    {
      if ( ++i & HvlLongSpinCountMask || !(HvlEnlightenments & 0x40) )
        _mm_pause();
      else
        HvlNotifyLongSpinWait(i);
    }
    while ( *OwnerThreadLock );
  }
  if ( OwnerThread->State == Suspended )
  {
    //...

The first thing that stands out is that on 64-bit the lock was acquired with _interlockedbittestandset64(&OwnerThread->ThreadLock, 0i64), whereas here on 32-bit the while loop is bounded by _InterlockedExchange(OwnerThreadLock, 1). The corresponding assembly is:

.text:00478F4A    xor     eax, eax    ; eax = 0
.text:00478F4C    mov     ecx, ebx    ; _KTHREAD.ThreadLock
.text:00478F4E    inc     eax         ; eax = 1
.text:00478F4F    xchg    eax, [ecx]  ; *_KTHREAD.ThreadLock = 1, eax = old ThreadLock
.text:00478F51    test    eax, eax    ; old value == 0?
.text:00478F53    jnz     short b_wait_loop

Above, the value 1 is written to the _KTHREAD.ThreadLock and the old value is tested to see if it was non-zero. If the old value is non-zero, then the while loop is entered to wait on the lock being available. The fact that the test is using the entire 32-bit value means that we are unable to use the unaligned lock pointer trick to overwrite PreviousMode. This is because if there is a non-zero value in the lock that we wish to overwrite, we are prevented from locking it in the first place!

Had the macro been different, then the 0 value overwrite primitive could have been used the same way it was on 64-bit.
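
The difference between the two locking styles can be expressed as two predicates over the bytes at the lock address. This is a simplified model with hypothetical function names; it only answers "would the first acquisition attempt succeed?" and ignores the spinning that follows a failed attempt.

```c
#include <stdint.h>

/*
 * Bit-test-and-set style (64-bit listing, and reportedly 32-bit Windows 8+):
 * acquisition depends only on bit 0 of the first byte at the lock address.
 */
int bts_lock_acquirable(const uint8_t *lock)
{
    return (lock[0] & 1) == 0;
}

/*
 * xchg style (Windows 7 x86 listing above): the whole 32-bit old value
 * must be zero, otherwise the thread enters the wait loop.
 */
int xchg32_lock_acquirable(const uint8_t *lock)
{
    return (lock[0] | lock[1] | lock[2] | lock[3]) == 0;
}
```

For a misaligned window such as the bytes {0, 0, 0, 1}, where the last byte stands in for a PreviousMode of UserMode, the first predicate holds and the second does not, which matches the observation that the unaligned pointer trick is unavailable with the xchg-based lock.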

This then raises the question: is the macro always different on x86? It turns out after checking some x86 Windows versions that it seems Microsoft started using _interlockedbittestandset() on Windows 8 and above. We checked Windows 8 and Windows 10 1809 x86. So this likely means that at least on Windows 8 and later 32-bit systems, it is possible to abuse an analogous 0 value overwrite primitive.

Even on systems like Windows 7 x86 where we don’t have the 0 value overwrite primitive, since we know the increment primitive works, we still have the opportunity to modify PreviousMode by incrementing the value 255 times and wrapping it to 0 to achieve the same effect.

In theory, the ability to write 0 to PreviousMode on 32-bit should give us an analogously powerful kernel read/write primitive on 32-bit. In practice this does not appear to be the case. We will describe why as we delve into PreviousMode further.

PreviousMode – a "god mode" primitive?

PreviousMode on 64-bit

As discussed earlier, the in-the-wild exploit leveraged a powerful technique to achieve an arbitrary read/write primitive, which is to set the PreviousMode field of the _KTHREAD to 0, which corresponds to the KernelMode value from an enum like this:

typedef enum _MODE {
    KernelMode = 0,
    UserMode,
} MODE;

PreviousMode indicates whether a syscall was invoked from user mode or from the kernel, which we will look at in more detail shortly. If this value is set to 0, then functions like NtReadVirtualMemory() or NtWriteVirtualMemory() can be abused to read or write to kernel memory, as address validation checks are skipped:

__int64 __fastcall NtWriteVirtualMemory(HANDLE ProcessHandle, PVOID BaseAddress, PVOID Buffer, __int64 BufferSize, __int64 *NumberOfBytesWritten)
{
  pCurrentThread = KeGetCurrentThread();
  PreviousMode = pCurrentThread->PreviousMode;
  
  // Check only if called from UserMode
  if ( PreviousMode )
  {
    EndAddress = BaseAddress + BufferSize;
    if ( BaseAddress + BufferSize < BaseAddress )
      return STATUS_ACCESS_VIOLATION;
    BufferEnd = Buffer + BufferSize;
    if ( BufferEnd < Buffer || EndAddress > MmHighestUserAddress || BufferEnd > MmHighestUserAddress )
      return STATUS_ACCESS_VIOLATION;
    if ( NumberOfBytesWritten )
    {
      NumberOfBytesWritten_ = NumberOfBytesWritten;
      if ( NumberOfBytesWritten >= MmUserProbeAddress )
        NumberOfBytesWritten_ = MmUserProbeAddress;
      *NumberOfBytesWritten_ = *NumberOfBytesWritten_;
    }
  }
  ...
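
The gating effect of PreviousMode in the excerpt above can be reduced to a small userland model. This is a sketch rather than the kernel's actual logic: `write_range_allowed` is a hypothetical name, only the wrap-around and upper-bound checks are modelled, and the highest user address is passed in as a parameter instead of being read from MmHighestUserAddress.

```c
#include <stdint.h>

typedef enum { KernelMode = 0, UserMode = 1 } MODE;

/*
 * Returns 1 if the modelled range check would let the operation proceed,
 * 0 if it would fail with an access violation.
 */
int write_range_allowed(MODE previous_mode, uint64_t base, uint64_t size,
                        uint64_t highest_user_address)
{
    uint64_t end;

    /* Callers marked as KernelMode skip validation entirely. */
    if (previous_mode == KernelMode)
        return 1;

    end = base + size;
    if (end < base)                  /* address range wraps around */
        return 0;
    if (end > highest_user_address)  /* range reaches past user space */
        return 0;
    return 1;
}
```

With UserMode, a kernel-space base address is rejected; with KernelMode, the same request passes because no check is performed, which is the property the PreviousMode overwrite relies on.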

The earliest reference we could find to the PreviousMode field being used as an exploit target was from Tarjei Mandt at Infiltrate 2011 in his Modern Kernel Pool Exploitation: Attacks and Techniques talk (see slide 124).

There have been other interesting ideas around abusing PreviousMode logic. In March 2019, James Forshaw discussed similar ideas about finding code paths that make incorrect assumptions about how calls were made.

To us, the fact that overwriting PreviousMode with 0 actually persisted across syscalls was unexpected, as the official Microsoft documentation states the exact opposite. The relevant excerpt is:

When a user-mode application calls the Nt or Zw version of a native system
services routine, the system call mechanism traps the calling thread to kernel
mode. To indicate that the parameter values originated in user mode, the trap
handler for the system call sets the PreviousMode field in the thread object of
the caller to UserMode.

If "the trap handler for the syscall sets the PreviousMode field in the thread object of the caller to UserMode", then why is it that modifying PreviousMode from inside one syscall leads to subsequent syscalls being able to abuse PreviousMode? Shouldn’t it be reset on the entry of the next syscall? We decided to analyse why and found some interesting results.

We will start by looking at Windows 7 x64.

We identify which function is used as the entry point for syscalls by checking the MSR_LSTAR value in WinDbg:

0: kd> rdmsr 0xC0000082
msr[c0000082] = fffff800`029d4bc0
0: kd> u fffff800`029d4bc0
nt!KiSystemCall64Shadow:
fffff800`029d4bc0 0f01f8          swapgs
...

So we started reverse engineering KiSystemCall64Shadow and the different labels and functions it calls into. Note that we generally look at the assembly rather than the decompiled code, as this kind of function was written in assembly in the first place and is therefore easier to read without Hex-Rays.

One interesting thing we noted is that at no time does it actually set PreviousMode on entry. There is, however, a check in KiSystemServiceExit of the saved CS register that dictates if it should restore a saved PreviousMode value on exit from one of the functions:

.text:00000001400A19DB KiSystemServiceExit:          ; CODE XREF: KiSystemCall64+6B0↓j
.text:00000001400A19DB                               ; KiSystemCall64+6BB↓j
.text:00000001400A19DB                               ; DATA XREF: KiCallUserMode+25A↑o
.text:00000001400A19DB                               ; KiSystemServiceHandler+48↑o
.text:00000001400A19DB      mov     rbx, [rbp+0C0h]
.text:00000001400A19E2      mov     rdi, [rbp+0C8h]
.text:00000001400A19E9      mov     rsi, [rbp+0D0h]
.text:00000001400A19F0      mov     r11, gs:188h
.text:00000001400A19F9      test    byte ptr [rbp+0F0h], 1 ; Test if bit 0 of _KTRAP_FRAME.SegCs is set
.text:00000001400A1A00      jz      b_swap_previousmode_ret
...
.text:00000001400A1BA3      swapgs
.text:00000001400A1BA6      sysret                  ; return back to userland
...
.text:00000001400A1BA9 b_swap_previousmode_ret:      ; CODE XREF: KiSystemCall64+480↑j
.text:00000001400A1BA9      mov     rdx, [rbp+0B8h]
.text:00000001400A1BB0      mov     [r11+1D8h], rdx
.text:00000001400A1BB7      mov     dl, [rbp-58h]    ; _KTRAP_FRAME.PreviousMode
.text:00000001400A1BBA      mov     [r11+1F6h], dl   ; _KTHREAD.PreviousMode
.text:00000001400A1BC1      cli
.text:00000001400A1BC2      mov     rsp, rbp
.text:00000001400A1BC5      mov     rbp, [rbp+0D8h]
.text:00000001400A1BCC      mov     rsp, qword ptr [rsp+88h+anonymous_36]
.text:00000001400A1BD4      sti
.text:00000001400A1BD5      retn                    ; return back to kernel caller

On all of the versions of Windows we checked, for both 32-bit and 64-bit, the trap handler will use the lower bit of the saved CS selector (from KTRAP_FRAME SegCs) as a means of indicating if a caller into a trap was from userland or from the kernel. As seen above, only if the saved CS indicates a kernel caller will it reuse a previously saved PreviousMode. If the caller is from userland, this code will never be executed due to the sysret instruction making it return to userland.

If you are familiar with the Windows kernel’s Nt and Zw function prefix semantics, this should make sense to you. The difference above is hinted at by the use of the retn instruction when the caller is from the kernel, rather than sysret. The retn instruction implies that the return will not transition between privilege modes, but rather return to some other kernel function. This reflects the case where a kernel function calls a syscall’s Zw wrapper function.

The Zw wrappers all jump into KiServiceInternal.

KiServiceInternal saves the old PreviousMode and sets the new one to KernelMode. This allows the kernel calling into syscalls to avoid expensive security checks enforced against userland:

.text:00000001400A1500 KiServiceInternal proc near
.text:00000001400A1500
.text:00000001400A1500     sub     rsp, 8
.text:00000001400A1504     push    rbp
.text:00000001400A1505     sub     rsp, 158h                ; _KTRAP_FRAME
.text:00000001400A150C     lea     rbp, [rsp+80h]           ; Offset into _KTRAP_FRAME
.text:00000001400A1514     mov     [rbp+0E8h+var_28], rbx
.text:00000001400A151B     mov     [rbp+0E8h+var_20], rdi
.text:00000001400A1522     mov     [rbp+0E8h+var_18], rsi
.text:00000001400A1529     sti
.text:00000001400A152A     mov     rbx, gs:188h
.text:00000001400A1533     prefetchw byte ptr [rbx+1D8h]
.text:00000001400A153A     movzx   edi, byte ptr [rbx+1F6h] ; Fetch old _KTHREAD.PreviousMode value
.text:00000001400A1541     mov     [rbp-58h], dil  ; Preserve in _KTRAP_FRAME.PreviousMode
.text:00000001400A1545     mov     byte ptr [rbx+1F6h], 0   ; Override with KernelMode as caller was ZwXXX
.text:00000001400A154C     mov     r10, [rbx+1D8h]
.text:00000001400A1553     mov     [rbp+0E8h+var_30], r10
.text:00000001400A155A     lea     r11, KiSystemServiceStart
.text:00000001400A1561     jmp     r11                      ; Continue syscall as normal

In the code excerpt above, rbp-58h corresponds to the same rbp-58h used earlier in the code labeled b_swap_previousmode_ret to restore PreviousMode when exiting the syscall (without transitioning privilege mode).

It is fairly easy to understand what rbp is by looking at the KTRAP_FRAME structure below. If we assume that after sub rsp, 158h executes, rsp points to KTRAP_FRAME, then rbp should be pointing to the Xmm1 field (lea rbp, [rsp+80h]). Then, we do relative variable references from there, so rbp-58h is really rsp+28h, which is PreviousMode:

//0x190 bytes (sizeof)
struct _KTRAP_FRAME
{
    ULONGLONG P1Home;                                                       //0x0
    ULONGLONG P2Home;                                                       //0x8
    ULONGLONG P3Home;                                                       //0x10
    ULONGLONG P4Home;                                                       //0x18
    ULONGLONG P5;                                                           //0x20
    CHAR PreviousMode;                                                      //0x28
    ...
    struct _M128A Xmm1;                                                     //0x80
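
The rbp-relative arithmetic can be double-checked with offsetof on a cut-down structure. The struct below reproduces only the leading fields from the listing above and is not the full _KTRAP_FRAME; the names `trap_frame_prefix` and `previous_mode_frame_offset` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the leading fields of _KTRAP_FRAME shown in the listing above. */
struct trap_frame_prefix {
    uint64_t P1Home;
    uint64_t P2Home;
    uint64_t P3Home;
    uint64_t P4Home;
    uint64_t P5;
    int8_t   PreviousMode;
};

enum {
    RBP_BIAS          = 0x80, /* lea rbp, [rsp+80h]           */
    PREVIOUSMODE_DISP = 0x58  /* the 58h in [rbp-58h]         */
};

/* Offset from the start of the trap frame addressed by [rbp-58h]. */
size_t previous_mode_frame_offset(void)
{
    return RBP_BIAS - PREVIOUSMODE_DISP;
}
```

Both `previous_mode_frame_offset()` and `offsetof(struct trap_frame_prefix, PreviousMode)` evaluate to 0x28, the offset of PreviousMode in the listing.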

The main takeaway from all of this is that on all the 64-bit Windows versions we looked at, the kernel appears to always assume that the _KTHREAD tracking a userland thread doing a syscall has a PreviousMode value of UserMode and will not bother trying to preserve it. It only bothers actually preserving PreviousMode if a Zw-based function path is taken (as it temporarily overrides it with KernelMode to avoid certain checks). This means that if you ever get the ability to change the PreviousMode of a userland _KTHREAD to KernelMode, it will never change the PreviousMode field back to UserMode for that thread. This behavior does not match what the documentation indicates should happen.

In light of this, the primitive is quite powerful on 64-bit. This failure to properly set PreviousMode is not a vulnerability per se, as it works as intended in normal circumstances, but it seems like a highly abusable oversight in the kernel that could be changed.
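
The behavior described above can be summarized with a toy model. This is a deliberately simplified sketch, not kernel code: KThread, nt_syscall_x64 and zw_call are hypothetical names, and the model only captures the single property discussed here, namely that the 64-bit syscall entry path does not reset PreviousMode while the Zw path saves and restores it.

```python
KERNEL_MODE = 0
USER_MODE = 1


class KThread:
    """Toy stand-in for a _KTHREAD; only the PreviousMode field is modeled."""

    def __init__(self):
        self.previous_mode = USER_MODE


def nt_syscall_x64(thread):
    """Model of a syscall made from userland on x64.

    PreviousMode is neither reset on entry nor restored on exit, so the
    service routine simply sees whatever value is stored in the thread.
    """
    return thread.previous_mode


def zw_call(thread):
    """Model of a Zw-based path: save, override with KernelMode, restore."""
    saved = thread.previous_mode
    thread.previous_mode = KERNEL_MODE
    try:
        return thread.previous_mode
    finally:
        thread.previous_mode = saved


thread = KThread()
before = nt_syscall_x64(thread)

# Simulate a one-off kernel write primitive flipping the field.
thread.previous_mode = KERNEL_MODE
after = [nt_syscall_x64(thread) for _ in range(3)]

print(before, after)  # 1 [0, 0, 0]
```

In this model a single write is enough: every later syscall made by the same thread is treated as if it came from kernel mode.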

PreviousMode on 32-bit

After leveraging the PreviousMode trick on 64-bit, we took a look at 32-bit. We found that it doesn’t work on 32-bit, because 32-bit behaves as the Microsoft documentation suggests it should. Let’s take a look.

On Windows 7 32-bit, the entry point of the syscall routine can be found in WinDbg with the !idt command.

1: kd> !idt 2e
2e:	82a3f6be nt!KiSystemService

If we take a look at this function in ntkrnlpa.exe, we immediately see something different:

.text:0043D6BE _KiSystemService
.text:0043D6BE
.text:0043D6BE     push    0
.text:0043D6C0     push    ebp
.text:0043D6C1     push    ebx
.text:0043D6C2     push    esi
.text:0043D6C3     push    edi
.text:0043D6C4     push    fs
.text:0043D6C6     mov     ebx, 30h ; '0'
.text:0043D6CB     mov     fs, bx
.text:0043D6CE     mov     ebx, 23h ; '#'
.text:0043D6D3     mov     ds, ebx
.text:0043D6D5     mov     es, ebx
.text:0043D6D7     mov     esi, large fs:124h
.text:0043D6DE     push    large dword ptr fs:0
.text:0043D6E5     mov     large dword ptr fs:0, 0FFFFFFFFh
.text:0043D6F0     push    dword ptr [esi+13Ah]     ; Save old _KTHREAD.PreviousMode
.text:0043D6F6     sub     esp, 48h                 ; _KTRAP_FRAME
.text:0043D6F9     mov     ebx, [esp+6Ch]           ; _KTRAP_FRAME.SegCS value
.text:0043D6FD     and     ebx, 1                   ; Lower bit of CS selector
.text:0043D700     mov     [esi+13Ah], bl           ; Override _KTHREAD.PreviousMode using CS

One of the first things the function does is save the old _KTHREAD PreviousMode. It then sets the PreviousMode that will be used while inside the syscall to a value based on the CS segment register check. Since the CS selector cannot be changed from userland, this will always indicate to the kernel that the entry came from userland. As a result, even if we used a write primitive to change the PreviousMode value, upon syscall entry the modified value would be saved, and a safe value indicating userland would actually be used.
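
The same logic can be expressed as a small model. This is a sketch, not a translation of the full routine: KThread, ki_system_service_x86 and probe_required are hypothetical names, and restoring the saved value on exit is our assumption based on the "save" step in the listing. The selector values 0x1B and 0x08 are the usual ring-3 and ring-0 code selectors on 32-bit Windows.

```python
KERNEL_MODE = 0
USER_MODE = 1

USER_CS = 0x1B    # usual ring-3 code selector on 32-bit Windows (low bit set)
KERNEL_CS = 0x08  # usual ring-0 code selector (low bit clear)


class KThread:
    """Toy stand-in for a _KTHREAD; only the PreviousMode field is modeled."""

    def __init__(self):
        self.previous_mode = USER_MODE


def ki_system_service_x86(thread, trap_frame_cs, handler):
    """Model of the entry logic shown in the listing above."""
    saved = thread.previous_mode              # push dword ptr [esi+13Ah]
    thread.previous_mode = trap_frame_cs & 1  # and ebx, 1 / mov [esi+13Ah], bl
    try:
        return handler(thread)
    finally:
        thread.previous_mode = saved          # assumed restore on syscall exit


def probe_required(thread):
    """Stand-in for a service routine deciding whether to validate user pointers."""
    return thread.previous_mode == USER_MODE


thread = KThread()
thread.previous_mode = KERNEL_MODE  # simulated corruption before the syscall

# Entry from userland: the corrupted value is saved, and UserMode is used.
print(ki_system_service_x86(thread, USER_CS, probe_required))  # True
```

The corrupted value only survives in the saved slot, so the service routine still performs its userland checks.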

As we saw earlier when discussing the 0 value write primitive, the functionality differs between Windows 7 and Windows 8 or above, so we also checked for the PreviousMode logic. We confirmed the same CS-based check exists on Windows 8 x86 and Windows 10 1809 x86. This means this technique cannot easily be used on 32-bit Windows. We say "easily" because we believe it is still technically possible to abuse PreviousMode, albeit with greater difficulty.

One way we can still abuse PreviousMode on 32-bit would be to use one kernel write primitive (like the 0 value overwrite primitive) in a loop to constantly reset the PreviousMode of another _KTHREAD to 0. This target thread could in turn keep looping on a call to NtReadVirtualMemory() to read some kernel memory address until the read actually works. You end up having to exploit a fabricated race condition for the PreviousMode trick to work in the first place.
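
To make the shape of this fabricated race concrete, here is a deterministic toy timeline. Everything in it is an assumption made for illustration: time is counted in abstract ticks, the writer lands a 0 write at every multiple of write_period, and an attempt succeeds only if a write lands strictly between the syscall entry (which resets PreviousMode) and the later check of that field. Real timing is of course not this regular.

```python
def first_successful_attempt(attempt_period, window, write_period, max_attempts):
    """Return the index of the first attempt whose race window contains a write.

    attempt_period: ticks between two syscall entries of the reading thread
    window:         ticks between syscall entry and the PreviousMode check
    write_period:   the writing thread lands a 0 write on every multiple of this
    """
    for attempt in range(max_attempts):
        entry = attempt * attempt_period
        check = entry + window
        # A write only helps if it lands after the entry reset and before the check.
        if any(tick % write_period == 0 for tick in range(entry + 1, check)):
            return attempt
    return None


# A one-tick window, retried every 10 ticks, against a write every 7 ticks.
print(first_successful_attempt(attempt_period=10, window=2,
                               write_period=7, max_attempts=100))
```

The point of the model is only that the reading thread has to keep retrying until a write happens to fall inside the window, which is what makes the approach a race rather than a direct primitive.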

One drawback of this approach is that you need to be able to write to a _KTHREAD whose address you know, which is not the _KTHREAD that you are using in order to achieve the write primitive. This means you need to leverage the increment-based read/write primitives to find the target address for the 0 value overwrite primitive. Yet another drawback is related to a requirement we saw for hitting the write 0 primitive. If you want to use a single fake userland enlistment that is in an infinite loop writing 0 to some address, then this userland enlistment must have the KENLISTMENT_IS_NOTIFIABLE flag set. We also know that the kernel will unset this flag each time the loop is executed. To counter this problem, we must have a userland thread constantly resetting the KENLISTMENT_IS_NOTIFIABLE flag. These drawbacks make this approach fairly inconvenient to implement.

In the case of the CVE-2018-8611 vulnerability, the _KTHREAD we naturally leak is the one of the recovery thread that is stuck inside TmRecoverResourceManager(). We would therefore need to still rely on the increment primitive in order to build an arbitrary kernel read primitive, which would then allow us to find another _KTHREAD to target, to which we would write the PreviousMode value. It is also worth noting that if we tried to set the PreviousMode using the increment primitive (keeping in mind its requirements explained in part 4), it would become significantly more difficult to win this secondary race due to the number of increments required to wrap the value to 0 for each successful win. Actually implementing such a technique for e.g. Windows 10 1809 x86 is left as an exercise for the reader.
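
As a rough illustration of why the increment primitive is a poor fit here, the number of increments needed to wrap a field to 0 grows with the width of the value being incremented. The helper below is a sketch under an explicit assumption: each win of the race adds exactly 1 to an unsigned field of width_bits bits. The actual width and side effects of the increment primitive are those described in part 4 and are not modeled here.

```python
def increments_to_zero(current, width_bits):
    """Number of +1 steps needed for an unsigned field to wrap around to 0."""
    modulus = 1 << width_bits
    return (-current) % modulus


# Starting from UserMode (1), under the assumption above:
print(increments_to_zero(1, 8))   # 255
print(increments_to_zero(1, 32))  # 4294967295
```

Even in the most favorable case of a one-byte field, each reset of PreviousMode would cost hundreds of increments, compared to a single write with the 0 value overwrite primitive.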

The following table shows the status of the PreviousMode trick and the ability to use the 0 value overwrite primitive across Windows versions. Please note that the ranges are approximate, as we did not test every Windows version between the ones listed.

Windows versions	Arch	Increment primitive	0 value overwrite primitive	PreviousMode permanent?	PreviousMode usage
Vista to 7	x86	Yes	No	No	Raceable only with increment primitive
8 to 10 1809	x86	Yes	Yes	No	Raceable with 0 value overwrite primitive
Vista	x64	Yes	No	Yes	Direct
7 to 10 1809	x64	Yes	Yes	Yes	Direct

Mitigations

On page 73 of the Kaspersky presentation, they suggest it may be worth using a secret cookie for encoding the PreviousMode value or similar. This is interesting insofar as our increment primitive shows that we can escalate privileges using CVE-2018-8611 without abusing the PreviousMode value at all. So in the case of this vulnerability, that mitigation would not have been effective. A better and easier mitigation would be to explicitly set the PreviousMode to UserMode on a syscall trap from userland, as Microsoft's own documentation describes.

We agree with the other suggested mitigation. Further hardening of kernel dispatcher objects could help close these primitives. The most effective way to prevent these types of problems, however, is to start using a mitigation like SMAP.

SMAP

This vulnerability would be very hard to exploit if SMAP were in use, as the whole approach of pointing to a userland _KENLISTMENT would have failed from the beginning. As a reminder, both the in-the-wild exploit and ours use this technique, even though we use different write primitives.

SMAP would mean that a prerequisite to exploitation would be that we already have the ability to introduce fully controlled data at some known location in kernel memory. On earlier versions of Windows this was possible by abusing the Desktop Heap, which is heavily abused by win32k exploits. Unfortunately for exploit developers, to our knowledge there are no publicly known ways to leak the kernel address of the Desktop Heap in the latest versions of Windows without having a kernel read primitive.

Conclusions

We did most of this work without seeing the in-the-wild exploit for this vulnerability. This resulted in a pretty interesting and long research experience to get a working exploit on all versions, as well as a lot of time spent exploring various failed approaches. It also forced us to improve our tooling, for example so that we could heavily document our idbs in HexRays.

It may still be that the approaches we took and the 0day exploit took are not necessarily the very best ways to do it, but as with many things, you work with the ideas that come to mind. Hopefully you found this article informative!

Don’t hesitate to contact us:

Aaron Adams – aaron(dot)adams(at)nccgroup(dot)com – @fidgetingbits
Cedric Halbronn – cedric(dot)halbronn(at)nccgroup(dot)com – @saidelike

Future of KTM

It appears that Microsoft is considering deprecating KTM. The following quote is found on their page about Transactional NTFS (TxF):

Microsoft strongly recommends developers utilize alternative means to achieve
your application’s needs. Many scenarios that TxF was developed for can be
achieved through simpler and more readily available techniques. Furthermore,
TxF may not be available in future versions of Microsoft Windows. For more
information, and alternatives to TxF, please see Alternatives to using
Transactional NTFS: https://technet.microsoft.com/fr-fr/office/hh802690(v=vs.80)

It therefore seems likely that if more vulnerabilities start to be discovered in KTM, its removal will be expedited.

Shoutouts

Thanks to the readers who made it this far!

We would like to acknowledge that this work would be much more difficult without relying on the mountain of previous work done by the larger security community. Massive thanks to everyone that is willing to share their work and research with the rest of us.

Thanks to NCC Group for allowing us to do this type of interesting research as our day jobs. We would like to thank Nick Galloway for his review on the blogs of this series.

Thanks to the @poc_crew and to @offensive_con for letting us speak about it at POC2019 and OffensiveCon2020.

We want to give a big shoutout to Kaspersky for their great analysis and sharing the results of their findings, even if we happened to not come across them until 5 months later. We hope the readers have gained a greater appreciation for the beauty and nuance of exploiting non-trivial vulnerabilities.

Lastly, a random shoutout to whoever originally found and developed an exploit for this vulnerability! You found a really cool vulnerability, and presumably went through a similar maze while trying to figure out how to exploit it. We hope to look at the rest of your exploit some day to understand what other approaches we took that were different from yours!

Read all posts in the Exploiting Windows KTM series

CVE-2018-8611 Exploiting Windows KTM Part 1/5 – Introduction
CVE-2018-8611 Exploiting Windows KTM Part 2/5 – Patch analysis and basic triggering
CVE-2018-8611 Exploiting Windows KTM Part 3/5 – Triggering the race condition and debugging tricks
CVE-2018-8611 Exploiting Windows KTM Part 4/5 – From race win to kernel read and write primitive
CVE-2018-8611 Exploiting Windows KTM Part 5/5 – Vulnerability detection and a better read/write primitive

Aaron Adams