Friday, June 23, 2017

Loading and Debugging Windows Kernel Shellcodes with Windbg. Debugging DoublePulsar Shellcode

In this article I’d like to share a WinDbg script that lets us load a shellcode from a file into kernel memory and create a kernel thread to execute it. I have not tested the script much yet, so if you find a bug, please tell me.

Windbg Script for loading the shellcode and creating a thread for running it

You can find the script on my GitHub:


The parameters of the script:

  • $$>a<load_code_to_kernel_memory.wdbg <src code> <mem size> <offset start routine>

The first argument is the path to the file containing the shellcode. The second one is the size of the memory to reserve (it must be large enough to hold the shellcode). The third parameter is the offset into the shellcode where we want the new thread to start executing.
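
As a quick illustration of how the three arguments fit together, here is a small hypothetical Python helper (not part of the script) that builds the command line. It assumes WinDbg's default radix of 16, so the size and the offset are written as bare hex digits without a 0x prefix, as in the trace shown later in this post.

```python
def build_load_command(shellcode_path: str, mem_size: int, start_offset: int) -> str:
    """Hypothetical helper: format the three script arguments for WinDbg."""
    if not 0 <= start_offset < mem_size:
        raise ValueError("the start offset must lie inside the allocated memory")
    # WinDbg's default radix is 16, so numbers go in as bare hex digits.
    return (f"$$>a<load_code_to_kernel_memory.wdbg "
            f"{shellcode_path} {mem_size:x} {start_offset:x}")

print(build_load_command("shellcode.bin", 0x3000, 0x221))
# $$>a<load_code_to_kernel_memory.wdbg shellcode.bin 3000 221
```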

Careful: the size of the shellcode file should be padded to a multiple of the page size. We use the .readmem command to load the shellcode, and it reads blocks of 0x1000 bytes. For example, if your shellcode file is 0x2800 bytes long, .readmem will load only 0x2000 bytes. You will need to append 0x800 additional trash bytes to the file to load the full code.
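
A minimal Python sketch of that padding step, assuming any filler byte is acceptable because the padding is never executed. The helper names are mine, not part of the script.

```python
import os

PAGE_SIZE = 0x1000

def padded_size(size: int, page: int = PAGE_SIZE) -> int:
    """Round size up to the next multiple of page."""
    return (size + page - 1) // page * page

def pad_to_page_multiple(path: str, filler: bytes = b"\x00") -> int:
    """Append filler bytes so the file size becomes a multiple of PAGE_SIZE.

    filler must be a single byte. Returns the new file size.
    """
    if len(filler) != 1:
        raise ValueError("filler must be exactly one byte")
    size = os.path.getsize(path)
    missing = padded_size(size) - size
    if missing:
        with open(path, "ab") as f:
            f.write(filler * missing)
    return size + missing

# The example from the text: 0x2800 bytes need 0x800 more to reach 0x3000.
print(hex(padded_size(0x2800) - 0x2800))  # 0x800
```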

A quick explanation about the script

We need to hijack a running thread for a while. We redirect that thread’s execution to ExAllocatePool to reserve memory for the shellcode (we manipulate the stack of the hijacked thread to do that, and we restore it later).

For this purpose, we set a breakpoint at NtCreateFile (a frequently called function). Once the thread is stopped there, we can manipulate it:

$$set a breakpoint on a common function that is executed frequently (NtCreateFile for example) for hijacking the thread for a while
ba e1 nt!NtCreateFile
.printf "${$arg1}"
.printf "${$arg2}"
.printf "${$arg3}"
bc *
$$save original esp register and stack parameters that we are going to overwrite when calling ExAllocatePool and PsCreateSystemThread
r @$t19 = (poi esp)
r @$t18 = (poi esp+4)
r @$t17 = (poi esp+8)
r @$t16 = (poi esp+c)
r @$t15 = (poi esp+10)
r @$t14 = (poi esp+14)
r @$t13 = (poi esp+18)
r @$t12 = (poi esp+1c)
r @$t11 = esp
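
The bookkeeping above (eight dwords from [esp] to [esp+1c] copied into $t19..$t12, plus esp itself in $t11) is what makes the final restore possible. A toy Python model of the save, overwrite and restore round trip follows. It works on a bytearray standing in for the stack, not on real kernel memory, and the stack contents are invented.

```python
import struct

SAVED_SLOTS = 8  # [esp], [esp+4], ..., [esp+1c]

def read_dword(mem: bytearray, addr: int) -> int:
    return struct.unpack_from("<I", mem, addr)[0]

def write_dword(mem: bytearray, addr: int, value: int) -> None:
    struct.pack_into("<I", mem, addr, value & 0xFFFFFFFF)

def save_frame(mem: bytearray, esp: int) -> list:
    """Like the r @$t19 = (poi esp) ... r @$t12 = (poi esp+1c) lines."""
    return [read_dword(mem, esp + 4 * i) for i in range(SAVED_SLOTS)]

def restore_frame(mem: bytearray, esp: int, saved: list) -> None:
    """Like the ed esp @$t19 ... ed (esp+1c) @$t12 lines."""
    for i, value in enumerate(saved):
        write_dword(mem, esp + 4 * i, value)

# Toy stack: 64 bytes of known content, esp pointing 16 bytes in.
stack = bytearray(range(64))
esp = 16
saved = save_frame(stack, esp)
write_dword(stack, esp + 4, 0)        # PoolType = NonPagedPool (0)
write_dword(stack, esp + 8, 0x3000)   # NumberOfBytes
restore_frame(stack, esp, saved)
print(stack == bytearray(range(64)))  # True
```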

Once we have the thread, we change eip and the stack so that it executes ExAllocatePool:

$$change the stack with the parameters that we need for ExAllocatePool
ed (esp+4) 0
ed (esp+8) ${$arg2}
$$hijack the thread running on NtCreateFile to execute ExAllocatePool
u nt!ExAllocatePool
dd esp
r eip = nt!ExAllocatePool
$$steps until the ret instruction is found. We can't execute step out (gu) because we would get a 0x30 bugcheck (SET_OF_INVALID_CONTEXT). This is the reason:
$$ “This bug check occurs when there’s a stack underflow detected when restoring context coming out of a trap. There’s a check to
$$ validate that the current ESP is less than the ESP saved in the trap frame. If current_esp < saved_esp, bugcheck. ”
.while (1)
r @$t10 = (poi eip)
r @$t10 = @$t10 & 0x000000ff
.if (@$t10 == 0xc2)
r @$t0 = eax
.printf "allocated mem: %x\n", @$t0
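
The loop relies on a small trick: poi eip reads a little-endian dword starting at eip, so masking it with 0xff leaves the first opcode byte, and 0xc2 is the opcode of ret imm16, the return used by stdcall (NTAPI) functions on x86 such as ExAllocatePool. A Python sketch of the same check follows. The byte sequence is illustrative, not dumped from ntoskrnl. Note that this check would not stop at a plain ret (0xc3).

```python
import struct

RET_IMM16 = 0xC2  # opcode of "ret imm16"

def is_ret_imm16(mem: bytes, eip: int) -> bool:
    # Mirrors "r @$t10 = (poi eip)" followed by "& 0x000000ff":
    # the lowest byte of a little-endian dword is the byte at eip.
    dword = struct.unpack_from("<I", mem, eip)[0]
    return dword & 0xFF == RET_IMM16

# Illustrative bytes for "pop ebp; ret 8" (5d c2 08 00), padded with int3
# so that a full dword can be read at every offset we test.
code = bytes([0x5D, 0xC2, 0x08, 0x00, 0xCC, 0xCC])
print(is_ret_imm16(code, 0), is_ret_imm16(code, 1))  # False True
```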

Now that we have allocated enough space for the shellcode, we load it:

$$load code from file to allocated memory
$$careful: .readmem will read blocks of 0x1000 bytes. For example, if your file to load has 0x2800 bytes, .readmem will load 0x2000 bytes only.
$$You will need to complete the file with 0x800 additional trash bytes to load the full code
.readmem ${$arg1} @$t0
$$ @$t1 = allocated mem membase, code is read
r @$t1 = @$t0

Next, we create a thread that starts at the shellcode base + arg3:

$$Now we are going to create a kernel thread at @$t1 + arg3 (membase + startroutine_offset)
$$ NTSTATUS PsCreateSystemThread(
$$ _Out_ PHANDLE ThreadHandle,
$$ _In_ ULONG DesiredAccess,
$$ _In_opt_ POBJECT_ATTRIBUTES ObjectAttributes,
$$ _In_opt_ HANDLE ProcessHandle,
$$ _Out_opt_ PCLIENT_ID ClientId,
$$ _In_ PKSTART_ROUTINE StartRoutine,
$$ _In_opt_ PVOID StartContext
$$ );
ed (esp+1c) 0
ed (esp+18) @$t1+${$arg3}
ed (esp+14) 0
ed (esp+10) 0
ed (esp+c) 0
ed (esp+8) 0
$$ThreadHandle is an output parameter; we point it at the stack slot of StartContext, which we don't need
ed (esp+4) (esp+1c)
$$set a breakpoint where the thread is going to be created
ba e1 @$t1+${$arg3}
u nt!PsCreateSystemThread
dd esp
r eip = nt!PsCreateSystemThread
$$again, steps until the ret instruction is found
.while (1)
r @$t10 = (poi eip)
r @$t10 = @$t10 & 0x000000ff
.if (@$t10 == 0xc2)
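
The writes at esp+4..esp+1c follow the x86 stdcall layout: [esp] holds the return address and the seven PsCreateSystemThread arguments follow left to right. This hypothetical Python helper computes the same values; with esp = 9e867d04 and the allocation at 849f4000 (both taken from the trace below), it yields the dwords that the trace prints for the PsCreateSystemThread stack.

```python
def ps_create_system_thread_frame(esp: int, membase: int, start_offset: int) -> dict:
    """Map each stack address the script writes to the dword written there."""
    start_context_slot = esp + 0x1C
    return {
        esp + 0x04: start_context_slot,      # ThreadHandle: reuses the StartContext slot
        esp + 0x08: 0,                       # DesiredAccess
        esp + 0x0C: 0,                       # ObjectAttributes (NULL)
        esp + 0x10: 0,                       # ProcessHandle (NULL: system process)
        esp + 0x14: 0,                       # ClientId (NULL)
        esp + 0x18: membase + start_offset,  # StartRoutine
        esp + 0x1C: 0,                       # StartContext
    }

frame = ps_create_system_thread_frame(0x9E867D04, 0x849F4000, 0x221)
for addr in sorted(frame):
    print(f"{addr:08x} {frame[addr]:08x}")
```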

Finally, we restore the stack and eip so that the hijacked thread continues executing correctly at NtCreateFile; otherwise the system will crash:

$$restore original registers and stack to continue the execution with no problems
r eip = nt!NtCreateFile
r esp = @$t11
ed esp @$t19
ed (esp+4) @$t18
ed (esp+8) @$t17
ed (esp+c) @$t16
ed (esp+10) @$t15
ed (esp+14) @$t14
ed (esp+18) @$t13
ed (esp+1c) @$t12

After this, WinDbg should stop at the breakpoint set on the shellcode offset where we wanted the thread to start.

Testing the script with DoublePulsar shellcode

We are going to test the script with a DoublePulsar shellcode extracted from a worm/ransomware whose name I don’t want to remember.

You can download the shellcode file from here (rar password: infected).

The file is 0x3000 bytes long. I have not reversed the shellcode in depth, but offset 0x221 seems to be a good starting point for debugging (we will see why later).

We execute the script, and here are the debug traces it prints:

kd> $$>a<load_code_to_kernel_memory.wdbg shellcode.bin 3000 221
Breakpoint 0 hit <- nt!NtCreateFile hit
8261e976 8bff mov edi,edi
8261e978 55 push ebp
8261e979 8bec mov ebp,esp
8261e97b 684e6f6e65 push 656E6F4Eh
8261e980 ff750c push dword ptr [ebp+0Ch]
8261e983 ff7508 push dword ptr [ebp+8]
8261e986 e87a461100 call nt!ExAllocatePoolWithTag (82733005)
8261e98b 5d pop ebp
9e867d04 826511ea 00000000 00003000 040bfb54 <- stack for ExAllocatePool
allocated mem: 849f4000 <- allocated memory
Reading 10000 bytes…… <- .readmem shellcode
8281bfb6 8bff mov edi,edi
8281bfb8 55 push ebp
8281bfb9 8bec mov ebp,esp
8281bfbb 83e4f8 and esp,0FFFFFFF8h
8281bfbe 83ec34 sub esp,34h
8281bfc1 a148da7382 mov eax,dword ptr [nt!__security_cookie (8273da48)]
8281bfc6 33c4 xor eax,esp
8281bfc8 89442430 mov dword ptr [esp+30h],eax
9e867d04 826511ea 9e867d20 00000000 00000000 <- stack for PsCreateSystemThread
9e867d14 00000000 00000000 849f4221 00000000
9e867d24 00000001 00000060 00000000 00000000
Breakpoint 0 hit <- shellcode hit
849f4221 b923000000 mov ecx,23h
0: kd> u eip
849f4221 b923000000 mov ecx,23h
849f4226 6a30 push 30h
849f4228 0fa1 pop fs
849f422a 8ed9 mov ds,cx
849f422c 8ec1 mov es,cx
849f422e 648b0d40000000 mov ecx,dword ptr fs:[40h]
849f4235 8b6104 mov esp,dword ptr [ecx+4]
849f4238 ff35fcffdfff push dword ptr ds:[0FFDFFFFCh]


We can see that the newly created thread is stopped at the shellcode offset we specified.

At offset 0x20B there is a shellcode function that installs a hook on sysenter_eip (MSR 0x176, IA32_SYSENTER_EIP).

We can see that the shellcode uses the address 0xFFDFFFFC to store the original value of MSR[0x176]. You can execute:

1: kd> dt nt!_kuser_shared_data ffdf0000

The nt!_KUSER_SHARED_DATA structure is located at ffdf0000, so I guess the shellcode is using otherwise unused memory near this structure to store the temporary value it needs.
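
A quick check of the address arithmetic in Python: 0xFFDFFFFC is 0xFFFC bytes past ffdf0000, so it is the last dword of the 64 KB range that starts there, and it falls in the 4 KB page at 0xFFDFF000 rather than in the page at ffdf0000 where the structure starts. The snippet only does the arithmetic; it says nothing about what else may be mapped at that address.

```python
KUSER_SHARED_DATA = 0xFFDF0000  # base used in "dt nt!_kuser_shared_data ffdf0000"
SCRATCH_SLOT = 0xFFDFFFFC       # where the shellcode stores MSR[0x176]
PAGE_SIZE = 0x1000

def page_of(addr: int) -> int:
    """Start address of the 4 KB page containing addr."""
    return addr & ~(PAGE_SIZE - 1)

print(hex(SCRATCH_SLOT - KUSER_SHARED_DATA))                # 0xfffc
print(hex(page_of(SCRATCH_SLOT)))                           # 0xffdff000
print(page_of(SCRATCH_SLOT) == page_of(KUSER_SHARED_DATA))  # False
```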

Here you can read about hooking system calls through MSR (very interesting):

So offset 0x221 is the sysenter_eip hook routine itself, which is why I think it is a good place to start debugging.

Let’s continue reversing the code of the sysenter_eip hook.

In kernel-land, fs:[0] is the start of the _KPCR structure. We can see how the shellcode gets the values it needs from this structure and from other structures it points to:
  • fs:0x40 -> _KTSS
  • _KTSS + 4 -> Esp0 (the correct esp to continue executing in the kernel)
  • nt!kuser_shared_data+0x304 -> SystemCallReturn
  • _KPCR + 0x1C -> SelfPcr (_KPCR)
  • SelfPcr + 0x120 -> PrcbData (_KPRCB)
  • PrcbData+0x4 -> CurrentThread (_KTHREAD)
  • CurrentThread+0x28 -> InitialStack
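
To make the chain above concrete, here is a toy Python model over a fake memory map. Every address and value in it is invented; the offsets are the ones listed, which are specific to this 32-bit Windows build. One detail worth noting: PrcbData is a _KPRCB embedded inside _KPCR, so SelfPcr + 0x120 is an address to compute, not a pointer to follow, while the other steps are dereferences.

```python
# Offsets from the list above (32-bit kernel; they change between builds).
KPCR_TSS = 0x40               # _KPCR.TSS -> _KTSS
KTSS_ESP0 = 0x04              # _KTSS.Esp0
KPCR_SELF_PCR = 0x1C          # _KPCR.SelfPcr
KPCR_PRCB_DATA = 0x120        # _KPCR.PrcbData (embedded, not a pointer)
KPRCB_CURRENT_THREAD = 0x04   # _KPRCB.CurrentThread
KTHREAD_INITIAL_STACK = 0x28  # _KTHREAD.InitialStack

def resolve(mem: dict, fs_base: int) -> dict:
    """Follow the chain starting at the _KPCR that fs points to."""
    tss = mem[fs_base + KPCR_TSS]
    self_pcr = mem[fs_base + KPCR_SELF_PCR]
    prcb = self_pcr + KPCR_PRCB_DATA          # address arithmetic only
    thread = mem[prcb + KPRCB_CURRENT_THREAD]
    return {
        "Esp0": mem[tss + KTSS_ESP0],
        "CurrentThread": thread,
        "InitialStack": mem[thread + KTHREAD_INITIAL_STACK],
    }

# Fake memory map (address -> dword): every number here is invented.
PCR, TSS, THREAD = 0x80000000, 0x80001000, 0x85000000
fake_mem = {
    PCR + KPCR_TSS: TSS,
    TSS + KTSS_ESP0: 0x9E868000,
    PCR + KPCR_SELF_PCR: PCR,                 # SelfPcr points back to the PCR
    PCR + KPCR_PRCB_DATA + KPRCB_CURRENT_THREAD: THREAD,
    THREAD + KTHREAD_INITIAL_STACK: 0x9E867FF0,
}
print({name: hex(v) for name, v in resolve(fake_mem, PCR).items()})
```
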
After all these initializations, it calls the main code of the hook, but before that it restores MSR[0x176] (SYSENTER_EIP). It already has a thread coming from user mode, and that is probably enough for its purposes.


I have not debugged the shellcode in depth, but let’s look at the first part of its main code, which is executed from the sysenter_eip hook.

We can see how the shellcode takes a pointer from the IDT to get an address inside ntoskrnl.exe. From there it can find the base of ntoskrnl.exe by walking backwards through memory until it finds the PE header.
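
The base-finding idea can be sketched in Python over a memory snapshot. The details are assumptions of this sketch, not taken from the shellcode: it aligns the starting address down to a page, steps back one page at a time, and accepts a page only if it starts with MZ and its e_lfanew field (offset 0x3C) points to a PE\0\0 signature. The addresses in the example are made up.

```python
import struct

PAGE_SIZE = 0x1000

def find_image_base(snapshot: bytes, snapshot_base: int, inner_addr: int) -> int:
    """Walk back from inner_addr, page by page, to the start of the PE image.

    snapshot holds memory copied from address snapshot_base onwards.
    """
    page = inner_addr & ~(PAGE_SIZE - 1)
    while page >= snapshot_base:
        off = page - snapshot_base
        if snapshot[off:off + 2] == b"MZ" and off + 0x40 <= len(snapshot):
            e_lfanew = struct.unpack_from("<I", snapshot, off + 0x3C)[0]
            pe = off + e_lfanew
            if snapshot[pe:pe + 4] == b"PE\0\0":
                return page
        page -= PAGE_SIZE
    raise LookupError("no PE header found below inner_addr")

# Fake snapshot: four pages at 0x82600000 with an image header in page 1.
mem = bytearray(4 * PAGE_SIZE)
mem[0x1000:0x1002] = b"MZ"
struct.pack_into("<I", mem, 0x1000 + 0x3C, 0x80)
mem[0x1080:0x1084] = b"PE\0\0"
print(hex(find_image_base(bytes(mem), 0x82600000, 0x82603456)))  # 0x82601000
```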

After that, it resolves the APIs it needs by CRC: ExAllocatePool, ExFreePool and ZwQuerySystemInformation.
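
Resolving APIs by hash means the shellcode does not need to carry the API names as strings: it can hash export names and compare them against stored values. A Python sketch of that idea, with the assumption stated plainly: it uses the standard CRC32 from zlib, and I have not checked which CRC variant, seed or name normalization DoublePulsar actually uses. The export addresses are invented.

```python
import zlib

def name_hash(name: str) -> int:
    # Assumption: plain CRC32 of the ASCII name (the shellcode may differ).
    return zlib.crc32(name.encode("ascii")) & 0xFFFFFFFF

def resolve_by_hash(exports: dict, wanted: list) -> dict:
    """exports: export name -> address. Returns hash -> address for each
    wanted hash that matches an export name."""
    by_hash = {name_hash(name): addr for name, addr in exports.items()}
    return {h: by_hash[h] for h in wanted if h in by_hash}

# Only these hashes would be embedded in the shellcode, not the names.
WANTED = [name_hash(n) for n in
          ("ExAllocatePool", "ExFreePool", "ZwQuerySystemInformation")]

fake_exports = {  # invented addresses
    "ExAllocatePool": 0x10001000,
    "ExFreePool": 0x10002000,
    "ZwQuerySystemInformation": 0x10003000,
    "KeBugCheckEx": 0x10004000,
}
resolved = resolve_by_hash(fake_exports, WANTED)
print(len(resolved))
```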

Later, it uses ZwQuerySystemInformation to list all loaded modules, searching for kdcom.dll (an anti-debugging trick?) and especially for srv.sys.
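
ZwQuerySystemInformation with the SystemModuleInformation class returns one record per loaded module, holding (among other fields) the image base, the image size, the full path and the offset of the file name inside that path. The Python sketch below models only those fields, loosely following the undocumented RTL_PROCESS_MODULE_INFORMATION layout, and matches the file name case-insensitively. The module list is invented.

```python
from typing import Iterable, NamedTuple, Optional

class ModuleEntry(NamedTuple):
    image_base: int
    image_size: int
    full_path: str
    offset_to_file_name: int  # index where the file name starts in full_path

def make_entry(base: int, size: int, path: str) -> ModuleEntry:
    return ModuleEntry(base, size, path, path.rfind("\\") + 1)

def find_module(modules: Iterable[ModuleEntry], file_name: str) -> Optional[ModuleEntry]:
    """Return the first module whose file name matches, ignoring case."""
    target = file_name.lower()
    for m in modules:
        if m.full_path[m.offset_to_file_name:].lower() == target:
            return m
    return None

modules = [  # invented entries
    make_entry(0x82612000, 0x410000, r"\SystemRoot\system32\ntoskrnl.exe"),
    make_entry(0x80BB4000, 0x8000, r"\SystemRoot\system32\kdcom.dll"),
    make_entry(0x9A4E0000, 0x6C000, r"\SystemRoot\System32\DRIVERS\srv.sys"),
]
srv = find_module(modules, "srv.sys")
print(hex(srv.image_base) if srv else "not loaded")  # 0x9a4e0000
```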

When it finds srv.sys, it walks the PE sections of the module, looking for something in its data.
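
Walking the sections of an image like srv.sys means parsing the section table that follows the PE optional header. Here is a minimal Python parser over an in-memory copy of an image, based on the documented PE layout. The tiny image in the example is synthetic, and what the shellcode looks for inside the data is not modelled.

```python
import struct
from typing import List, Tuple

def pe_sections(image: bytes) -> List[Tuple[str, int, int]]:
    """Return (name, VirtualAddress, VirtualSize) for each section header."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    file_header = e_lfanew + 4                       # IMAGE_FILE_HEADER
    num_sections, = struct.unpack_from("<H", image, file_header + 2)
    opt_size, = struct.unpack_from("<H", image, file_header + 16)
    table = file_header + 20 + opt_size              # first IMAGE_SECTION_HEADER
    sections = []
    for i in range(num_sections):
        hdr = table + 40 * i                         # each header is 40 bytes
        name = image[hdr:hdr + 8].rstrip(b"\0").decode("ascii", "replace")
        vsize, vaddr = struct.unpack_from("<II", image, hdr + 8)
        sections.append((name, vaddr, vsize))
    return sections

# Synthetic image with two sections.
img = bytearray(0x400)
img[0:2] = b"MZ"
struct.pack_into("<I", img, 0x3C, 0x80)
img[0x80:0x84] = b"PE\0\0"
struct.pack_into("<HH", img, 0x84, 0x14C, 2)   # Machine (i386), NumberOfSections
struct.pack_into("<H", img, 0x84 + 16, 0xE0)   # SizeOfOptionalHeader
table = 0x84 + 20 + 0xE0
for i, (name, va, vs) in enumerate([(b".text", 0x1000, 0x500), (b".data", 0x2000, 0x200)]):
    hdr = table + 40 * i
    img[hdr:hdr + len(name)] = name
    struct.pack_into("<II", img, hdr + 8, vs, va)
print(pe_sections(bytes(img)))  # [('.text', 4096, 1280), ('.data', 8192, 512)]
```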

This shellcode is used together with an SMB exploit. My guess is that at this point it is trying to find other parts of the data sent by the exploit (probably a PE to load).


I did not continue debugging the shellcode, because the goal was only to test the script and show how to debug a shellcode without executing the full exploit. There is a lot of information about DoublePulsar on the internet, for example:

Sometimes it is much faster to load the shellcode and debug it than to set up a vulnerable machine, run the exploit, etc.

I hope the script and the article help you in your reversing sessions 😄
