Monday, June 20, 2011

Switching between user mode and kernel mode - Part2

In the previous post, we looked at how the processor switches from user mode to kernel mode, how the kernel picks the requested system service, which MSRs hold the information the CPU needs for the switch, and how to verify those things from Windbg.
Let's now focus on how the parameters of the API are passed from user mode to kernel mode, and finally how control moves back to user mode.

The first four parameters are passed in rcx, rdx, r8 and r9, and the rest are on the stack. (The ntdll stub copies rcx into r10 before syscall, because the syscall instruction overwrites rcx with the return address.) Before calling the corresponding system service routine, the kernel sets up the register arguments again and copies the stack arguments from the user-mode stack to the kernel stack, so that user code cannot modify them while the call is in progress.
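
As a rough illustration, here is a small C model of where each argument sits at the moment the ntdll stub executes syscall. It assumes the standard Windows x64 calling convention (the return address at [rsp], followed by 32 bytes of home space for the four register arguments). The helper names and the simulated user stack are hypothetical; the real copy is done in assembly inside KiSystemCall64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Register holding argument `index` (0-based) at the syscall instruction.
   The first argument is listed as r10 because the stub executes
   "mov r10,rcx" and syscall then overwrites rcx with the return address. */
static const char *arg_register(unsigned index)
{
    static const char *const regs[4] = { "r10", "rdx", "r8", "r9" };
    return index < 4 ? regs[index] : NULL; /* NULL: argument is on the stack */
}

/* Offset from the user-mode RSP of a stack argument (index >= 4):
   [rsp] holds the return address and [rsp+8..rsp+0x27] is the home space
   reserved for the four register arguments. */
static size_t stack_arg_offset(unsigned index)
{
    return 0x28 + 8u * (size_t)(index - 4);
}

/* Hypothetical model of argument capture: copy the stack arguments from a
   simulated user stack into a kernel-owned buffer, so later writes by user
   code cannot change the values the service routine sees. */
static void capture_stack_args(const uint8_t *user_rsp, unsigned total_args,
                               uint64_t *kernel_buf)
{
    for (unsigned i = 4; i < total_args; i++)
        memcpy(&kernel_buf[i - 4], user_rsp + stack_arg_offset(i), sizeof(uint64_t));
}
```

For NtReadFile, which takes nine parameters, arguments 0 to 3 would be in r10, rdx, r8 and r9, and the remaining five would be copied starting at rsp+0x28.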

Once the system call is done, KiSystemCall64 restores the registers from the trap frame and executes the sysret instruction, which performs the actual switch back to user mode.
Again from the Intel manual, here is what the processor does. As a recap, SYSCALL sets up kernel mode as follows:

Target code segment — Reads a non-NULL selector from IA32_STAR[47:32].

Target instruction — Reads a 64-bit canonical address from IA32_LSTAR.

Stack segment — Computed by adding 8 to the value in IA32_STAR[47:32].

System flags — The processor sets RFLAGS to the logical-AND of its current value with the complement of the value in the IA32_FMASK MSR.

When SYSRET transfers control to 64-bit mode user code using REX.W, the processor gets the privilege level 3 target code segment, instruction pointer, stack segment and flags from:

Target code segment — Reads a non-NULL selector from IA32_STAR[63:48] +16.

Target instruction — Copies the value in RCX into RIP.
Stack segment — IA32_STAR[63:48] + 8.
EFLAGS — Loaded from R11.
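
The SYSRET steps above can be modeled in a few lines of C. This is only a sketch: the cpu_state struct and the sysret64 function are hypothetical, and the IA32_STAR value used below, 0x0023001000000000, is an assumption about a typical Windows x64 setup; read the real value on your machine with rdmsr c0000081.

```c
#include <stdint.h>

/* Simplified register state: only what SYSRET touches. */
typedef struct {
    uint64_t rip, rcx, r11, rflags;
    uint16_t cs, ss;
} cpu_state;

/* Model of 64-bit SYSRET (REX.W form) following the Intel manual's steps.
   Simplification: the real instruction also clears a few RFLAGS bits
   (such as RF and VM) when loading from R11; that is ignored here. */
static void sysret64(cpu_state *s, uint64_t ia32_star)
{
    uint16_t base = (uint16_t)(ia32_star >> 48);   /* IA32_STAR[63:48] */
    s->rip    = s->rcx;                            /* return address saved by SYSCALL */
    s->rflags = s->r11;                            /* user flags saved by SYSCALL */
    s->cs     = (uint16_t)((base + 16) | 3);       /* RPL forced to 3 */
    s->ss     = (uint16_t)((base + 8) | 3);
    /* RSP is not touched: the kernel must restore the user stack pointer itself. */
}
```

With the assumed IA32_STAR value, sysret64 sets CS to 0x33 and SS to 0x2B, while RIP and RFLAGS come straight from RCX and R11.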

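Note that neither SYSCALL nor SYSRET changes RSP: the kernel switches stacks itself on entry and restores the user-mode RSP before executing sysret. With the details from the previous post, the steps above should be easy to follow. I hope this post is helpful to you. Your comments are welcome....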

Saturday, June 18, 2011

Switching between user mode and kernel mode.

Hello,
Today I would like to dig into something we hear about very often in software development and debugging:
the different modes the processor runs in on Windows, namely kernel mode and user mode.
We also call them ring 0 and ring 3 execution.
There are several situations in which the OS switches from user mode to kernel mode (from a lower privilege level to a higher one) and back. Examples are:
> Interrupts
> Exceptions
> System Calls

Today I will go into the details of system calls. When we call a Windows API from user mode, ntdll makes the transition to the corresponding kernel-mode routine, and after the kernel completes the call, the results are returned to user mode.
For our example, let's take ntdll!NtReadFile. Any user-mode file read operation (for example, kernel32!ReadFile) ends up calling this API, which then transfers to kernel mode. Let's check what is inside this function. Below is its disassembly:
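          mov     r10,rcx     ; syscall overwrites rcx, so keep the 1st argument in r10
          mov     eax,3       ; system service number of NtReadFile on this build
          syscall             ; enter kernel mode at the address in IA32_LSTAR
          ret                 ; return the NTSTATUS in eax to the caller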

Notice the syscall instruction: it is the one that makes a fast call into kernel mode. For reference, this output is from an x64-based PC.
On Intel x64 processors, SYSCALL is supported only in 64-bit mode, not in compatibility mode.
So what does the processor do when it executes the syscall instruction? Here are the steps, straight from the Intel manual.
First, the processor saves RFLAGS in R11 and the RIP of the next instruction in RCX.
Then, to run kernel-mode code, the processor needs the following information:
Target code segment (CS)
Target instruction (where execution should start)
Stack segment (SS)
System flags
It gets all of this from model-specific registers:
CS = IA32_STAR[47:32]
RIP = IA32_LSTAR (a 64-bit canonical address)
SS = IA32_STAR[47:32] + 8
System flags: the processor sets RFLAGS to the logical AND of its current value with the complement of the value in the IA32_FMASK MSR.
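
To make the SYSCALL steps concrete, here is a hypothetical C model of the state change. The struct, the function name and the MSR values in the example are illustrative assumptions, not values read from a real system. The selector arithmetic follows the list above, with the RPL of CS forced to 0 as the Intel manual specifies.

```c
#include <stdint.h>

/* Simplified register state: only what SYSCALL touches. */
typedef struct {
    uint64_t rip, rcx, r11, rflags;
    uint16_t cs, ss;
} cpu_state;

/* Model of SYSCALL in 64-bit mode. `next_rip` is the address of the
   instruction after syscall (the ret in the ntdll stub). */
static void syscall64(cpu_state *s, uint64_t next_rip,
                      uint64_t ia32_star, uint64_t ia32_lstar, uint64_t ia32_fmask)
{
    uint16_t base = (uint16_t)(ia32_star >> 32);   /* IA32_STAR[47:32] */
    s->rcx     = next_rip;                         /* return address for SYSRET */
    s->r11     = s->rflags;                        /* user flags, saved before masking */
    s->rip     = ia32_lstar;                       /* kernel entry point */
    s->cs      = (uint16_t)(base & 0xFFFC);        /* kernel code segment, RPL 0 */
    s->ss      = (uint16_t)(base + 8);             /* kernel stack segment */
    s->rflags &= ~ia32_fmask;                      /* clear the masked flags */
}
```

With IA32_STAR = 0x0023001000000000 (assumed), the kernel CS is 0x10 and SS is 0x18. With an illustrative FMASK of 0x300, the trap and interrupt flags are cleared on entry, while R11 keeps the original user flags.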

IA32_STAR and the related registers are MSRs (model-specific registers). You can read them with the rdmsr Windbg command if you know their indexes, which you can easily find in the processor manuals. For example, the index of IA32_STAR is C0000081, IA32_LSTAR is C0000082, and IA32_FMASK is C0000084.
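
The MSR indexes and the selector fields of IA32_STAR can be summarized as C constants plus a small decoder. This is a sketch: the macro and struct names are hypothetical, and the sample value 0x0023001000000000 in the note below is my assumption for Windows x64, so check it against rdmsr c0000081 on your own machine.

```c
#include <stdint.h>

/* MSR indexes as given in the processor manuals (use them with rdmsr). */
#define MSR_IA32_STAR  0xC0000081u
#define MSR_IA32_LSTAR 0xC0000082u
#define MSR_IA32_FMASK 0xC0000084u

/* The two selector fields of IA32_STAR used in 64-bit mode. */
typedef struct {
    uint16_t syscall_cs;   /* STAR[47:32]: kernel CS; kernel SS = this + 8 */
    uint16_t sysret_base;  /* STAR[63:48]: user SS = this + 8, 64-bit user CS = this + 16 */
} star_fields;

static star_fields decode_star(uint64_t star)
{
    star_fields f;
    f.syscall_cs  = (uint16_t)((star >> 32) & 0xFFFF);
    f.sysret_base = (uint16_t)((star >> 48) & 0xFFFF);
    return f;
}
```

Decoding the assumed value 0x0023001000000000 gives syscall_cs = 0x10 (kernel SS 0x18) and sysret_base = 0x23 (user SS 0x2B, user CS 0x33).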

When I checked on my machine, IA32_LSTAR held the address of nt!KiSystemCall64 (you can run rdmsr c0000082 and then ln on the returned address to see this).
So now you know where control goes when the syscall instruction is executed. Note that this kernel function is the entry point for every system call, not just for ntdll!NtReadFile.

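Once in the kernel, nt!KiSystemCall64 decides what to do, that is, which API to call, based on a parameter it receives from user mode. (I will explain below where this parameter comes from.)
Note that SYSCALL itself does not switch stacks. KiSystemCall64 executes swapgs to reach the per-processor control region (KPCR), saves the user-mode RSP there, and loads the current thread's kernel stack pointer from per-processor data. (For interrupts and exceptions, by contrast, the processor loads the kernel stack from the TSS.) The user-mode RSP, RIP, RFLAGS, etc. are then saved on the kernel stack so that they can be restored when the call returns to user mode.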

The parameter I mentioned, the one that tells the kernel which system service to invoke, is the system service number. The ntdll stub loads it into the eax register right before the syscall instruction:

mov r10,rcx
mov eax,3
syscall
ret
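
Conceptually, the number in eax is just an index into a table of service routines. The sketch below models that idea in C with hypothetical stand-in services; it is not how KiSystemCall64 is written. It also shows the bounds check a kernel must perform on a number supplied by user mode. I believe Windows returns STATUS_INVALID_SYSTEM_SERVICE (0xC000001C) for an out-of-range number, and the sketch mimics that.

```c
#include <stdint.h>

typedef int32_t NTSTATUS;
#define STATUS_SUCCESS                ((NTSTATUS)0x00000000)
#define STATUS_INVALID_SYSTEM_SERVICE ((NTSTATUS)0xC000001C)

typedef NTSTATUS (*service_fn)(const uint64_t *args);

static int last_called = -1;  /* records which hypothetical service ran */

/* Hypothetical stand-ins for real service routines. */
static NTSTATUS svc_0(const uint64_t *args) { (void)args; last_called = 0; return STATUS_SUCCESS; }
static NTSTATUS svc_1(const uint64_t *args) { (void)args; last_called = 1; return STATUS_SUCCESS; }
static NTSTATUS svc_2(const uint64_t *args) { (void)args; last_called = 2; return STATUS_SUCCESS; }
static NTSTATUS svc_read_file(const uint64_t *args) { (void)args; last_called = 3; return STATUS_SUCCESS; }

/* Index 3 plays the role of NtReadFile, matching "mov eax,3" in the stub. */
static service_fn const service_table[] = { svc_0, svc_1, svc_2, svc_read_file };
#define SERVICE_LIMIT (sizeof service_table / sizeof service_table[0])

/* Model of the dispatch step: the user-supplied number in eax selects the routine. */
static NTSTATUS dispatch_service(uint32_t eax, const uint64_t *args)
{
    if (eax >= SERVICE_LIMIT)   /* never trust a number coming from user mode */
        return STATUS_INVALID_SYSTEM_SERVICE;
    return service_table[eax](args);
}
```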

The kernel then looks up the routine for this number (3) in the system service table. On 64-bit Windows, the table does not store addresses. It stores 32-bit offsets relative to nt!KiServiceTable itself. So to find the address of a specific service routine, pick the entry from the nt!KiServiceTable array and add its offset to the base address of nt!KiServiceTable.

Hang on, my statement above needs a small correction. You should not add the entry directly to get the system service address, because the kernel uses a trick: only the upper 28 bits of each entry hold the offset. The low 4 bits hold the number of arguments the system call takes on the stack (the ones beyond the four register arguments). So what you should actually add is (Entry>>4), which drops the 4 bits used for the argument count.
kd> dd nt!KiServiceTable l4
fffff800`014c7b00 04106900 02f6f000 fff72d00 031a0105

kd> ln nt!KiServiceTable+(031a0105>>4)
nt!NtReadFile (.....)
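
The decoding trick can be checked with a small C helper. Beyond what the dump shows, it assumes the 28-bit offset is signed (sign-extended by the kernel), which would explain entries such as fff72d00 that point below the table. The function names are hypothetical.

```c
#include <stdint.h>

/* Offset part of a KiServiceTable entry: the upper 28 bits, treated as a
   signed value (assumption: the kernel sign-extends it). */
static int64_t entry_offset(uint32_t entry)
{
    int64_t off = (int64_t)(entry >> 4);  /* 28-bit field, 0..0x0FFFFFFF */
    if (entry & 0x80000000u)
        off -= 0x10000000;                /* sign-extend negative offsets */
    return off;
}

/* Low 4 bits: number of arguments passed on the stack. */
static unsigned entry_stack_args(uint32_t entry)
{
    return entry & 0xF;
}

/* Address of the service routine: table base plus the decoded offset
   (unsigned wrap-around handles negative offsets). */
static uint64_t service_address(uint64_t table_base, uint32_t entry)
{
    return table_base + (uint64_t)entry_offset(entry);
}
```

With the values from the dump, entry 031a0105 yields offset 0x31a010, so nt!NtReadFile would be at fffff800`017e1b10. The low nibble 5 matches NtReadFile's nine parameters: four in registers and five on the stack.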


Here we can confirm that the user-mode ReadFile ultimately goes to the kernel-mode nt!NtReadFile. OK, so what next? nt!NtReadFile takes arguments: where do they come from, and how does the kernel receive them from user mode? How does control go back to user mode once the system call completes its job? We will look at all these details in the next post soon.







Thursday, June 9, 2011

Windows Structured Exception Handling

We are familiar with the exception handling syntax:
__try
{
    // guarded code
}
__except(FilterFunction())
{
    // runs only if FilterFunction returns EXCEPTION_EXECUTE_HANDLER
}

Let's find out the inner details of exceptions: how does Windows carry out exception dispatching?

We focus entirely on x86 exception dispatching here. AMD64 handles exceptions a little differently, and I will try to cover that later.

As soon as the processor hits an exception while executing an instruction, it checks which ring it is executing in (i.e. whether it is in user mode or kernel mode). Let's consider user mode first. If it is user mode, the processor first switches to the kernel-mode stack, whose pointer it takes from the task state segment (TSS) of the current task. However, how the processor switches to kernel mode is not the topic of this post.

Once the processor has switched to kernel mode, it pushes some registers on the stack before it calls the interrupt handler (each type of interrupt/exception has its own handler; for example, the handler for divide by zero is nt!KiTrap00). These registers are:

Push SS         //this is 23 for i386 User Mode
Push esp
Push Eflags
Push cs         //this is 1b for i386 User mode
Push eip
Push ErrorCode  //pushed by the CPU only for some exceptions (e.g. page fault)
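
The pushes above can be pictured as a C struct laid over the stack. Because the stack grows downward, the last value pushed (the error code) ends up at the lowest address, so it is the first field. The struct and helper names are hypothetical; the layout just mirrors the push order listed above and is only loosely comparable to the tail of the kernel's KTRAP_FRAME, not its actual definition.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the values the i386 processor pushes when an
   exception interrupts user-mode code, lowest address first. */
typedef struct {
    uint32_t ErrorCode;  /* pushed last */
    uint32_t Eip;        /* instruction that caused the exception */
    uint32_t SegCs;      /* 1b for user mode */
    uint32_t EFlags;
    uint32_t Esp;        /* user-mode stack pointer */
    uint32_t SegSs;      /* 23 for user mode; pushed first */
} hw_exception_frame;

/* Simulate the pushes on a downward-growing stack of 32-bit slots and
   return the new stack pointer (an index into `stack`). */
static size_t push_exception_frame(uint32_t *stack, size_t sp,
                                   uint32_t ss, uint32_t esp, uint32_t eflags,
                                   uint32_t cs, uint32_t eip, uint32_t error_code)
{
    stack[--sp] = ss;
    stack[--sp] = esp;
    stack[--sp] = eflags;
    stack[--sp] = cs;
    stack[--sp] = eip;
    stack[--sp] = error_code;
    return sp;
}
```

Copying the six slots starting at the returned stack pointer into a hw_exception_frame gives back each pushed value in its named field.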

These values record the state at the point of the exception and the instruction that caused it.
However, they do not capture the complete execution context, so right after saving them the processor dispatches to the interrupt handler. The interrupt handler saves a few more registers on the stack (and, for exceptions that come without an error code, a dummy error code so the layout is always the same). This complete saved structure is called the trap frame (KTRAP_FRAME).

The interrupt handler creates an EXCEPTION_RECORD and calls the kernel routine that dispatches exceptions (nt!KiDispatchException). This routine checks whether a debugger is attached to the process. If so, it sends the debugger a message to break in, and this is what we call a first-chance exception. If the debugger handles the exception, the trap frame is used to resume execution at the instruction that caused the exception.

If the debugger does not handle the exception, the kernel-mode dispatcher needs to hand it to the user-mode exception dispatcher (ntdll!KiUserExceptionDispatcher), which searches all the exception handlers registered by the program and the operating system. Before doing so, the kernel-mode dispatcher copies the EXCEPTION_RECORD and CONTEXT structures (the two structures an EXCEPTION_POINTERS points to) onto the user-mode stack. Once this is done, we are back in user mode with the exception information already in user space, and control is in the user-mode exception dispatcher.

The user-mode exception dispatcher searches the exception handlers in the call chain. If a filter provided by the program returns EXCEPTION_CONTINUE_EXECUTION, execution continues at the point of the exception.
If the filter returns EXCEPTION_EXECUTE_HANDLER, the code in the corresponding __except block is executed. If it returns EXCEPTION_CONTINUE_SEARCH, the dispatcher moves on to the next handler.
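
Here is a simplified C model of that search, using the filter return values defined in excpt.h (EXCEPTION_EXECUTE_HANDLER = 1, EXCEPTION_CONTINUE_SEARCH = 0, EXCEPTION_CONTINUE_EXECUTION = -1). The handler chain is an ordinary array, innermost first, and the sample filters are hypothetical. The real dispatcher walks the x86 exception registration chain and also unwinds the inner frames before running an __except block, which this sketch leaves out.

```c
#include <stddef.h>

/* Filter return values, with the same numbers as in excpt.h. */
#define EXCEPTION_EXECUTE_HANDLER      1
#define EXCEPTION_CONTINUE_SEARCH      0
#define EXCEPTION_CONTINUE_EXECUTION (-1)

typedef int (*filter_fn)(unsigned long code);

typedef enum { RESUME_AT_FAULT, RUN_EXCEPT_BLOCK, NOT_HANDLED } search_result;

/* Hypothetical sample filters. */
static int always_search(unsigned long code) { (void)code; return EXCEPTION_CONTINUE_SEARCH; }
static int always_resume(unsigned long code) { (void)code; return EXCEPTION_CONTINUE_EXECUTION; }
static int only_divide_by_zero(unsigned long code)
{
    /* 0xC0000094 is EXCEPTION_INT_DIVIDE_BY_ZERO */
    return code == 0xC0000094UL ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH;
}

/* Ask each filter in turn, innermost first, until one decides. */
static search_result search_handlers(const filter_fn *filters, size_t count,
                                     unsigned long code, size_t *handler_index)
{
    for (size_t i = 0; i < count; i++) {
        int verdict = filters[i](code);
        if (verdict == EXCEPTION_CONTINUE_EXECUTION)
            return RESUME_AT_FAULT;       /* resume at the point of the exception */
        if (verdict == EXCEPTION_EXECUTE_HANDLER) {
            *handler_index = i;           /* this frame's __except block runs */
            return RUN_EXCEPT_BLOCK;
        }
        /* EXCEPTION_CONTINUE_SEARCH: move on to the next enclosing handler */
    }
    return NOT_HANDLED;                   /* UnhandledExceptionFilter comes next */
}
```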

However, if none of the program-defined exception handlers handles the exception, the system-provided UnhandledExceptionFilter is called. It checks again whether the process is running under a debugger.
If it is, the debugger is again sent the break-in message, and this is what we know as a second-chance exception.

If no debugger is attached, this function checks the WER (Windows Error Reporting) settings, and if reporting is enabled, the WER dialog appears (the familiar message box saying that the program has stopped working).

If WER is not enabled, UnhandledExceptionFilter returns EXCEPTION_EXECUTE_HANDLER, which causes the outermost __except block (the one provided by the system) to execute, and that block simply terminates the process.
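
The whole dispatch sequence described in this post can be summarized as a decision function. This is a deliberate simplification with hypothetical names: each input stands for the outcome of one step (the debugger's answer, the handler search, the WER setting). It does not model what happens after the WER dialog, and when a debugger declines the second chance, the sketch simply reports termination.

```c
#include <stdbool.h>

typedef enum {
    HANDLED_BY_DEBUGGER_FIRST_CHANCE,
    HANDLED_BY_PROGRAM,
    HANDLED_BY_DEBUGGER_SECOND_CHANCE,
    WER_DIALOG_SHOWN,
    TERMINATED
} dispatch_outcome;

/* Each flag is the (hypothetical) result of one step of the dispatch. */
typedef struct {
    bool debugger_attached;
    bool debugger_handles_first_chance;
    bool program_handler_found;           /* result of the __try/__except search */
    bool debugger_handles_second_chance;
    bool wer_enabled;
} dispatch_inputs;

static dispatch_outcome dispatch_exception(const dispatch_inputs *in)
{
    /* 1. First chance: the kernel offers the exception to the debugger. */
    if (in->debugger_attached && in->debugger_handles_first_chance)
        return HANDLED_BY_DEBUGGER_FIRST_CHANCE;
    /* 2. User-mode dispatcher searches the program's handlers. */
    if (in->program_handler_found)
        return HANDLED_BY_PROGRAM;
    /* 3. UnhandledExceptionFilter: debugger again (second chance)... */
    if (in->debugger_attached)
        return in->debugger_handles_second_chance ? HANDLED_BY_DEBUGGER_SECOND_CHANCE
                                                  : TERMINATED;
    /* 4. ...otherwise WER, or the system __except block ends the process. */
    return in->wer_enabled ? WER_DIALOG_SHOWN : TERMINATED;
}
```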