Archive for August, 2005

Don’t touch that stack pointer!

Tuesday, August 16th, 2005

A while ago I read a post by Raymond Chen about abusing the stack pointer. He equated it to “playing with fire”.

Ever since AMX Mod X began using GWM Vissers’s Linux JIT for Small/AMX, we have experienced random crashes on Linux. No one really knew why, except that the crashes seemed far more frequent depending on the pingboost level. Finally, about a week ago, Jussi Kivilinna noted that the JIT swapped the stack pointer. On Linux pthreads, he said, the stack pointer is used to determine the current thread id. Ouch.

Why does the JIT do this? Efficiency. The job of the JIT is to turn the virtual AMX machine into native execution as closely as possible. So, logically, it swaps the real stack for the internal “AMX stack”, then exchanges them again to return to the caller. However, there can be a lot of execution in between these swaps, so it’s quite conceivable these random crashes occurred because anything that looked up thread data through the swapped stack pointer got garbage or nonexistent data back.

It took about 10-12 hours of editing and testing to fix this, at the loss of a bit of optimization (I’m sure more bugs will crop up before the next release). The lesson here is: Do not play with the stack pointer unless you have a very good reason! Whether or not the JIT was right to do this is another story (as it worked fine on Windows, and the JIT didn’t even support NASM until a few months ago), but the quirky behaviour of x86 anyway should deter people from doing it at all.

Whether this will actually stop the random crashing in Linux, only time will tell. (If you’re wondering how this is relevant, remember that SourceMod will have Small/Pawn support).

Hooking Non-Virtual Functions, Part 4

Thursday, August 11th, 2005

As a wrap-up to this article, I’ll show a very hacky variable-arguments version that, while untested, should work. The trouble with varargs is that we simply don’t know how many bytes were passed, and format routines simply rely on the number of ‘%’ characters to get parameters. This means the safest way, rather than to format things ourselves, is to hack the stack frame.

vararg_gate_open:
  ;push the old return value
  push dword [esp]
.stack_save
  ;call our 'stack save' function
  db 0e8h, 000h, 000h, 000h, 000h
  ;Note that the stack will now be at the first parameter
  sub esp, 8
;This is an E8 call, we'll replace the last four bytes with the address later
.call
  db 0e8h, 000h, 000h, 000h, 000h
;Now we have called the function with the original stack and returned
;Unfortunately, we've destroyed the return eip.  Restore it.
.stack_restore
  db 0e8h, 000h, 000h, 000h, 000h
  mov [esp], eax
  ret

What happened here? We’re taking the stack and realigning it, directly using it to call the handler without pushing any arguments. This is because we can’t copy, as we don’t know the byte size. However, calling it overwrites the old return eip on the stack, so we’ve saved it with two unimplemented functions. We can’t save the values on our own stack because, quite simply, we don’t have one — using the caller’s stack has destroyed our chance at having one, because the callee could corrupt our own. We could have used something callee-saved like edi, but we can’t save it ourselves. You can implement these two functions with a simple heap or data stack implementation.
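As a rough illustration of the "data stack" idea, here is a minimal C sketch of the two unimplemented helpers. The names stack_save and stack_restore follow the labels in the gate above, but the fixed-size array, the depth limit, and the return codes are assumptions made for this example. It is not thread-safe, and a real gate would still need a small assembly shim to pass the return eip in and out.

```c
#include <stddef.h>
#include <stdint.h>

#define SIDE_STACK_DEPTH 64  /* assumed limit on nested hooked calls */

static uintptr_t side_stack[SIDE_STACK_DEPTH];
static size_t side_top = 0;

/* Save a return address off the real stack.
 * Returns 0 on success, -1 if the side stack is full. */
int stack_save(uintptr_t ret_addr)
{
    if (side_top == SIDE_STACK_DEPTH)
        return -1;
    side_stack[side_top++] = ret_addr;
    return 0;
}

/* Pop the most recently saved return address.
 * Returns 0 if nothing has been saved. */
uintptr_t stack_restore(void)
{
    if (side_top == 0)
        return 0;
    return side_stack[--side_top];
}
```

Because saves and restores nest like calls and returns, a last-in, first-out array is enough; a per-thread copy would be needed if the hooked function can run on several threads.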

Lastly, in case you were wondering how to write in an E8 call dynamically, you must use relative eips. Here is an example:

make_gate:
  ;..code..
  push myfunc.end-myfunc.start
  call malloc
  mov edi, eax
  mov esi, myfunc
  mov ecx, myfunc.end-myfunc.start
  ;Copy the 'stock' gate into our buffer
  rep movsb
  ;Move the pointer to the offset of the 4 bytes after E8
  add eax, myfunc.call - myfunc.start + 1
  mov edi, eax
  ;edi will be what eip is DURING the call
  add edi, 4
  ;the function we will be calling will now be in esi
  ;[ebp+8] is just an example of where you might be storing it
  mov esi, [ebp+8] 
  ;make the address relative to the eip
  sub esi, edi
  ;store the relative address into the call
  mov [eax], esi

myfunc:
.start
  ;...code...
  .call
  db 0e8h, 000h, 000h, 000h, 000h
  ;...code...
.end
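The same relative-address arithmetic can be sketched in C. This is only an illustration: patch_call_rel32 and its parameters are hypothetical names, and addresses are passed as 32-bit integers so the arithmetic matches x86 regardless of the machine the sketch is compiled on.

```c
#include <stdint.h>

/* Fill in the 4-byte operand of an E8 call that sits at gate[call_off].
 * gate_addr is the address the gate will execute from, target is the
 * function to call. The operand is relative to the eip AFTER the call,
 * which is the address of the E8 byte plus 5. */
void patch_call_rel32(uint8_t *gate, uint32_t gate_addr,
                      uint32_t call_off, uint32_t target)
{
    uint32_t next_eip = gate_addr + call_off + 5;
    uint32_t rel = target - next_eip;  /* wraps modulo 2^32, like the CPU */

    /* Store little-endian, byte by byte, so host endianness is irrelevant. */
    gate[call_off + 1] = (uint8_t)(rel);
    gate[call_off + 2] = (uint8_t)(rel >> 8);
    gate[call_off + 3] = (uint8_t)(rel >> 16);
    gate[call_off + 4] = (uint8_t)(rel >> 24);
}
```

For a gate placed at 0x1000 with the E8 byte at offset 1, a target of 0x2000 gives an operand of 0x2000 - 0x1006 = 0xFFA; a target below the gate simply wraps to a negative displacement.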

Hooking Non-Virtual Functions, Part 3

Wednesday, August 10th, 2005

The last article asked an important question: “How can this method totally fail?” The answer is quite simple: we’re copying six bytes, no matter what. We could be cutting an instruction off in the middle. Observe:

0:   55                     push    ebp
1:   89 e5                  mov     ebp, esp
3:   81 7d 08 ff ff ff ff   cmp     [ebp+8], dword -1

If we copy only six bytes from this function:

  push ebp
  mov ebp, esp
.byte 0x81
  jge 0xe

That’s very, very bad. Unfortunately, x86 has variable length opcodes, making this very tricky to solve. There are three solutions I can think of. The first is to copy back the original code; this swapping is messy at best, and we lose re-entrancy. In fact, for this we could just go back to the original method in the first part of the article, using call (0xE8).

The second method would be to know the minimum copy size in advance. This is bad, as we’d have to hardcode something that’s clearly x86-only; most other processors (read: sensible RISC processors) have fixed-width opcodes.

The final, and most viable, option is to embed a tiny x86 opcode walker that can quickly determine the minimal copy size. This is not difficult, as long as you get all the cases. In fact, we only need a few: cmp, jmp, jcc, call, push, mov, lea, add, sub, and, ret, xor. It is very unlikely you will see another opcode early on, that is, in the first six bytes of the function. Decoding Mod/RM and SIB is trivial. An example of a quick, simple decoder is here. So, instead of saying “initial address + 6” for the eip offset and hardcoding six bytes, for the above function you would need to use exactly 10 bytes.
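To make the idea concrete, here is a minimal sketch of such a walker in C. It is not the decoder linked above; the function names are made up for this example, it covers only a handful of 32-bit encodings of the instructions listed, and it assumes the bytes it reads are readable. Anything it does not recognise makes it give up by returning 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of a ModRM byte plus any SIB/displacement bytes (32-bit mode). */
static size_t modrm_len(const uint8_t *p)
{
    uint8_t mod = p[0] >> 6;
    uint8_t rm  = p[0] & 7;
    size_t len = 1;

    if (mod == 3)
        return len;                       /* register operand only */
    if (rm == 4) {                        /* SIB byte follows */
        len++;
        if (mod == 0 && (p[1] & 7) == 5)
            len += 4;                     /* SIB with no base: disp32 */
    }
    if (mod == 0 && rm == 5)
        len += 4;                         /* absolute disp32 */
    else if (mod == 1)
        len += 1;                         /* disp8 */
    else if (mod == 2)
        len += 4;                         /* disp32 */
    return len;
}

/* Length of one instruction, or 0 if it is outside this small subset. */
static size_t insn_len(const uint8_t *p)
{
    uint8_t op = p[0];

    if (op >= 0x50 && op <= 0x5F) return 1;            /* push/pop reg */
    if (op == 0xC3) return 1;                          /* ret */
    if (op == 0xC2) return 3;                          /* ret imm16 */
    if (op == 0x6A) return 2;                          /* push imm8 */
    if (op == 0xEB || (op >= 0x70 && op <= 0x7F)) return 2; /* jmp/jcc rel8 */
    if (op == 0x68 || op == 0xE8 || op == 0xE9) return 5;   /* push imm32, call, jmp */
    if (op >= 0xB8 && op <= 0xBF) return 5;            /* mov reg, imm32 */
    if (op == 0x81) return 1 + modrm_len(p + 1) + 4;   /* add/sub/and/xor/cmp r/m, imm32 */
    if (op == 0x83) return 1 + modrm_len(p + 1) + 1;   /* same group with imm8 */
    switch (op) {
    case 0x01: case 0x03:   /* add */
    case 0x21: case 0x23:   /* and */
    case 0x29: case 0x2B:   /* sub */
    case 0x31: case 0x33:   /* xor */
    case 0x39: case 0x3B:   /* cmp */
    case 0x89: case 0x8B:   /* mov */
    case 0x8D:              /* lea */
        return 1 + modrm_len(p + 1);
    }
    return 0;
}

/* Smallest number of whole instructions' bytes that covers `needed` bytes. */
size_t min_copy_size(const uint8_t *code, size_t needed)
{
    size_t total = 0;
    while (total < needed) {
        size_t n = insn_len(code + total);
        if (n == 0)
            return 0;                     /* unknown opcode: refuse to hook */
        total += n;
    }
    return total;
}
```

Fed the ten bytes of the example function with needed set to 6, the walker steps over 1 + 2 + 7 bytes and stops at 10. Note that a copied jcc, jmp, or call with a relative operand would also need its displacement adjusted for the new location, which this sketch does not attempt.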

Now, to wrap up today’s article, we’ll do an stdcall implementation of the last code. Tomorrow we’ll do a hacky, but efficient variable argument version.

stdcall_gate_open:
  push esi
  push edi
.params1
  mov ecx, 0 ;We will replace this with the number of bytes in the parameter stack
  lea esi, [esp+12] ;Get address of last stack frame
  mov edi, esp ;x86 addressing cannot subtract a register, so do it in two steps
  sub edi, ecx ;Point edi to bottom of the next call stack
  rep movsb
  pop edi
  pop esi
.params2
  sub esp, 0 ;Modify the stack frame to be N+8, again this will be copied
;The parameter stack has now been copied.  We can call our handler
;This is an E8 call, we'll replace the last four bytes with the address later
.call
  db 0e8h, 000h, 000h, 000h, 000h
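;The stdcall handler popped its N parameter bytes, but the extra 8 bytes
;reserved at .params2 are still sitting below the return address.
  add esp, 8
;'retn', 0xC2 for popping the stack AFTER the return.
;Copy in a 16-bit value equal to the number of parameter bytes (N).
  db 0c2h, 000h, 000h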

As you can see, the only difference is in how we return to the caller. We no longer remove the parameters from the stack ourselves, because the handler has already cleaned them up (assuming that the handler exactly mimics the function as an stdcall). Then we use retn instead of ret. Note that it would be more logical, and platform independent, to make the handlers cdecl/native rather than forced to a specific convention. For this reason, it would be wiser to leave the original stack correction in (forcing the handler to cdecl). However, you can’t remove the retn, because the caller expects its parameters to be popped on return.

The close gate does not need to be changed. When you write the gate assembler, it is a good idea to make it smart enough to switch between calling conventions with a parameter. This will save you lots of trouble, since MSVC and GCC tend to randomly use different conventions (MSVC more so).

Hooking Non-Virtual Functions, Part 2

Tuesday, August 9th, 2005

The second method for hooking non-virtual functions (PM gave me the idea for this one) uses the jmp instruction rather than a call. This has a few immediate bonuses: the stack frame is left intact, and we have more control over program flow. In other words, we gain re-entrancy, and we don’t have to keep repatching code. In this article we’ll work on a generic hook, rather than a hardcoded one.

To begin, let’s say we disassemble any function. In the HL2SDK for Win32, most non-virtual member functions will look like this (compiled with MSVC):

push ebx
push esi
push edi
...

Generally, the first four or five bytes of the function are simple pushes, and unless the function is shorter than the six bytes we overwrite, we won’t have a problem caching and overwriting the old code. So, let’s begin.

First, let’s make an imaginary data structure to hold all the information we’ll need for this. We’ll need to save the first six bytes of the function (that is, the number of bytes an indirect jmp through a 32-bit memory operand takes). We’ll also need the eip to return to (function addr + 6), the function address, the calling convention, the stack size for the parameter list, and a few more little things we’ll get to. Since MSVC + HL2SDK compiles thiscalls with the __stdcall calling convention, I’ll demonstrate both GCC and MSVC options. Note you’d need a third to deal with varargs (maybe this will be a 3rd article).

enum CallConvention
{
   Call_cdecl=0,
   Call_stdcall=1,
};
//You must call mprotect on the address beforehand
//Returns a block of memory identifying the hook
//Pass a static address that contains the address of the function
void *hook_function(void *function, void **handler, int stacksize, int calltype);

Now we get to the assembly. As always, we are going to have at least two functions: the basic function gate, which is never called directly, and the gate assembler, which copies the basic gate into memory and then alters it.

cdecl_gate_open:
  push esi
  push edi
.params1
  mov ecx, 0 ;We will replace this with the number of bytes in the parameter stack
  lea esi, [esp+12] ;Get address of last stack frame
  mov edi, esp ;x86 addressing cannot subtract a register, so do it in two steps
  sub edi, ecx ;Point edi to bottom of the next call stack
  rep movsb
  pop edi
  pop esi
.params2
  sub esp, 0 ;Modify the stack frame to be N+8, again this will be copied
;The parameter stack has now been copied.  We can call our handler
;This is an E8 call, we'll replace the last four bytes with the address later
.call
  db 0e8h, 000h, 000h, 000h, 000h 
  add esp, 0; Another copy, for restoring the stack frame to N+8
  ret

That was easy enough. Now all hook_function has to do is copy correct values into the function. How to call the original, however? We need another, quick function assembled in memory.

cdecl_gate_close:
   ;Copy exactly the first six bytes of the original function into here
   times 6 db 0
   ;Construct a jump call.  Fill it with an address containing the eip (function addr + 6).
   db 0FFh, 025h, 000h, 000h, 000h, 000h

And how to keep track of all this information? Simple, we allocate a nice looking struct:

struct callgate
{
   void *function;
   void *orig_eip;
   void *gate_open;
   void *gate_close;
   int call_style;
   int num_params;
};

What’s going on here? We’re intercepting the function by telling it to jump somewhere else. That other location pushes the stack parameters, calls the handler, then returns. The second function simply executes the original 6 bytes, then jumps back to the original position.

If you get a little more creative, you can include handler lists (i.e., plugins) and a system for overriding return values. Tomorrow I’ll wrap this up with stdcall versions. To anyone actually reading this, you can make the actual gate creation function as an exercise. And, lastly, see if you can find how this will utterly fail! This will also be corrected in the next article.

HINT to the first question – All you need to do is allocate memory for those two code blocks, then overwrite the first six bytes of the target function with 0xFF 0x25 <4 bytes>, where the 4 bytes hold the address of the callgate::gate_open pointer.
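The patching step in the hint might be sketched in C like this. It is only an illustration operating on a plain byte buffer: write_jmp_patch is a hypothetical name, the pointer slot's address is passed as a 32-bit integer, and the fixed six-byte size carries the flaw mentioned above.

```c
#include <stdint.h>
#include <string.h>

#define JMP_PATCH_SIZE 6  /* FF 25 plus a 4-byte address */

/* Save the first six bytes of `target` into `saved`, then overwrite them
 * with "jmp dword [slot_addr]" (FF 25 <addr>). slot_addr is the address of
 * a pointer variable, such as callgate::gate_open, NOT the gate itself. */
void write_jmp_patch(uint8_t *target, uint8_t *saved, uint32_t slot_addr)
{
    memcpy(saved, target, JMP_PATCH_SIZE);

    target[0] = 0xFF;
    target[1] = 0x25;
    /* Little-endian address of the pointer slot. */
    target[2] = (uint8_t)(slot_addr);
    target[3] = (uint8_t)(slot_addr >> 8);
    target[4] = (uint8_t)(slot_addr >> 16);
    target[5] = (uint8_t)(slot_addr >> 24);
}
```

The saved bytes are what gets copied into the start of the close gate, followed by its own FF 25 jump back to function addr + 6.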

Hooking Non-Virtual Functions

Sunday, August 7th, 2005

While writing CS:S DM, I had to hook a non-virtual function in the HL2 engine. Unfortunately, SourceMM isn’t so magical that it can do this. I decided to try a first attempt on my own, cautiously, since I’d never even done manual vtable hooking before.

The concept for virtual functions is easy and straightforward. Find the offset in the virtual function table and overwrite it with a “call gate”. The call gate calls a different function which decides whether to call the original. This is very easy since only one entry point needs to be modified.
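As a toy model of that idea, the C sketch below treats a vtable as a plain array of function pointers. All names here are invented for illustration, and a real vtable usually lives in read-only memory, so the page would have to be made writable first.

```c
#include <stddef.h>

typedef int (*slot_fn)(int);

/* Overwrite one table entry with the gate and hand back the old entry,
 * so the gate can still reach the original function. */
slot_fn hook_slot(slot_fn *table, size_t index, slot_fn gate)
{
    slot_fn original = table[index];
    table[index] = gate;
    return original;
}

/* Demo pieces: an "original" function and a gate that chooses to call it. */
slot_fn saved_original;

int original_double(int x)
{
    return x * 2;
}

int gate_double(int x)
{
    /* The gate decides whether to call the original; here it always does,
     * then adjusts the result. */
    return saved_original(x) + 1;
}
```

After `saved_original = hook_slot(table, 0, gate_double);` every call made through `table[0]` lands in the gate first, while direct callers of `original_double` are unaffected, which is exactly why this trick does not help with non-virtual functions.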

However, non-virtual functions are a pain. They’re static blocks of code, and the only viable way to scan for their references is by walking the code section and calculating possible eip values. So, you have to edit the code.

The solution I came up with was quite hardcoded to both the calling convention and the actual function in question. But basically, it goes like this:

  1. Identify the address of the function you want to hook.
  2. Assemble a “call gate” in memory. This call gate should look something like:
    push ebp
    mov ebp, esp
    mov eax, [ebp+8]
    mov ecx, <address>
    call [ecx]
    pop ebp
    retn

    Since you assemble this code in memory, you can move the address of your call handler into the code at runtime, by calculating the offset of the “mov ecx, <address>” operand.
  3. Assemble a “stub gate” in memory, and save it. This would look something like:
    push esp ;push the stack pointer so we can get parameters
    call callgate
    ret
  4. Calculate the length of the stubgate. Save this many bytes of the original function’s code.
  5. Copy the stubgate over the original code. Don’t forget a call to page-aligned mprotect() or VirtualProtect() first!
  6. Create another function called “force_original”. This function should unpatch the original function by copying the saved bytes back, call it, then restore the stub gate. Obviously, this is not re-entrant at all, which is a serious flaw.
  7. Return, return, return.
  8. What has happened? The original function was edited to call a gateway function. This gateway function takes the original stack pointer and gives it to your handler. Your handler decides whether to force the original call (which requires two repatchings), and then returns to the caller.
  9. Using this you can make a hacky, hardcoded, and awkward callgate for a non-virtual, static function. I have an example for hooking a four parameter, void function on Linux here: dropgate.asm

    I’ll make a “Part 2” post later: how to do this with a much nicer reentrant jmp. Using this method, combined with a system of specifying stack width and calling convention, SourceMM might have non-virtual function hooking in the future. Mmmm!
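For step 5, the "page-aligned" part is the easy thing to get wrong, so here is a small Linux-only sketch in C. The helper names page_span and make_patchable are made up for this example; mprotect and sysconf are the standard POSIX calls. The range is rounded outward because a patch near the end of a page can spill into the next one.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round [addr, addr + len) outward to whole pages.
 * `page` must be a power of two. */
void page_span(uintptr_t addr, size_t len, size_t page,
               uintptr_t *start, size_t *size)
{
    uintptr_t mask  = ~((uintptr_t)page - 1);
    uintptr_t first = addr & mask;
    uintptr_t last  = (addr + len + page - 1) & mask;

    *start = first;
    *size  = (size_t)(last - first);
}

/* Make the pages holding `len` bytes at `func` readable, writable and
 * executable. Returns 0 on success, -1 on failure (as mprotect does). */
int make_patchable(void *func, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start;
    size_t size;

    if (page <= 0)
        return -1;
    page_span((uintptr_t)func, len, (size_t)page, &start, &size);
    return mprotect((void *)start, size, PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

With 4096-byte pages, a six-byte patch at 0x1FFE touches both 0x1000 and 0x2000, so the span comes out as 0x2000 bytes starting at 0x1000. Some hardened systems refuse pages that are writable and executable at once, in which case the call fails and the hook should be abandoned.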