Hooking Non-Virtual Functions, Part 3

The important question the last article asked was: “How can this method totally fail?” The answer is quite simple: we’re copying six bytes, no matter what, so we could be cutting an instruction off in the middle. Observe:

0:   55                     push    ebp
1:   89 e5                  mov     ebp, esp
3:   81 7d 08 ff ff ff ff   cmp     [ebp+8], dword -1

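If we copy only the first six bytes (55 89 e5 81 7d 08), the cmp is cut off right after its ModR/M and displacement bytes. Disassembled on their own, those six bytes read:

  push ebp
  mov ebp, esp
  .byte 0x81
  jge 0xe

The leftover 7d 08 has turned into a jge, and the CPU would do even worse than the disassembler: it would decode 81 7d 08 as the start of the cmp and swallow the next four bytes, whatever the hook places after the copied code, as its 32-bit immediate.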
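To make the failure concrete, here is a small C sketch that assumes the instruction lengths are already known (1, 2 and 7 bytes for the prologue above). split_instruction is a hypothetical helper, not part of the hooking code; it reports which instruction a fixed-size copy would cut in half.

```c
#include <stddef.h>

/* Returns the offset of the instruction that a copy of `cut` bytes would
   split, or -1 if `cut` lands exactly on an instruction boundary (or past
   the last listed instruction). */
long split_instruction(const size_t *lens, size_t count, size_t cut) {
    size_t off = 0;
    for (size_t i = 0; i < count; i++) {
        /* The cut splits instruction i if it falls strictly inside it. */
        if (off < cut && cut < off + lens[i])
            return (long)off;
        off += lens[i];
    }
    return -1;
}
```

With the lengths {1, 2, 7}, a six-byte cut lands inside the cmp that starts at offset 3, while cuts at 3 or 10 are safe.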

That’s very, very bad. Unfortunately, x86 has variable-length instructions, which makes this tricky to solve. There are three solutions I can think of. The first is to copy the original code back before calling it; this swapping is messy at best, and we lose re-entrancy. In fact, for this we could just go back to the original method from the first part of this series, using call (0xE8).
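A sketch of that first solution in C, simulated on an ordinary byte buffer so it can run anywhere. The struct hook layout and the function names are hypothetical, and the six patch bytes stand in for whatever redirect the hook writes; on a real target each write would first need the code page made writable. It shows where re-entrancy goes: between hook_unpatch and hook_repatch the function is unhooked, so a recursive call or a call from another thread bypasses the handler.

```c
#include <stdint.h>
#include <string.h>

#define PATCH_LEN 6  /* size of the redirect written over the prologue */

/* Hypothetical hook record. */
struct hook {
    uint8_t *code;               /* start of the hooked function */
    uint8_t saved[PATCH_LEN];    /* original prologue bytes */
    uint8_t patch[PATCH_LEN];    /* redirect bytes, prepared by the caller */
};

/* Save the original prologue and write the redirect over it. */
void hook_install(struct hook *h) {
    memcpy(h->saved, h->code, PATCH_LEN);
    memcpy(h->code, h->patch, PATCH_LEN);
}

/* The handler calls this before calling the original function... */
void hook_unpatch(struct hook *h) {
    memcpy(h->code, h->saved, PATCH_LEN);
}

/* ...and this once the original has returned. Until then, any other
   call to the function runs unhooked: this is the lost re-entrancy. */
void hook_repatch(struct hook *h) {
    memcpy(h->code, h->patch, PATCH_LEN);
}
```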

The second method would be to know the minimum copy size in advance. This is bad, as we’d have to hardcode a size for every hooked function, and that number is clearly x86-only; most other processors (read: sensible RISC processors) have fixed-width instructions.

The final, and most viable, option is to embed a tiny x86 instruction-length walker that can quickly determine the minimal copy size: the length of the shortest run of whole instructions covering at least six bytes. This is not difficult, as long as you handle all the cases. In fact, we only need a few: cmp, jmp, jcc, call, push, mov, lea, add, sub, and, ret and xor. It is very unlikely you will see any other instruction this early, that is, in the first six bytes of the function. Decoding the ModR/M and SIB bytes is trivial. One caveat: a relative jmp, jcc or call that gets copied has moved, so its displacement must be fixed up for its new location. So, instead of hardcoding six bytes and using “initial address + 6” as the eip to jump back to, for the function above you would need to copy exactly 10 bytes and jump back to initial address + 10.
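Below is a minimal sketch of such a walker in C for 32-bit code. It is not a complete decoder: it handles only the opcodes listed above (plus the other ALU forms that share their encoding), ignores prefixes, and returns 0 for anything it does not recognize, which a hook installer should treat as “refuse to hook”. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of a ModR/M byte plus any SIB byte and displacement
   (32-bit addressing only). p points at the ModR/M byte. */
static size_t modrm_len(const uint8_t *p) {
    uint8_t mod = p[0] >> 6, rm = p[0] & 7;
    size_t len = 1;                          /* the ModR/M byte itself */
    if (mod == 3) return len;                /* register operand */
    if (rm == 4) {                           /* SIB byte follows */
        len += 1;
        if (mod == 0 && (p[1] & 7) == 5) len += 4;  /* disp32, no base */
    } else if (mod == 0 && rm == 5) {
        len += 4;                            /* [disp32] */
    }
    if (mod == 1) len += 1;                  /* disp8 */
    else if (mod == 2) len += 4;             /* disp32 */
    return len;
}

/* Length of one instruction, or 0 if the opcode is not handled. */
static size_t insn_len(const uint8_t *p) {
    uint8_t op = p[0];
    if (op < 0x40 && (op & 7) < 6) {         /* add/or/adc/sbb/and/sub/xor/cmp */
        if ((op & 7) < 4) return 1 + modrm_len(p + 1);  /* r/m, reg forms */
        return (op & 7) == 4 ? 2 : 5;        /* al, imm8 / eax, imm32 */
    }
    if (op >= 0x50 && op <= 0x57) return 1;  /* push r32 */
    if (op >= 0x70 && op <= 0x7F) return 2;  /* jcc rel8 */
    if (op >= 0xB0 && op <= 0xB7) return 2;  /* mov r8, imm8 */
    if (op >= 0xB8 && op <= 0xBF) return 5;  /* mov r32, imm32 */
    switch (op) {
    case 0x0F: return (p[1] >= 0x80 && p[1] <= 0x8F) ? 6 : 0;  /* jcc rel32 */
    case 0x68: return 5;                     /* push imm32 */
    case 0x6A: return 2;                     /* push imm8 */
    case 0x80: case 0x83: case 0xC6:         /* ALU r/m, imm8; mov r/m8, imm8 */
        return 2 + modrm_len(p + 1);
    case 0x81: case 0xC7:                    /* ALU r/m32, imm32; mov r/m32, imm32 */
        return 5 + modrm_len(p + 1);
    case 0x88: case 0x89: case 0x8A: case 0x8B:  /* mov */
    case 0x8D:                               /* lea */
    case 0xFF:                               /* push/call/jmp r/m32 (and inc/dec) */
        return 1 + modrm_len(p + 1);
    case 0xA1: case 0xA3: return 5;          /* mov eax, [moffs32] and back */
    case 0xC2: return 3;                     /* retn imm16 */
    case 0xC3: return 1;                     /* ret */
    case 0xE8: case 0xE9: return 5;          /* call/jmp rel32 */
    case 0xEB: return 2;                     /* jmp rel8 */
    default: return 0;
    }
}

/* Smallest run of whole instructions covering at least `need` bytes,
   or 0 if an unhandled opcode is met first. */
size_t min_copy_size(const uint8_t *code, size_t need) {
    size_t off = 0;
    while (off < need) {
        size_t n = insn_len(code + off);
        if (n == 0) return 0;
        off += n;
    }
    return off;
}
```

For the prologue at the top of this article, min_copy_size(code, 6) walks 1 + 2 + 7 bytes and returns 10.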

Now, to wrap up today’s article, we’ll write a stdcall version of the gate from the last article. Tomorrow we’ll do a hacky, but efficient, variable-argument version.

  push esi
  push edi
  mov ecx, 0         ;Patched: N, the number of bytes in the parameter stack
  lea esi, [esp+12]  ;Caller's first parameter (skip saved edi, esi and the return address)
  mov edi, esp       ;x86 cannot encode [esp-ecx], so compute esp-N by hand
  sub edi, ecx       ;edi = bottom of the copy, just below the saved registers
  rep movsb          ;Copy N bytes (the direction flag is clear on function entry)
  pop edi
  pop esi
  sub esp, 0         ;Patched: N+8, so esp points at the copied parameters
;The parameter stack has now been copied.  We can call our handler.
;This is an E8 call; the last four bytes will be patched with the rel32
;displacement (handler address minus the address of the next instruction).
  db 0e8h, 000h, 000h, 000h, 000h
;The stdcall handler has already popped the N copied bytes; drop the
;8-byte gap left where esi and edi were saved.
  add esp, 8
;'retn', 0xC2, pops the caller's parameters AFTER the return.
;Patch in a 16-bit value equal to N, the number of parameter bytes.
  db 0c2h, 000h, 000h

As you can see, the only difference is in how we return to the caller. We no longer remove the copied parameters ourselves, because the handler’s own return has already popped them (assuming the handler exactly mimics the function as a stdcall); all that is left is the 8-byte gap where esi and edi were saved, hence the add esp, 8. Then we use retn N (0xC2) instead of ret (0xC3). Note that it would be more logical, and platform independent, to make the handlers cdecl/native rather than forcing them into a specific convention. For this reason, it would be wiser to leave the original stack correction in (forcing the handler to be cdecl). However, you can’t remove the retn N, because the caller expects its parameters to be popped on return.
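The patching described in the gate’s comments can be done by a small C routine that fills in a byte template. This is a sketch, not the series’ own gate assembler: it encodes the gate with mov edi, esp / sub edi, ecx in place of [esp-ecx], which x86 cannot encode, and with an add esp, 8 before the retn to drop the saved-register gap. build_stdcall_gate and GATE_SIZE are hypothetical names, and addresses are taken as 32-bit integers. The key detail is that the E8 operand is relative: the handler address minus the address just past the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GATE_SIZE 36

/* Store v little-endian, as x86 expects. */
static void put32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* n: parameter bytes (N); gate_addr: where the gate will live;
   handler: address of the stdcall handler. */
void build_stdcall_gate(uint8_t out[GATE_SIZE], uint32_t n,
                        uint32_t gate_addr, uint32_t handler) {
    static const uint8_t tmpl[GATE_SIZE] = {
        0x56,                         /*  0: push esi          */
        0x57,                         /*  1: push edi          */
        0xB9, 0, 0, 0, 0,             /*  2: mov ecx, N        */
        0x8D, 0x74, 0x24, 0x0C,       /*  7: lea esi, [esp+12] */
        0x89, 0xE7,                   /* 11: mov edi, esp      */
        0x29, 0xCF,                   /* 13: sub edi, ecx      */
        0xF3, 0xA4,                   /* 15: rep movsb         */
        0x5F,                         /* 17: pop edi           */
        0x5E,                         /* 18: pop esi           */
        0x81, 0xEC, 0, 0, 0, 0,       /* 19: sub esp, N+8      */
        0xE8, 0, 0, 0, 0,             /* 25: call handler      */
        0x83, 0xC4, 0x08,             /* 30: add esp, 8        */
        0xC2, 0, 0                    /* 33: retn N            */
    };
    assert(n <= 0xFFFF);              /* retn takes a 16-bit byte count */
    memcpy(out, tmpl, GATE_SIZE);
    put32(out + 3, n);
    put32(out + 21, n + 8);
    /* rel32 is measured from the end of the call, i.e. offset 30. */
    put32(out + 26, handler - (gate_addr + 30));
    out[34] = (uint8_t)n;
    out[35] = (uint8_t)(n >> 8);
}
```

The resulting 36 bytes would then be copied to gate_addr in executable memory (on Windows, for example, memory allocated with VirtualAlloc and PAGE_EXECUTE_READWRITE).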

The close gate does not need to be changed. When you write the gate assembler, it is a good idea to make it smart enough to switch between calling conventions based on a parameter. This will save you lots of trouble, since MSVC and GCC do not always use the same convention for the same declaration, and MSVC is the bigger offender.
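One way to give the gate assembler that parameter is to generate only the tail of the gate per convention, as sketched below. The enum and function are hypothetical; the sketch assumes the gate layout from this article, where the copied parameters sit N+8 bytes below the return address, and it covers only cdecl and stdcall (not thiscall or fastcall).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical enum for the two conventions discussed in this series. */
enum conv { CONV_CDECL, CONV_STDCALL };

/* Writes the code that follows the call to the handler and returns its
   length (at most 9 bytes). n is the number of parameter bytes. */
size_t emit_gate_epilogue(uint8_t *out, enum conv handler, enum conv target,
                          uint32_t n) {
    /* A cdecl handler leaves the N copied bytes on the stack; a stdcall
       handler pops them itself. Either way the 8-byte gap remains. */
    uint32_t adjust = (handler == CONV_CDECL) ? n + 8 : 8;
    size_t len = 0;
    out[len++] = 0x81;                       /* add esp, imm32 */
    out[len++] = 0xC4;
    out[len++] = (uint8_t)adjust;
    out[len++] = (uint8_t)(adjust >> 8);
    out[len++] = (uint8_t)(adjust >> 16);
    out[len++] = (uint8_t)(adjust >> 24);
    if (target == CONV_STDCALL) {
        out[len++] = 0xC2;                   /* retn N: pop caller's args */
        out[len++] = (uint8_t)n;
        out[len++] = (uint8_t)(n >> 8);
    } else {
        out[len++] = 0xC3;                   /* ret: the caller cleans up */
    }
    return len;
}
```

Keeping the handler’s convention and the hooked function’s convention as separate parameters is what lets a cdecl handler sit behind a stdcall function, as suggested above.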
