Beginner Tutorial #8
By: Shub-Nigurrath /ARTeam
http://cracking.accessroot.com

Breakpoints Theory v1.41

The Target:
none
The Tools:
Ollydbg 1.10
The Protection:
none

Other Information:
This tutorial covers some basic concepts about breakpoints and the differences between them, so that beginners have this topic covered as well.


Introduction:

Well, since the first tutorial of the beginners series you have learnt that there are different breakpoints and that they can be used in different ways, but so far no one (in this series of tutorials, of course) has told you what the differences are among all the breakpoint types we can set. Even a basic knowledge of what happens under the hood of breakpoints is enough to understand why each of them has a different use.

First of all, there is one big distinction to make: there are Software Breakpoints and Hardware Breakpoints.

 



Hardware Breakpoints:

The latter, Hardware Breakpoints, are supported directly by the CPU through some special registers called debug registers.

There are four debug address registers: DR0, DR1, DR2 and DR3. They store the linear addresses of four breakpoints. The break conditions for each of these breakpoints are held in a special control register, DR7. When any of these conditions is TRUE, the processor raises an INT 1 (debug) exception and control is passed to the debugger. The CPU foresees four possible breaking conditions:

  1. An instruction is executed
  2. The contents of a memory location are modified
  3. A memory location is read or updated, but not executed
  4. An input-output port is referenced
     

Once a Hardware Breakpoint is set and enabled in DR7, the CPU itself checks every matching access. When the condition is met, an INT 1 debug exception is generated automatically by the CPU, and control is passed to the debugger or, generally speaking, to the exception handler registered in the system. The trap bit (TF) of the flags register raises the same INT 1 exception, but it is used for single-stepping.
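To make the four breaking conditions listed above a bit more concrete, here is a minimal C sketch of how they are encoded in DR7, following the layout given in the Intel manual: bits 0..7 hold the local/global enable pairs, and from bit 16 on each breakpoint gets a 2-bit R/W field (the condition) and a 2-bit LEN field. The names bp_condition, bp_slot and decode_dr7 are made up for this example, and the code only decodes a value you already have; it does not read the real register.

```c
#include <assert.h>
#include <stdint.h>

/* Break conditions as encoded in the two R/W bits of DR7. */
enum bp_condition {
    BP_EXECUTE    = 0, /* break on instruction execution */
    BP_WRITE      = 1, /* break on data write */
    BP_IO         = 2, /* break on I/O port access (needs CR4.DE set) */
    BP_READ_WRITE = 3  /* break on data read or write, not on execution */
};

/* Hypothetical helper type: decoded view of one breakpoint slot. */
struct bp_slot {
    int enabled;                 /* local (Ln) or global (Gn) enable bit set */
    enum bp_condition condition; /* R/Wn field */
    unsigned length;             /* watched length in bytes */
};

/* Decode slot 0..3 (DR0..DR3) from a raw DR7 value. */
static struct bp_slot decode_dr7(uint32_t dr7, unsigned slot)
{
    /* LENn encoding: 00 = 1 byte, 01 = 2 bytes, 11 = 4 bytes,
     * 10 = 8 bytes on CPUs that support it (undefined on older ones). */
    static const unsigned len_bytes[4] = { 1, 2, 8, 4 };
    struct bp_slot out;

    assert(slot < 4);
    out.enabled   = ((dr7 >> (slot * 2)) & 0x3u) != 0;
    out.condition = (enum bp_condition)((dr7 >> (16 + slot * 4)) & 0x3u);
    out.length    = len_bytes[(dr7 >> (18 + slot * 4)) & 0x3u];
    return out;
}
```

For example, decode_dr7(0x000D0001, 0) describes an enabled breakpoint on DR0 that fires on a 4-byte write.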

Considerations
From these few details you can see that you can set at most four hardware breakpoints and that the number of possible conditions is limited by design.

The advantage, on the other hand, is that Hardware Breakpoints are almost undetectable by software. The only way a program can detect that a hardware breakpoint is set is to read the DR0..DR7 values: code could detect tracing (debugging) by analyzing the control register (DR7). Unfortunately, it is only possible to work with these registers in ring0.

For the meaning of the DR7 flags, and for everything else concerning the Debug Registers, read the section “Debug Registers” in Chapter 15 of the “IA-32 Intel Architecture Software Developer’s Manual, Volume 3”, available at http://developer.intel.com/design/pentium4/manuals/253668.htm

As such, some tricks are needed to switch over into ring0. Here is an example taken from Pavol Cerven's book "Crackproof your Software" (an excellent read that I suggest to all of you).

.386
.MODEL FLAT,STDCALL
locals
jumps
UNICODE=0
include w32.inc

Extrn SetUnhandledExceptionFilter : PROC
Interrupt equ 5                              ;the interrupt numbers 1 or 3 will make
                                             ;debugging more difficult
.DATA

message1 db "Debug breakpoint detection",0
message2 db "Debug breakpoint not found",0
message3 db "Debug breakpoint found",0
delayESP dd 0                                ;the ESP register is saved here
previous dd 0                                ;the address of the previous SEH service
                                             ;is saved here

.CODE
Start:
;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
;Sets SEH in case of an error
;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
     mov [delayESP], esp
     push offset error
     call SetUnhandledExceptionFilter
     mov [previous], eax
;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

     push edx
     sidt [esp-2]                           ;reads IDT into the stack
     pop edx
     add edx, (Interrupt*8)+4               ;reads the vector of the required interrupt
     mov ebx,[edx]
     mov bx,word ptr [edx-4]                ;reads the address of the old service of the
                                            ;required interrupt
     lea edi,InterruptHandler
     mov [edx-4],di
     ror edi,16                             ;sets the new interrupt service
     mov [edx+2],di
     push ds                                ;saves registers for security
     push es
     int Interrupt                          ;jumps into Ring0 (a newly defined INT 5h service)
     pop es                                 ;restores the registers
     pop ds
     mov [edx-4],bx                         ;sets the original INT 5h interrupt service
     ror ebx,16
     mov [edx+2],bx
     push eax                               ;saves the return value

;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
;Sets the previous SEH service
;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
     push dword ptr [previous]
     call SetUnhandledExceptionFilter
;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−

     pop eax                                ;restores the return value
     test eax,eax                           ;tests to see if eax=0
     jnz jump                               ;if not, the program has found a debug
                                            ;breakpoint and it ends

continue:
     call MessageBoxA,0, offset message2,\
     offset message1,0
     call ExitProcess, -1

jump:
     call MessageBoxA,0, offset message3,\
     offset message1,0
     call ExitProcess, -1

error:                                      ;sets a new SEH service if there is an error
     mov esp, [delayESP]
     push offset continue
     ret

;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
;Your new service INT 5h (runs in Ring0)
;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
InterruptHandler:
     mov eax, dr0                          ;reads a value from the DR0 debug register
     test ax,ax                            ;tests to see if a breakpoint was set
     jnz Debug_Breakpoint                  ;if so, the program jumps
     mov eax,dr1                           ;reads a value from the DR1 debug register
     test ax,ax                            ;tests to see if a breakpoint was set
     jnz Debug_Breakpoint                  ;if so, the program jumps
     mov eax,dr2                           ;reads a value from the DR2 debug register
     test ax,ax                            ;tests to see if a breakpoint was set
     jnz Debug_Breakpoint                  ;if so, the program jumps
     mov eax,dr3                           ;reads a value from the DR3 debug register
     test ax,ax                            ;tests to see if a breakpoint was set
     jnz Debug_Breakpoint                  ;if so, the program jumps
     iretd                                 ;if a breakpoint was not set the program will
                                           ;return 0 into eax

Debug_Breakpoint:
     mov eax,1                             ;sets the value 1 into eax to show that
                                           ;breakpoints are active
     iretd                                 ;jump back into Ring3

ends
end Start

This technique is one of the few ways to discover debug breakpoints, and it makes it possible to delete them without stopping the application in the debugger. However, rather than deleting them, the application usually steers itself to an incorrect ending. Unfortunately, the trick (and all the other similar ones) works only in Windows 9x because of the need to switch over into ring0.

Generally speaking, Cerven's book describes three ways to switch a normal program running in ring3 into ring0, but only in Windows 9x. Windows NT, 2000, and XP systems were secured against these methods because of the prevalence of viruses that took advantage of them. (Older Windows NT versions did allow this switch, but after it was misused a few times, the possibility was removed from the system.)
One way that still works to implement these tests in Windows NT, Windows 2000, and Windows XP is to place them in a .sys driver running in ring0.


Update: changing debug registers in ring3 code
According to experiments, it is not completely true that you have to switch to ring0 yourself to change the values of the debug registers. Programs running as debuggers can call system APIs such as GetThreadContext() and SetThreadContext(). These calls go through NTDLL.DLL and become system calls (an interrupt 2E), which makes the processor switch to ring0 and run the code there.

You can also experiment on your own using the following simple ASM code (also compiled and included in this archive), which uses the SEH mechanism to erase the debug registers and then return to normal execution (many thanks to Neitsa for writing it). The structure is very simple: try to place a hardware breakpoint on one of the NOPs in the example, then place a normal BP on the first instruction of the SEH handler, and follow what happens.

.686
.model flat, stdcall ;32 bit memory model
option casemap :none ;case sensitive
assume fs:nothing ;MASM feature (otherwise FS assumed to be ERROR)
include EraseDrx.Inc

.code

start:
     ; ### set the S.E.H ###
     push offset mySEH
     push dword ptr fs:[0]
     mov dword ptr fs:[0],esp

     ;*** now everything will be covered by our SEH ***
     ; raise an invalid opcode exception
     UD2

@@SafeOffset: ; this is where we can safely return from our SEH
     fnop

     ;try to hardware BP one of those NOP
     nop
     nop
     nop
     nop

     ;*** now this is this end of the SEH ***
     pop dword ptr fs:[0]
     add esp,4
     ret ;return to ExitThread

;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
; OUR SEH handler, which erases the debug registers
;−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
mySEH proc C lpExcept:DWORD, lpFrame:DWORD, lpContext:DWORD, lpDispatch:DWORD

     mov ecx,[lpContext]

     ; push all linear addresses of drx (from Dr0 to Dr3)
     ; you should see your hardware BP there (just to demonstrate where they are)
     push [ecx][CONTEXT.iDr0]
     push [ecx][CONTEXT.iDr1]
     push [ecx][CONTEXT.iDr2]
     push [ecx][CONTEXT.iDr3]
     add esp,4*4 ; skip them

     ;erase DR0 to DR3
     push 0
     push 0
     push 0
     push 0

     pop [ecx][CONTEXT.iDr0]
     pop [ecx][CONTEXT.iDr1]
     pop [ecx][CONTEXT.iDr2]
     pop [ecx][CONTEXT.iDr3]

     ;erase also DR7
     push 0
     pop [ecx][CONTEXT.iDr7]

     ;now set EIP to our SafeOffset
     push offset @@SafeOffset
     pop [ecx][CONTEXT.regEip]

     mov eax,FALSE
     ret

mySEH endp

end start


Concluding this section: theoretically, for a debugger to be invisible it needs to recognize the instructions that read the flags register, emulate their execution, and always return zero as the value of the trap flag. Not that easy to do indeed!

Using Hardware Breakpoints in Olly:
Now a question comes up: when is it useful to use HW breakpoints with Olly? The real answer should become evident after reading the following section on SW breakpoints as well, but generally HW breakpoints are used in two cases:

  • the debugged program checks against modifications of its own code (self-checking program)
  • the debugged program checks for the presence of a debugger and erases the breakpoints found in its code (which also involves self-checking anti-tampering techniques).

These things usually happen with compressors or complex packers such as Armadillo or Execryptor or AsProtect.

A second advantage of HW breakpoints is that, being a property of the CPU, they are not tied to a specific Olly instance. So it's easy to set a HW breakpoint on a specific memory address from one instance of Olly, then open another Olly instance on the same program and have the second one break at the address specified by the first Olly as well.

Well, it seems complex, but it's really useful indeed: suppose you have a packed program, packed with AsProtect for example. You would open one instance of Olly (well hidden with all the HideDebugger plugins) and run the target program. Suppose that during the manual unpacking you need to stop at a specific memory address to see if something happened (this is indeed one step of the way to manually unpack AsProtect) or to change a registry value. In order not to interfere with AsProtect you would likely set a hardware breakpoint at that specific address.

Now, once that HW Breakpoint is set, minimize the first instance of Olly (do not close it) and open another one. Launch the target program in this second instance of Olly as well and let it run freely. What happens? The second Olly will break where the first Olly placed the HW breakpoint (apart from relocations, of course).

This behaviour happens because HW Breakpoints are handled directly by the CPU, and it's the CPU that issues the debug event: they are not associated with a specific process, but only with a specific memory address.

Indeed, this method is one of the tricks used to handle some packers, ActiveMark for example.

There is not much more to say about Hardware Breakpoints, so let's move on to the more complex Software Breakpoints.

 

 

Software Breakpoints:

A Software Breakpoint is the only type of breakpoint that cannot be hidden without writing a full-scale processor emulator. If you place this one byte of code, 0xCC, at the beginning of an instruction, it will cause an INT 0x3 exception when an attempt is made to execute it.

The handler of INT 0x3 gains control and can do whatever it wishes with a program. However, before the interrupt handler is called, the current values of the flags register, the pointer of the code segment (the CS register), and the instruction pointer (the IP register) are placed onto the stack. In addition, the interrupts are disabled (the IF flag is cleared), and the trap flag is cleared. Therefore, a call of the debug interrupt does not differ from a call of any other interrupt.

To learn the point of the program in which the halt has occurred, the debugger pulls the saved values of registers off the stack, taking into account that CS:IP points to the next instruction to be executed.

So generally it is complex to set a breakpoint in an arbitrary place of the program. The debugger should save the current value of the memory location at the specified address, then write the code 0xCC there. Before exiting the debug interrupt, the debugger should return everything to its former place, and should modify IP saved in the stack so that it points to the beginning of the restored instruction. (Otherwise, it points to its middle.)
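The save / patch / restore cycle described above can be sketched in a few lines of C. This is only a simulation on a byte buffer, assuming the "code" is an array and the instruction pointer is a plain offset into it; soft_bp, bp_set and bp_hit are names invented for this example, not part of any debugger API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu

/* Hypothetical bookkeeping record for one software breakpoint. */
struct soft_bp {
    size_t  offset;  /* where in the code buffer the breakpoint sits */
    uint8_t saved;   /* original byte that 0xCC replaced */
    int     active;  /* non-zero while 0xCC is written in the buffer */
};

/* Save the original byte and overwrite it with the INT 3 opcode. */
static void bp_set(uint8_t *code, size_t size, struct soft_bp *bp, size_t offset)
{
    assert(offset < size);
    bp->offset = offset;
    bp->saved  = code[offset];
    bp->active = 1;
    code[offset] = INT3_OPCODE;
}

/* Called when INT 3 fires: ip points just past the 0xCC byte.
 * Put the original byte back and return the corrected instruction
 * pointer, which is the start of the restored instruction. */
static size_t bp_hit(uint8_t *code, struct soft_bp *bp, size_t ip)
{
    assert(bp->active && ip == bp->offset + 1);
    code[bp->offset] = bp->saved;
    bp->active = 0;
    return ip - 1;
}
```

Note how the buffer really contains 0xCC between bp_set and bp_hit: this is exactly the modification that the protections discussed in the next paragraphs look for.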

Considerations
What are the drawbacks of the breakpoint mechanism of the 8086 processor? The most unpleasant is that the debugger must modify code directly when it sets the breakpoints. It should be self-evident that modifying the memory of a process (the 0xCC is written) is something that a program can easily detect and avoid in different ways. There are plenty of tutorials describing more or less smart ways to prevent a program from being "breakpointed". Common reactions are to alter the program's flow accordingly or simply to overwrite the 0xCC byte with the original value (so the debugger will not stop).

For a program being debugged, a possible way to discover whether at least one breakpoint has been set is to compute its own checksum. To do this, it may use MOV, MOVS, LODS, POP, CMP, CMPS, or any other instruction that reads memory.

For example, let's take a look at the following simple protection scheme (from Kris Kaspersky's book, Hacker Disassembling Uncovered), which uses the XOR trick to decrypt a string.

#include <stdio.h>

int main(int argc, char* argv[])
{
    // The ciphered string "Hello, Free World!"
    char s0[]="\x0C\x21\x28\x28\x2B\x68\x64\x02\x36"
              "\x21\x21\x64\x13\x2B\x36\x28\x20\x65\x49\x4E";
   
    __asm
    {
        BeginCode:                 ; The beginning of the code being debugged
            pusha                  ; All general-purpose registers are saved.
            lea ebx, s0            ; ebx=&s0[0]
            GetNextChar:           ; do
                xor eax, eax       ; eax = 0;
                lea esi, BeginCode ; esi = &BeginCode
                lea ecx, EndCode   ; The length of code
                sub ecx, esi       ; being debugged is computed.
                HarvestCRC:        ; do
                    lodsb          ; The next byte is loaded into al.
                    add eax, eax   ; The checksum is computed.
                loop HarvestCRC    ; until(--ecx>0)
                xor [ebx], ah      ; The next character is decrypted.
                inc ebx            ; A pointer to the next character
                cmp byte ptr [ebx], 0 ; Until the end of the string
            jnz GetNextChar        ; Continue decryption
            popa                   ; All registers are restored.
        EndCode:                   ; The end of the code being debugged
            nop                    ; A breakpoint is safe here.
    }
    printf("%s", s0);              //The string is displayed.
    return 0;
}

 

After starting the program normally, the line "Hello, Free World!" should appear on the screen. But when the program is run under the debugger with even a single breakpoint set between BeginCode and EndCode, senseless garbage like "Jgnnm."Dpgg"Umpnf#0" will show up on the screen. The protection can be strengthened considerably if the procedure computing the checksum is placed into a separate thread engaged in another useful process, making the protective mechanism as unobtrusive as possible.
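If you want to play with this effect without a debugger, the following C sketch simulates it on plain byte arrays. It is a simplification under stated assumptions: the "code region" is just a buffer, and the key is a plain additive checksum of it, which is not the exact scheme of the listing above. region_checksum and xor_with_code_key are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Additive 8-bit checksum over the protected bytes.
 * (Assumption for this sketch; the listing above computes its own value.) */
static uint8_t region_checksum(const uint8_t *code, size_t size)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < size; i++)
        sum = (uint8_t)(sum + code[i]);
    return sum;
}

/* XOR every byte of buf with a key derived from the code region.
 * Calling it twice with an unchanged region restores buf; if a single
 * byte of the region was replaced (for example by 0xCC), the key
 * changes and the second call yields garbage instead. */
static void xor_with_code_key(uint8_t *buf, size_t len,
                              const uint8_t *code, size_t code_size)
{
    assert(buf != NULL && code != NULL);
    uint8_t key = region_checksum(code, code_size);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}
```

Encrypt a short string with xor_with_code_key, write 0xCC over one byte of the region, then call it again: the output no longer matches the original, just like the garbage shown above.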

 
The above code uses a property of the XOR operator you might have forgotten, which is A <XOR> B <XOR> A = B. That is why it's often used for weak data encoding. If you XOR plaintext data with a key, you get "ciphertext" back. If you XOR the "ciphertext" with the key, you get the plaintext back. And if you know the ciphertext and the plaintext, you get the key back.
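The three statements above fit in a few lines of C. This is just a sketch with made-up helper names (xor_buffer, recover_key); the only data taken from the tutorial is the ciphered string of the listing, whose first byte 0x0C together with the known plaintext 'H' gives the key 0x44.

```c
#include <assert.h>
#include <stddef.h>

/* XOR each byte of buf with key, in place.
 * The same call both encrypts and decrypts. */
static void xor_buffer(unsigned char *buf, size_t len, unsigned char key)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Known-plaintext recovery: one plaintext byte and the matching
 * ciphertext byte are enough to get the single-byte key back. */
static unsigned char recover_key(unsigned char plain, unsigned char cipher)
{
    return (unsigned char)(plain ^ cipher);
}
```

With the first five ciphered bytes 0x0C 0x21 0x28 0x28 0x2B, recover_key('H', 0x0C) returns 0x44, and xor_buffer with that key turns them into "Hello".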


In the above case the key is obtained directly from the code between the BeginCode and EndCode addresses.

Let's see it directly in Olly (using the main.exe supplied in this archive).

There you will find again what we just wrote in the C code above. Try exercising yourself with this protection: it's not so uncommon to find it in weakly protected programs in real life. If you exercise here, you'll be ready to recognize it whenever you find it.

Of course the above sample is simple. Consider that the code might be complicated with exceptions, threads and other amenities meant to complicate the life of those who, like us, like to follow the ASM code.

Note for those of you who read the Shub-Nigurrath Oraculum's tutorial
Those of you who have read ARTeam's Oraculum Tutorial may notice that the "EBFE" trick used there is somewhat similar to the software breakpoints described here. Well, the mechanism is similar only in that it writes a value in memory at the place where we want the break to occur. The difference is that no exception is raised, and that the check to suspend the program is simply made by watching for EIP staying constant.

This will overcome those protections that only check the exception status or the presence of the 0xCC value, but not those doing complex checksums on memory.

 

 

References:

As you can see, there are different usages for breakpoints simply because they are implemented differently, but consider that none of them is undetectable. Thus, once a breakpoint is set, do not rely on your program stopping on it. Always be aware that the program can detect its presence and delete it, or modify its own behaviour to counteract an attack.

Combining these two techniques leads to conceptually simple routines useful to erase all the breakpoints, both hardware and software, placed in "sensitive" pieces of code.

I suggest the following further readings to round out this topic.

  1. Kris Kaspersky, Hacker Disassembling Uncovered, A-List Press
  2. Pavol Cerven, Crackproof your software, No Starch Press
  3. Shub-Nigurrath, Oraculum Tutorial With Framework Src V11, ARTeam
  4. Shub-Nigurrath, Gabri3l, Serial Fishing And Oraculum For Weblink, ARTeam
  5. Gabri3l, Writing A Loader 4 Softwrap 6.1.1, ARTeam
  6. IA-32 Intel Architecture Software Developer’s Manual, Volume 3, Intel, Section “Debug Registers”, Chapter 15, http://developer.intel.com/design/pentium4/manuals/253668.htm

and essentially all the tutorials seen around (including others on our tutorials page), which always make use of breakpoints.

 

 

Conclusion:

Thanks to the whole ARTeam:
[Nilrem] [JDog45] [Shub - Nigurrath] [MaDMAn_H3rCuL3s] [Ferrari] [Kruger] [Teerayoot] [R@dier] [ThunderPwr] [Eggi] [EJ12N]
[Stickman 373] [Bone Enterprise]

Thanks to all the people who take time to write tutorials.
Thanks to all the people who continue to develop better tools.
Thanks to Exetools, Woodmann, SND, TSRH, MP2K and all the others for being a great place of learning.
Thanks also to The Codebreakers Journal, and the Anticrack forum.

If you have any suggestions, comments or corrections, contact me in the usual places.