Reverse Engineering

Introduction   Windows Programs   Debuggers   Disassemblers   Final words

Introduction

Imagine that you have written a program. You were lazy and added only a few comments. Now you take a quick look and try to understand your old code - and fail. Why? While coding, all the information is in your mind and makes you creative. You discover some tricks and everything is obvious. Some months or years later nothing is obvious anymore. You must follow every step of your program carefully to reconstruct and remember it - that's reverse engineering.

Now we go a step further. In most cases you are not the author of the program, but the code is available (GNU General Public License, freeware, Open Source). Everyone has their own style (naming variables and functions, using pointers and arrays, ...) and of course it's much more difficult to investigate this type of code. The situation improves if you have documentation or if the code is well commented. Don't think it's easy, but understanding the program is only a matter of time.

In most cases the source code will not be made public. Fortunately, there are disassemblers around which turn any program back into assembler code. Examining this code is really difficult. You need some experience to "translate" the asm patterns back into high-level instructions. It helps a lot to run the program through a debugger and break here and there to identify some functions, but chances are high that you will get lost soon. It also helps a lot if you understand how programs and algorithms work at the asm level.

Source code                Difficulty   Tools needed
Written by yourself        easy         your brain
Written by someone else    medium       your brain, docs
No source code available   hard         good disassembler, debugger, ASM knowledge, some background information (language used, what kind of person wrote the code, ...)

Reverse Engineering Windows Programs

Windows programs are different. First, they are event driven. If something happens (the mouse was moved, a button was clicked, text was entered, ...) your program gets a message and can either handle it or pass it on to DefWindowProc. That's why the source code of a normal Windows program looks very strange at first glance: long switch statements decide which action must be performed.

But Windows offers plenty of opportunities for reverse engineering. Every program uses well-known Win32 API functions, which cover almost every topic you can think of: strings, registry, windows, controls, network, ... These functions can easily be identified by a disassembler, so you will better understand what the program does: MessageBoxA will show a message box, CreateWindowExA will create a window and so on. The A at the end of the name means that the ANSI version is called; a Unicode version (W - wide char) is also available.

Another advantage is the use of string resources. Strings are put in the resource section of the program and each one gets an ID. If you want to show messages in a message box, for example, you would do the following: create a string resource, write a function which accepts the string ID, use LoadString in this function to load the string and then show it via MessageBox.

Modern programs are written in object-oriented languages like C++. This makes the code easier to maintain but more difficult to reverse engineer if you only have the asm code. Objects appear as structures (or arrays) which hold the data members (variables) and pointers to the methods (functions), usually by way of a table of function pointers. You will often get pointers to pointers which must be studied carefully.

Debuggers

A debugger allows you to single-step through every (asm) instruction of a program. You can check every register and pointer before executing the next instruction. The best debugger and the only one you will ever need is SoftICE (www.numega.com). It runs silently in the background and allows you to set advanced breakpoints. It's possible to break on window messages (WM_CLOSE, WM_CREATE, ...), API functions (CreateWindowExA, CreateFileA, MessageBoxA, ...), memory access or even when a function is called AND a register has a particular value. This will lead you straight to the point of interest. A big problem is that you need a lot of experience with such a tool. It's very important to get a feeling for where something useful is being done and where you are just sitting in front of a library function which, for instance, checks the length of a string, or you will waste your precious time. In most cases it's better to use a disassembler first.

Disassemblers

A disassembler converts a compiled exe or dll back into asm code. It creates cross-references within the code (e.g. jumps and calls) and can identify where strings are used. On Windows it's also very easy to get the names of API functions, e.g. "call dword ptr [004040D0]" will be translated to "call User32.CreateWindowExA". The API functions are well documented, so you can identify every parameter in the disassembly.

To give you an idea of how it works, here is the C code of a WinMain function as an example. It comes from my very first Windows program, which can display any file very fast. I only used pure C, so the asm code will be easy to understand.

int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance,
                    PSTR szCmdLine, int iCmdShow)
     {
     HWND       hwnd ;
     HACCEL     hAccel;
     MSG        msg ;
     WNDCLASSEX wndclass ;
     
     ReadIni(&logfont, &cc, &cf);
     hBrush  = CreateSolidBrush(cc.rgbResult);

     wndclass.cbSize        = sizeof (wndclass) ;
     wndclass.style         = CS_HREDRAW | CS_VREDRAW | CS_OWNDC;
     wndclass.lpfnWndProc   = WndProc ;
     wndclass.cbClsExtra    = 0 ;
     wndclass.cbWndExtra    = 0 ;
     wndclass.hInstance     = hInstance ;
     wndclass.hIcon         = LoadIcon (NULL, IDI_APPLICATION) ;
     wndclass.hCursor       = LoadCursor (NULL, IDC_ARROW) ;
     wndclass.hbrBackground = hBrush ;
     wndclass.lpszMenuName  = szAppName ;
     wndclass.lpszClassName = szAppName ;
     wndclass.hIconSm       = LoadIcon (NULL, IDI_APPLICATION) ;

     RegisterClassEx (&wndclass) ;

     hwnd = CreateWindow (szAppName, "Reader",
                          WS_OVERLAPPEDWINDOW | WS_VSCROLL | WS_HSCROLL,
                          CW_USEDEFAULT, CW_USEDEFAULT,
                          CW_USEDEFAULT, CW_USEDEFAULT,
                          NULL, NULL, hInstance, NULL) ;
     
     lstrcpy (szDateiName, szCmdLine);

     if (szDateiName[0]) 
     {
        MaxZeile = PopFile(szDateiName, hwnd);
        GetFileTitle(szDateiName, szDateiName, _MAX_PATH);
     }
     else MaxZeile = 0;

     ShowWindow (hwnd, SW_MAXIMIZE) ;
     UpdateWindow (hwnd) ;
     
     hAccel = LoadAccelerators(hInstance, szAppName);

     while (GetMessage (&msg, NULL, 0, 0))
     {
          if (hDlgModeless == NULL || !IsDialogMessage (hDlgModeless, &msg))
          {
            if (!TranslateAccelerator(hwnd, hAccel, &msg))
            {
                TranslateMessage (&msg) ;
                DispatchMessage (&msg) ;
            }
          }
     }
     
     DeleteObject((HGDIOBJ) hBrush);

     return msg.wParam ;
     }

Two disassemblers are worth mentioning. First of all W32Dasm (newest version is 8.93 I think). This program is one of the best; below you will find its output for the above code. I have cut the part that sets up the WNDCLASSEX structure and start with RegisterClassEx. Remember, Win32 API functions use the stdcall calling convention: all parameters are PUSHed in reverse order and the called function is responsible for cleaning the stack (with the plain C calling convention, cdecl, the caller would clean up instead):

* Referenced by a CALL at Address:
|:00403AE7   

... (much code to register the class)

* Reference To: USER32.RegisterClassExA, Ord:0013h
                                  |
:004010C2 2EFF15AC824000          Call dword ptr cs:[004082AC]

                    ; now follow the parameters for CreateWindowEx
                    ; remember, reverse order!
                    
:004010C9 6A00                    push 00000000                 ; lpParam (creation data)
:004010CB 8B4514                  mov eax, dword ptr [ebp+14]
:004010CE 50                      push eax                      ; hInstance
:004010CF 6A00                    push 00000000                 ; Menu-handle
:004010D1 6A00                    push 00000000                 ; parent window
:004010D3 6800000080              push 80000000                 ; window-height
:004010D8 6800000080              push 80000000                 ; window-width
:004010DD 6800000080              push 80000000                 ; vertical pos
:004010E2 6800000080              push 80000000                 ; horiz. pos
:004010E7 680000FF00              push 00FF0000                 ; windows-style

* Possible StringData Ref from Data Obj ->"Reader"
                                  |
:004010EC B804904000              mov eax, 00409004
:004010F1 50                      push eax                      ; window-name

* Possible StringData Ref from Data Obj ->"Reader"
                                  |
:004010F2 B8A0964000              mov eax, 004096A0
:004010F7 50                      push eax                      ; window class-name
:004010F8 6A00                    push 00000000                 ; extended styles

* Reference To: USER32.CreateWindowExA, Ord:0003h
                                  |
:004010FA 2EFF156C824000          Call dword ptr cs:[0040826C]  ; Create the window
:00401101 8945F8                  mov dword ptr [ebp-08], eax   ; store handle
:00401104 8B451C                  mov eax, dword ptr [ebp+1C]
:00401107 50                      push eax
:00401108 B850A04000              mov eax, 0040A050
:0040110D 50                      push eax

* Reference To: KERNEL32.lstrcpyA, Ord:0035h
                                  |
:0040110E 2EFF15B8834000          Call dword ptr cs:[004083B8]
:00401115 803D50A0400000          cmp byte ptr [0040A050], 00   ; first char == 0?
:0040111C 742B                    je 00401149
:0040111E 8B55F8                  mov edx, dword ptr [ebp-08]
:00401121 B850A04000              mov eax, 0040A050
:00401126 E852160000              call 0040277D
:0040112B A354A14000              mov dword ptr [0040A154], eax
:00401130 B804010000              mov eax, 00000104
:00401135 50                      push eax                      ; length of buffer
:00401136 B850A04000              mov eax, 0040A050
:0040113B 50                      push eax                      ; pointer
:0040113C B850A04000              mov eax, 0040A050
:00401141 50                      push eax                      ; same pointer

* Reference To: comdlg32.GetFileTitleA, Ord:0004h
                                  |
:00401142 E8F9620000              Call 00407440                  ; get the file-title
:00401147 EB0A                    jmp 00401153

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:0040111C(C)
|
:00401149 C70554A1400000000000    mov dword ptr [0040A154], 00000000

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00401147(U)
|

* Possible Ref to Menu: READEREMENU, Item: "Print..."
                                  |
:00401153 6A03                    push 00000003                 ; has nothing to
                                                                ; do with "Print..."
                                                                ; 3 = SW_MAXIMIZE
:00401155 8B45F8                  mov eax, dword ptr [ebp-08]
:00401158 50                      push eax                      ; window-handle

* Reference To: USER32.ShowWindow, Ord:001Bh
                                  |
:00401159 2EFF15CC824000          Call dword ptr cs:[004082CC]  ; show window
:00401160 8B45F8                  mov eax, dword ptr [ebp-08]
:00401163 50                      push eax

* Reference To: USER32.UpdateWindow, Ord:001Fh
                                  |
:00401164 2EFF15DC824000          Call dword ptr cs:[004082DC]  ; update (paint) window

* Possible StringData Ref from Data Obj ->"Reader"
                                  |
:0040116B B8A0964000              mov eax, 004096A0
:00401170 50                      push eax                      ; table name
:00401171 8B4514                  mov eax, dword ptr [ebp+14]
:00401174 50                      push eax                      ; hInstance

* Reference To: USER32.LoadAcceleratorsA, Ord:000Eh
                                  |
:00401175 2EFF1598824000          Call dword ptr cs:[00408298]  ; load accelerators
:0040117C 8945FC                  mov dword ptr [ebp-04], eax

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004011DF(U)                     ; here starts the message loop!
|
:0040117F 6A00                    push 00000000
:00401181 6A00                    push 00000000
:00401183 6A00                    push 00000000
:00401185 8D45D8                  lea eax, dword ptr [ebp-28]
:00401188 50                      push eax

* Reference To: USER32.GetMessageA, Ord:000Bh
                                  |
:00401189 2EFF158C824000          Call dword ptr cs:[0040828C]
:00401190 85C0                    test eax, eax                 ; got 0 (WM_QUIT)?
:00401192 744D                    je 004011E1                   ; then exit the prog
:00401194 833D6CA1400000          cmp dword ptr [0040A16C], 00000000
:0040119B 7415                    je 004011B2                   ; jmp if no modeless
                                                                ; dlg exists
:0040119D 8D45D8                  lea eax, dword ptr [ebp-28]
:004011A0 50                      push eax
:004011A1 FF356CA14000            push dword ptr [0040A16C]

* Reference To: USER32.IsDialogMessageA, Ord:000Dh
                                  |
:004011A7 2EFF1594824000          Call dword ptr cs:[00408294]  ; check for dlg-msg
:004011AE 85C0                    test eax, eax                 ; handled?
:004011B0 752D                    jne 004011DF                  ; yes, jump

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:0040119B(C)
|
:004011B2 8D45D8                  lea eax, dword ptr [ebp-28]
:004011B5 50                      push eax
:004011B6 8B45FC                  mov eax, dword ptr [ebp-04]
:004011B9 50                      push eax
:004011BA 8B45F8                  mov eax, dword ptr [ebp-08]
:004011BD 50                      push eax

* Reference To: USER32.TranslateAcceleratorA, Ord:001Dh
                                  |
:004011BE 2EFF15D4824000          Call dword ptr cs:[004082D4]  ; translate key into msg
:004011C5 85C0                    test eax, eax                 ; succeed?
:004011C7 7516                    jne 004011DF                  ; yes, jump
:004011C9 8D45D8                  lea eax, dword ptr [ebp-28]
:004011CC 50                      push eax

* Reference To: USER32.TranslateMessage, Ord:001Eh
                                  |
:004011CD 2EFF15D8824000          Call dword ptr cs:[004082D8]  ; normal msg handling
:004011D4 8D45D8                  lea eax, dword ptr [ebp-28]
:004011D7 50                      push eax

* Reference To: USER32.DispatchMessageA, Ord:0005h
                                  |
:004011D8 2EFF1574824000          Call dword ptr cs:[00408274]

* Referenced by a (U)nconditional or (C)onditional Jump at Addresses:
|:004011B0(C), :004011C7(C)
|
:004011DF EB9E                    jmp 0040117F                  ; jmp to start of loop

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00401192(C)
|
:004011E1 FF354CA04000            push dword ptr [0040A04C]     ; Exit prog, cleanup

* Reference To: GDI32.DeleteObject, Ord:0003h
                                  |
:004011E7 2EFF1540824000          Call dword ptr cs:[00408240]
:004011EE 8B45E0                  mov eax, dword ptr [ebp-20]
:004011F1 8945F4                  mov dword ptr [ebp-0C], eax
:004011F4 8B45F4                  mov eax, dword ptr [ebp-0C]
:004011F7 89EC                    mov esp, ebp
:004011F9 5D                      pop ebp
:004011FA 5F                      pop edi
:004011FB 5E                      pop esi
:004011FC 5B                      pop ebx
:004011FD C21000                  ret 0010

As you can see, even without the source you can understand a lot. W32Dasm sometimes gives misleading hints: if an ID (string, dialog, menu) is very low, it will mark many useless locations as references although the number is just used as a plain value there (see the "Print..." hint at the push 00000003 above). E.g. if a string "I love this game" has the ID 8, all occurrences of 8 will get a reference mark, although the number 8 is needed for other things too.

The second disassembler which must be mentioned is IDA - Interactive Disassembler Pro. It is much better than W32Dasm because it can identify many library functions. This is especially useful with modern programs which are written in C++ using the MFC or with Delphi/C++ Builder. It saves you a lot of time browsing through library calls; look at this snippet:

004028FD                             loc_0_4028FD:                 ; CODE XREF: sub_0_402743
004028FD 020 8D 04 5B                    lea eax, [ebx+ebx*2]
00402900 020 8D 4D EC                    lea ecx, [ebp-14h]
00402903 020 FF 34 45 CA D2 4D 00        push dword_0_4DD2CA[eax*2]
0040290A 024 E8 7E FF 08 00              call ?LoadStringA@CString@@QAEHI@Z ; CString::LoadStringA(uint)
0040290F
0040290F                             loc_0_40290F:                 ; CODE XREF: sub_0_402743
0040290F                                                           ; sub_0_402743+1B8j
0040290F 020 8D 04 5B                    lea eax, [ebx+ebx*2]
00402912 020 80 3C 45 C8 D2 4D 00 00     cmp byte_0_4DD2C8[eax*2], 0
0040291A 020 75 4E                       jnz short loc_0_40296A
0040291C 020 8B 86 48 01 00 00           mov eax, [esi+148h]
00402922 020 6A 01                       push 1
00402924 024 6A 02                       push 2
00402926 028 0F B7 44 38 02              movzx eax, word ptr [eax+edi+2]
0040292B 028 50                          push eax
0040292C 02C FF 15 6C B7 4B 00           call ds:MapVirtualKeyA
00402932 024 50                          push eax
00402933 028 8D 4D E4                    lea ecx, [ebp-1Ch]
00402936 028 E8 02 76 08 00              call sub_0_489F3D
0040293B 020 68 40 66 4E 00              push offset unk_0_4E6640
00402940 024 C6 45 FC 03                 mov byte ptr [ebp-4], 3
00402944 024 FF 75 E4                    push dword ptr [ebp-1Ch]
00402947 028 E8 93 69 07 00              call __mbscmp
0040294C 028 59                          pop ecx
0040294D 024 85 C0                       test eax, eax
0040294F 024 59                          pop ecx
00402950 020 74 0C                       jz  short loc_0_40295E

As you can see, standard API functions (MapVirtualKeyA), MFC functions (CString::LoadStringA(uint)) and library functions (__mbscmp) are identified! The "interactive" in the name of IDA means that the user can give hints about where data or code is stored, which improves the whole process. The program works like a big database: if you name a variable or function, all cross-references will be updated. In W32Dasm you must use the search-and-replace feature of your editor for this. Handling of local variables and passed parameters is also much better. The drawback is that IDA isn't easy to use and demands a different working style. But the time you spend learning this great prog will pay off very soon.

You can download a demo version from the IDA Homepage and here is Mammon's great IDA primer (local copy).

Final words

A combined disassembler-SoftICE attack will give the best results in reversing programs. First you can try to get some hints from the disassembly; maybe you will find some strings or useful functions. You can then use the (virtual) addresses in SoftICE to verify your results, e.g. you can check the values of the parameters.

One drawback of disassemblers is that you won't get the C code. It would be very nice to know which code is actually a loop (while, for). Some people have done research on this topic; in 1995 a tool appeared which claimed to translate 70% of the asm code into useful C code. Maybe it had some use in the good old days, but today most applications are written in C++ and I can hardly believe that a program could (re)create C++ classes. A trained reverser can "see" common code fragments, but an automated approach is doomed to fail.

 
