Introduction
Imagine that you have written a program. You were lazy and added only a few comments. Now you take a quick look and try to understand your old code, and fail. Why? While coding, all the information is in your head and keeps you creative. You discover some tricks and everything is obvious. Some months or years later nothing is obvious anymore. You must follow every step of your program carefully to reconstruct and remember it. That's reverse engineering.
Now we step further. Often you are not the author of the program, but the source code is available (GNU General Public License, freeware, open source). Everyone has their own style (naming variables and functions, using pointers and arrays, ...), so it is much more difficult to investigate this type of code. The situation improves if you have documentation or if the code is well commented. Don't think it's easy, but understanding the program is only a matter of time.
In most cases the source code will not be made public. Fortunately, there are disassemblers, which produce the assembler code of any program. Examining this code is really difficult. You need some experience to "translate" the asm patterns back into high-level instructions. It helps a lot to run the program through a debugger, breaking here and there to identify some functions, but chances are high that you will get lost soon. It also helps a lot if you understand how programs and algorithms work at the asm level.
| Source code | Difficulty | Tools needed |
|---|---|---|
| Written by yourself | easy | your brain |
| Written by someone else | medium | your brain, docs |
| No source code available | hard | good disassembler, debugger, ASM knowledge, some background information (language used, what kind of person wrote the code, ...) |
Windows Programs
Windows programs are different. First, they are event driven. If something happens (the mouse was moved, a button was clicked, text was entered, ...) your program gets a message and can either handle it or pass it on to DefWindowProc. That's why the source code of a normal Windows program looks very strange at first glance: long switch statements decide which action must be performed.
But Windows offers plenty of opportunities for reverse engineering. Every program uses well-known Win32 API functions, which cover almost every topic you can think of: strings, registry, windows, controls, network, ... A disassembler can easily identify these functions, so you will better understand what the program does: MessageBoxA shows a message box, CreateWindowExA creates a window, and so on. The A at the end of the name means that the ANSI version is called; a Unicode version (W for wide char) is also available.
Another advantage is the use of string resources. Strings are put in the resource section of the program and each gets an ID. If you want to show messages in a message box, for example, you would create a string resource and write a function that accepts the string ID, loads the string with LoadString, and shows it via MessageBox.
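The event-driven structure is easier to recognize in a disassembly once you have seen it in miniature. The following is only a portable sketch, not real Win32 code: the Message type, the result codes and the functions window_proc and default_proc are made up for illustration, and default_proc merely stands in for DefWindowProc. Only the numeric message values are those of the real WM_CREATE, WM_CLOSE and WM_MOUSEMOVE constants.

```c
#include <assert.h>
#include <stddef.h>

/* Message IDs; the numeric values are those of the real WM_* constants. */
enum { MSG_CREATE = 0x0001, MSG_CLOSE = 0x0010, MSG_MOUSEMOVE = 0x0200 };

/* Hypothetical result codes, used only in this sketch. */
enum { PASSED_TO_DEFAULT = 0, HANDLED_CREATE = 1, HANDLED_CLOSE = 2 };

typedef struct {
    unsigned id;    /* which event happened */
    long param;     /* event-specific data */
} Message;

/* Stand-in for DefWindowProc: takes every message the program ignores. */
static int default_proc(const Message *msg)
{
    (void)msg;
    return PASSED_TO_DEFAULT;
}

/* Stand-in for a window procedure: one switch decides what to do. */
int window_proc(const Message *msg)
{
    assert(msg != NULL);
    switch (msg->id) {
    case MSG_CREATE: return HANDLED_CREATE;
    case MSG_CLOSE:  return HANDLED_CLOSE;
    default:         return default_proc(msg);
    }
}
```

A message with the ID MSG_MOUSEMOVE falls through to default_proc here. In a disassembly the same switch usually shows up as a chain of compares and conditional jumps, or as a jump table, on the message ID.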
Modern programs are written in object-oriented languages like C++. This makes the code easier to maintain but more difficult to reverse engineer if you only have the asm code. An object typically appears as a big structure holding its data members (variables) and, for virtual methods, a pointer to a table of pointers to the methods (functions). You will often get pointers to pointers, which must be studied carefully.
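To make the "pointers to pointers" concrete, here is a minimal sketch in plain C of roughly what a compiler builds for a class with one virtual method. All names (Shape, ShapeVtbl, rect_area, make_rect, shape_area) are hypothetical, and the exact layout depends on the compiler, so treat it as an illustration rather than a description of any particular one.

```c
#include <assert.h>
#include <stddef.h>

struct Shape;

/* Table of method pointers, shared by all objects of one class. */
typedef struct {
    int (*area)(const struct Shape *self);
} ShapeVtbl;

/* The object itself: a pointer to the method table, then the data members. */
typedef struct Shape {
    const ShapeVtbl *vtbl;
    int width;
    int height;
} Shape;

static int rect_area(const Shape *self)
{
    return self->width * self->height;
}

static const ShapeVtbl rect_vtbl = { rect_area };

Shape make_rect(int width, int height)
{
    Shape s = { &rect_vtbl, width, height };
    return s;
}

/* A "method call": object -> table -> function, i.e. a pointer to a pointer. */
int shape_area(const Shape *s)
{
    assert(s != NULL && s->vtbl != NULL);
    return s->vtbl->area(s);
}
```

Calling shape_area on a 3 by 4 rectangle goes through two loads before the actual call. In asm code this is the pattern to look for: load the table pointer from the object, then call through an entry of that table.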
Debuggers
A debugger allows you to single-step through every (asm) instruction of a program. You can check every register and pointer before executing the next instruction. The best debugger, and the only one you will ever need, is SoftIce (www.numega.com). It runs silently in the background and allows you to set advanced breakpoints. It can break on window messages (WM_CLOSE, WM_CREATE, ...), API functions (CreateWindowExA, CreateFileA, MessageBoxA, ...), memory access, or even when a function is called AND a register holds a particular value. This will lead you straight to the point of interest. A big problem is that you need a lot of experience with such a tool. It is very important to get a feeling for where something useful is being done and where you are merely sitting in a library function that, for instance, checks the length of a string; otherwise you will waste your precious time. In most cases it's better to use a disassembler first.
Disassemblers
A disassembler converts a compiled exe or dll back into asm code. It creates cross-references within the code (e.g. jumps and calls) and can identify where strings are used. On Windows it is also very easy to get the names of API functions, e.g. "call dword ptr [004040D0]" will be translated to "call User32.CreateWindowExA". The API functions are well documented, so you can identify every parameter in the disassembly.
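The name lookup for API calls can be pictured as a simple table search. The sketch below is an assumption about the principle, not the way any particular disassembler is implemented: the ImportEntry type and the functions resolve_import and calls_api are hypothetical, and the table is filled by hand with slot addresses taken from the W32Dasm listing in this article instead of being read from the import table of an exe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One import slot: the address used by the call, and what it points to. */
typedef struct {
    uint32_t slot_address;   /* address inside "call dword ptr [...]" */
    const char *name;        /* DLL.Function as named in the import table */
} ImportEntry;

/* Filled by hand for this sketch; a real tool parses the exe file. */
static const ImportEntry imports[] = {
    { 0x004082ACu, "USER32.RegisterClassExA" },
    { 0x0040826Cu, "USER32.CreateWindowExA" },
    { 0x0040828Cu, "USER32.GetMessageA" },
};

/* Returns the import name for an indirect call target, or NULL if unknown. */
const char *resolve_import(uint32_t slot_address)
{
    size_t i;
    for (i = 0; i < sizeof imports / sizeof imports[0]; i++) {
        if (imports[i].slot_address == slot_address)
            return imports[i].name;
    }
    return NULL;
}

/* Nonzero if the indirect call through slot_address reaches the named API. */
int calls_api(uint32_t slot_address, const char *api)
{
    const char *name = resolve_import(slot_address);
    assert(api != NULL);
    return name != NULL && strcmp(name, api) == 0;
}
```

An address that is not an import slot, such as an ordinary code address, yields NULL, which is why a disassembler leaves plain calls to internal functions unnamed.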
To give you an idea of how this works, here is the C code of a WinMain function as an example. It comes from my very first Windows program, which can display any file very fast. I only used pure C, so the asm code will be easy to follow.
int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance,
                    PSTR szCmdLine, int iCmdShow)
{
    HWND       hwnd ;
    HACCEL     hAccel ;
    MSG        msg ;
    WNDCLASSEX wndclass ;

    ReadIni (&logfont, &cc, &cf) ;
    hBrush = CreateSolidBrush (cc.rgbResult) ;

    wndclass.cbSize        = sizeof (wndclass) ;
    wndclass.style         = CS_HREDRAW | CS_VREDRAW | CS_OWNDC ;
    wndclass.lpfnWndProc   = WndProc ;
    wndclass.cbClsExtra    = 0 ;
    wndclass.cbWndExtra    = 0 ;
    wndclass.hInstance     = hInstance ;
    wndclass.hIcon         = LoadIcon (NULL, IDI_APPLICATION) ;
    wndclass.hCursor       = LoadCursor (NULL, IDC_ARROW) ;
    wndclass.hbrBackground = hBrush ;
    wndclass.lpszMenuName  = szAppName ;
    wndclass.lpszClassName = szAppName ;
    wndclass.hIconSm       = LoadIcon (NULL, IDI_APPLICATION) ;

    RegisterClassEx (&wndclass) ;

    hwnd = CreateWindow (szAppName, "Reader",
                         WS_OVERLAPPEDWINDOW | WS_VSCROLL | WS_HSCROLL,
                         CW_USEDEFAULT, CW_USEDEFAULT,
                         CW_USEDEFAULT, CW_USEDEFAULT,
                         NULL, NULL, hInstance, NULL) ;

    lstrcpy (szDateiName, szCmdLine) ;
    if (szDateiName[0])
    {
        MaxZeile = PopFile (szDateiName, hwnd) ;
        GetFileTitle (szDateiName, szDateiName, _MAX_PATH) ;
    }
    else
        MaxZeile = 0 ;

    ShowWindow (hwnd, SW_MAXIMIZE) ;
    UpdateWindow (hwnd) ;

    hAccel = LoadAccelerators (hInstance, szAppName) ;

    while (GetMessage (&msg, NULL, 0, 0))
    {
        if (hDlgModeless == NULL || !IsDialogMessage (hDlgModeless, &msg))
        {
            if (!TranslateAccelerator (hwnd, hAccel, &msg))
            {
                TranslateMessage (&msg) ;
                DispatchMessage (&msg) ;
            }
        }
    }

    DeleteObject ((HGDIOBJ) hBrush) ;
    return msg.wParam ;
}
Two disassemblers are worth mentioning. The first is W32Dasm (the newest version is 8.93, I think). This program is one of the best; its output for the code above follows. I have cut the part that sets up the WNDCLASSEX structure and start with RegisterClassEx. Remember that Win32 API functions use the stdcall calling convention (WINAPI), not the C calling convention: all parameters are PUSHed in reverse order and the called function is responsible for cleaning the stack (note the "ret 0010" at the end of WinMain, which removes its four parameters):
* Referenced by a CALL at Address:
|:00403AE7
... (much code to register the class)
* Reference To: USER32.RegisterClassExA, Ord:0013h
|
:004010C2 2EFF15AC824000 Call dword ptr cs:[004082AC]
; now follow the parameters for CreateWindowEx
; remember, reverse order!
:004010C9 6A00 push 00000000 ; pointer to lParam
:004010CB 8B4514 mov eax, dword ptr [ebp+14]
:004010CE 50 push eax ; hInstance
:004010CF 6A00 push 00000000 ; Menu-handle
:004010D1 6A00 push 00000000 ; parent window
:004010D3 6800000080 push 80000000 ; window-height
:004010D8 6800000080 push 80000000 ; window-width
:004010DD 6800000080 push 80000000 ; vertical pos
:004010E2 6800000080 push 80000000 ; horiz. pos
:004010E7 680000FF00 push 00FF0000 ; windows-style
* Possible StringData Ref from Data Obj ->"Reader"
|
:004010EC B804904000 mov eax, 00409004
:004010F1 50 push eax ; window-name
* Possible StringData Ref from Data Obj ->"Reader"
|
:004010F2 B8A0964000 mov eax, 004096A0
:004010F7 50 push eax ; window class-name
:004010F8 6A00 push 00000000 ; extended styles
* Reference To: USER32.CreateWindowExA, Ord:0003h
|
:004010FA 2EFF156C824000 Call dword ptr cs:[0040826C] ; Create the window
:00401101 8945F8 mov dword ptr [ebp-08], eax ; store handle
:00401104 8B451C mov eax, dword ptr [ebp+1C]
:00401107 50 push eax
:00401108 B850A04000 mov eax, 0040A050
:0040110D 50 push eax
* Reference To: KERNEL32.lstrcpyA, Ord:0035h
|
:0040110E 2EFF15B8834000 Call dword ptr cs:[004083B8]
:00401115 803D50A0400000 cmp byte ptr [0040A050], 00 ; first char == 0?
:0040111C 742B je 00401149
:0040111E 8B55F8 mov edx, dword ptr [ebp-08]
:00401121 B850A04000 mov eax, 0040A050
:00401126 E852160000 call 0040277D
:0040112B A354A14000 mov dword ptr [0040A154], eax
:00401130 B804010000 mov eax, 00000104
:00401135 50 push eax ; length of buffer
:00401136 B850A04000 mov eax, 0040A050
:0040113B 50 push eax ; pointer
:0040113C B850A04000 mov eax, 0040A050
:00401141 50 push eax ; same pointer
* Reference To: comdlg32.GetFileTitleA, Ord:0004h
|
:00401142 E8F9620000 Call 00407440 ; get the file-title
:00401147 EB0A jmp 00401153
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:0040111C(C)
|
:00401149 C70554A1400000000000 mov dword ptr [0040A154], 00000000
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00401147(U)
|
* Possible Ref to Menu: READEREMENU, Item: "Print..."
|
:00401153 6A03 push 00000003 ; has nothing to
; do with "Print..."
; 3 = SW_MAXIMIZE
:00401155 8B45F8 mov eax, dword ptr [ebp-08]
:00401158 50 push eax ; window-handle
* Reference To: USER32.ShowWindow, Ord:001Bh
|
:00401159 2EFF15CC824000 Call dword ptr cs:[004082CC] ; show window
:00401160 8B45F8 mov eax, dword ptr [ebp-08]
:00401163 50 push eax
* Reference To: USER32.UpdateWindow, Ord:001Fh
|
:00401164 2EFF15DC824000 Call dword ptr cs:[004082DC] ; update (paint) window
* Possible StringData Ref from Data Obj ->"Reader"
|
:0040116B B8A0964000 mov eax, 004096A0
:00401170 50 push eax ; table name
:00401171 8B4514 mov eax, dword ptr [ebp+14]
:00401174 50 push eax ; hInstance
* Reference To: USER32.LoadAcceleratorsA, Ord:000Eh
|
:00401175 2EFF1598824000 Call dword ptr cs:[00408298] ; load accelerators
:0040117C 8945FC mov dword ptr [ebp-04], eax
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004011DF(U) ; the message loop starts here!
|
:0040117F 6A00 push 00000000
:00401181 6A00 push 00000000
:00401183 6A00 push 00000000
:00401185 8D45D8 lea eax, dword ptr [ebp-28]
:00401188 50 push eax
* Reference To: USER32.GetMessageA, Ord:000Bh
|
:00401189 2EFF158C824000 Call dword ptr cs:[0040828C]
:00401190 85C0 test eax, eax ; got 0 (WM_QUIT)?
:00401192 744D je 004011E1 ; then exit the prog
:00401194 833D6CA1400000 cmp dword ptr [0040A16C], 00000000
:0040119B 7415 je 004011B2 ; jump if no modeless
; dialog exists
:0040119D 8D45D8 lea eax, dword ptr [ebp-28]
:004011A0 50 push eax
:004011A1 FF356CA14000 push dword ptr [0040A16C]
* Reference To: USER32.IsDialogMessageA, Ord:000Dh
|
:004011A7 2EFF1594824000 Call dword ptr cs:[00408294] ; check for dlg-msg
:004011AE 85C0 test eax, eax ; handled?
:004011B0 752D jne 004011DF ; yes, jump
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:0040119B(C)
|
:004011B2 8D45D8 lea eax, dword ptr [ebp-28]
:004011B5 50 push eax
:004011B6 8B45FC mov eax, dword ptr [ebp-04]
:004011B9 50 push eax
:004011BA 8B45F8 mov eax, dword ptr [ebp-08]
:004011BD 50 push eax
* Reference To: USER32.TranslateAcceleratorA, Ord:001Dh
|
:004011BE 2EFF15D4824000 Call dword ptr cs:[004082D4] ; translate key into msg
:004011C5 85C0 test eax, eax ; succeed?
:004011C7 7516 jne 004011DF ; yes, jump
:004011C9 8D45D8 lea eax, dword ptr [ebp-28]
:004011CC 50 push eax
* Reference To: USER32.TranslateMessage, Ord:001Eh
|
:004011CD 2EFF15D8824000 Call dword ptr cs:[004082D8] ; normal msg handling
:004011D4 8D45D8 lea eax, dword ptr [ebp-28]
:004011D7 50 push eax
* Reference To: USER32.DispatchMessageA, Ord:0005h
|
:004011D8 2EFF1574824000 Call dword ptr cs:[00408274]
* Referenced by a (U)nconditional or (C)onditional Jump at Addresses:
|:004011B0(C), :004011C7(C)
|
:004011DF EB9E jmp 0040117F ; jmp to start of loop
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:00401192(C)
|
:004011E1 FF354CA04000 push dword ptr [0040A04C] ; Exit prog, cleanup
* Reference To: GDI32.DeleteObject, Ord:0003h
|
:004011E7 2EFF1540824000 Call dword ptr cs:[00408240]
:004011EE 8B45E0 mov eax, dword ptr [ebp-20]
:004011F1 8945F4 mov dword ptr [ebp-0C], eax
:004011F4 8B45F4 mov eax, dword ptr [ebp-0C]
:004011F7 89EC mov esp, ebp
:004011F9 5D pop ebp
:004011FA 5F pop edi
:004011FB 5E pop esi
:004011FC 5B pop ebx
:004011FD C21000 ret 0010
As you can see, you can understand a lot even without the source. W32Dasm sometimes gives misleading hints: if an ID (string, dialog, menu) is very low, it marks many useless locations as references wherever that number merely appears as a value. For example, if the string "I love this game" has the ID 8, every occurrence of 8 gets a reference mark, although the number 8 is needed for other things too.
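The constants in the listing can be translated back into the names used in the source. As a small sketch, the following decodes the value 00FF0000 that is pushed as the window style. The WS_* values are the ones from the Windows SDK header winuser.h, copied here so the code compiles anywhere; the StyleFlag type and the decode_style function are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Window style bits, values as in the Windows SDK header winuser.h. */
#define WS_CAPTION     0x00C00000u
#define WS_VSCROLL     0x00200000u
#define WS_HSCROLL     0x00100000u
#define WS_SYSMENU     0x00080000u
#define WS_THICKFRAME  0x00040000u
#define WS_MINIMIZEBOX 0x00020000u
#define WS_MAXIMIZEBOX 0x00010000u
/* The SDK definition also ORs in WS_OVERLAPPED, which is 0. */
#define WS_OVERLAPPEDWINDOW (WS_CAPTION | WS_SYSMENU | WS_THICKFRAME | \
                             WS_MINIMIZEBOX | WS_MAXIMIZEBOX)

typedef struct {
    uint32_t bits;
    const char *name;
} StyleFlag;

static const StyleFlag style_flags[] = {
    { WS_CAPTION, "WS_CAPTION" },         { WS_VSCROLL, "WS_VSCROLL" },
    { WS_HSCROLL, "WS_HSCROLL" },         { WS_SYSMENU, "WS_SYSMENU" },
    { WS_THICKFRAME, "WS_THICKFRAME" },   { WS_MINIMIZEBOX, "WS_MINIMIZEBOX" },
    { WS_MAXIMIZEBOX, "WS_MAXIMIZEBOX" },
};

/* Writes the names of all recognized flags into out, joined by " | ".
   Returns the bits that none of the known flags accounts for. */
uint32_t decode_style(uint32_t style, char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof style_flags / sizeof style_flags[0]; i++) {
        if ((style & style_flags[i].bits) != style_flags[i].bits)
            continue;                     /* flag not (fully) set */
        if (used < out_size) {
            int n = snprintf(out + used, out_size - used, "%s%s",
                             used ? " | " : "", style_flags[i].name);
            if (n > 0)
                used += (size_t)n;
        }
        style &= ~style_flags[i].bits;    /* mark these bits as explained */
    }
    return style;
}
```

Decoding 00FF0000 leaves no bits unexplained, and the flags found add up to WS_OVERLAPPEDWINDOW | WS_VSCROLL | WS_HSCROLL, exactly what the C source passes to CreateWindow. In the same way, the four pushes of 80000000 are CW_USEDEFAULT.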
The second disassembler that must be mentioned is IDA, the Interactive Disassembler Pro. It is much better than W32Dasm because it can identify many library functions. This is especially useful with modern programs written in C++ using MFC, or with Delphi/C++ Builder. It saves you a lot of time browsing through library calls. Look at this snippet:
004028FD loc_0_4028FD: ; CODE XREF: sub_0_402743
004028FD 020 8D 04 5B lea eax, [ebx+ebx*2]
00402900 020 8D 4D EC lea ecx, [ebp-14h]
00402903 020 FF 34 45 CA D2 4D 00 push dword_0_4DD2CA[eax*2]
0040290A 024 E8 7E FF 08 00 call ?LoadStringA@CString@@QAEHI@Z ; CString::LoadStringA(uint)
0040290F
0040290F loc_0_40290F: ; CODE XREF: sub_0_402743
0040290F ; sub_0_402743+1B8j
0040290F 020 8D 04 5B lea eax, [ebx+ebx*2]
00402912 020 80 3C 45 C8 D2 4D 00 00 cmp byte_0_4DD2C8[eax*2], 0
0040291A 020 75 4E jnz short loc_0_40296A
0040291C 020 8B 86 48 01 00 00 mov eax, [esi+148h]
00402922 020 6A 01 push 1
00402924 024 6A 02 push 2
00402926 028 0F B7 44 38 02 movzx eax, word ptr [eax+edi+2]
0040292B 028 50 push eax
0040292C 02C FF 15 6C B7 4B 00 call ds:MapVirtualKeyA
00402932 024 50 push eax
00402933 028 8D 4D E4 lea ecx, [ebp-1Ch]
00402936 028 E8 02 76 08 00 call sub_0_489F3D
0040293B 020 68 40 66 4E 00 push offset unk_0_4E6640
00402940 024 C6 45 FC 03 mov byte ptr [ebp-4], 3
00402944 024 FF 75 E4 push dword ptr [ebp-1Ch]
00402947 028 E8 93 69 07 00 call __mbscmp
0040294C 028 59 pop ecx
0040294D 024 85 C0 test eax, eax
0040294F 024 59 pop ecx
00402950 020 74 0C jz short loc_0_40295E
As you can see, standard API functions (MapVirtualKeyA), MFC functions (CString::LoadStringA(uint)) and library functions (__mbscmp) are all identified! The "interactive" in IDA's name means that the user can give hints about where data or code is stored, which improves the whole process. The program works like a big database: if you name a variable or function, all cross-references are updated. In W32Dasm you must use the search-and-replace feature of your editor for this. The handling of local variables and passed parameters is also much better. The drawback is that IDA isn't easy to use and demands a different working style. But the time you spend learning this great program will pay off very soon.
You can download a demo version from the IDA homepage, and Mammon's great IDA primer is a good place to start.
Final words
A combined disassembler-SIce attack gives the best results when reversing programs. First, try to get some hints from the disassembly; maybe you will find some strings or useful functions. You can then use the (virtual) addresses in SIce to verify your results, e.g. to check the values of the parameters.
One drawback of disassemblers is that you won't get the C code. It would be very nice to know which code is actually a loop (while, for). Some people have done research on this topic; in 1995 a program appeared which claimed to translate 70% of the asm code into useful C code. Maybe it had some use in the good old days, but today most applications are written in C++ and I can hardly believe that a program would be able to (re)create C++ classes. A trained reverser can "see" common code fragments, but an automated approach is doomed to fail.