Understanding something about why a compiler does this [Archive]

Technomancer

May 13th, 2006, 09:39

I am sorry, but I am very new to reverse engineering. When I disassemble software, I often see something like this:

* Reference To: VERSION.GetFileVersionInfoSizeA, Ord:0001h
|
:00477462 FF25F4934700 Jmp dword ptr [004793F4]

Or it could be Call dword ptr [xxxxxxxx] too. I understand what it means technically. FF25 is Jmp dword ptr and F4934700 is 004793F4 in little endian format. So this instruction will cause EIP to be set to the dword stored at 004793F4 and will jump there.
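The byte-level reading above can be checked with a few lines of Python. This is only a sketch: the helper name `decode_jmp_indirect` is made up for illustration, and it handles nothing but the 6-byte `FF 25` form quoted in this post.

```python
import struct

def decode_jmp_indirect(code: bytes) -> int:
    """Return the address of the memory slot read by `FF 25 imm32`.

    Hypothetical helper: only the 6-byte JMP dword ptr [imm32] form is handled.
    """
    if len(code) != 6 or code[:2] != b"\xff\x25":
        raise ValueError("not an FF 25 indirect jump")
    # The four operand bytes are stored least-significant byte first.
    (slot,) = struct.unpack("<I", code[2:])
    return slot

slot = decode_jmp_indirect(bytes.fromhex("FF25F4934700"))
print(hex(slot))  # 0x4793f4
```

The value printed is the address of the slot that gets read, not the jump destination; the destination is whatever dword is stored in that slot at run time.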

I understand this technically, but not the mechanism behind this.

1. Why does the compiler do this? Isn't it kind of long-winded?

2. How does this relate to GetFileVersionInfoSize? Let's say the dword stored at 004793F4 is 00404000. That means this jump will bring you to the address 00404000. But how does that relate to GetFileVersionInfoSize? Basically, I just don't understand how it works, and I need to understand what happens from the point you jump to 00404000 onward and how it relates to GetFileVersionInfoSize.

blabberer

May 13th, 2006, 10:49

The compiler does this because it cannot know what the address of GetVersion will be while it is compiling.

So it creates a section called .rdata and fills it with some information
that the Windows loader can understand and resolve, so that the loader can provide the correct address.

The compiler says: hey loader, this guy wants to call GetVersion(), and this GetVersion() is in kernel32.dll. When you have resolved the address of GetVersion(), dump the address in this place so that I can use it when I am running.

Apart from this information it also places a timestamp and some advanced magic information like the forwarder chain,

and places a jmp table at the end.

So call GetVersion() will jump to the jump table, which in turn jumps to the resolved address.

So it's an array of five dwords for each dependency (DLL) that is to be resolved, finally terminated with five null dwords to indicate all the stuff is over, and it is called the import table. The list of functions each entry points to is itself terminated with a null dword.

There are names like OriginalFirstThunk, FirstThunk etc. (I would suggest you get the Luevelsmeyer pe.txt
http://spiff.tripnet.se/~iczelion/files/pe1.zip
and give it a thorough reading several times till you understand the mechanisms and their names.)

Let's look at the raw file and the loaded file.

Raw file (Iczelion's msgbox.exe):
00000600 5C 20 00 00 00 00 00 00 78 20 00 00 00 00 00 00 \ ......x ......
00000610 4C 20 00 00 00 00 00 00 00 00 00 00 6A 20 00 00 L ..........j ..
00000620 00 20 00 00 54 20 00 00 00 00 00 00 00 00 00 00 . ..T ..........
00000630 86 20 00 00 08 20 00 00 00 00 00 00 00 00 00 00 † .. ..........
00000640 00 00 00 00 00 00 00 00 00 00 00 00 5C 20 00 00 ............\ ..
00000650 00 00 00 00 78 20 00 00 00 00 00 00 75 00 45 78 ....x ......u.Ex
00000660 69 74 50 72 6F 63 65 73 73 00 4B 45 52 4E 45 4C itProcess.KERNEL
00000670 33 32 2E 64 6C 6C 00 00 BB 01 4D 65 73 73 61 67 32.dll..»Messag
00000680 65 42 6F 78 41 00 55 53 45 52 33 32 2E 64 6C 6C eBoxA.USER32.dll
00000690 00 00 ..

loaded file

00402000 >10 A1 3C 83 00 00 00 00 20 A1 3C 83 00 00 00 00 Ў<ѓ.... Ў<ѓ....
00402010 4C 20 00 00 B4 C2 1F 37 00 00 F7 BF 6A 20 00 00 L ..ґВ7..чїj ..
00402020 00 20 00 00 54 20 00 00 CD A1 20 37 00 00 F5 BF . ..T ..НЎ 7..хї
00402030 86 20 00 00 08 20 00 00 00 00 00 00 00 00 00 00 † .. ..........
00402040 00 00 00 00 00 00 00 00 00 00 00 00 5C 20 00 00 ............\ ..
00402050 00 00 00 00 78 20 00 00 00 00 00 00 75 00 45 78 ....x ......u.Ex
00402060 69 74 50 72 6F 63 65 73 73 00 4B 45 52 4E 45 4C itProcess.KERNEL
00402070 33 32 2E 64 6C 6C 00 00 BB 01 4D 65 73 73 61 67 32.dll..»Messag
00402080 65 42 6F 78 41 00 55 53 45 52 33 32 2E 64 6C 6C eBoxA.USER32.dll
00402090 00 00 ..

You can see the loader has resolved the 0x205c and substituted it with 0x8####### (I am on 9x; if you are on NT or later you will have 0x7####### there).
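To make the "array of five dwords" concrete, here is a minimal Python sketch that walks the raw dump above. Two things are assumptions read off the dumps rather than taken from the PE headers: that file offset 0x600 corresponds to RVA 0x2000, and that the import descriptors start at RVA 0x2010. A real parser would take both from the section table and data directory. The function names are made up, and import by ordinal is not handled.

```python
import struct

# Bytes copied from the raw dump above (file offsets 0x600 to 0x691).
RAW = bytes.fromhex(
    "5C 20 00 00 00 00 00 00 78 20 00 00 00 00 00 00 "  # 0x600
    "4C 20 00 00 00 00 00 00 00 00 00 00 6A 20 00 00 "  # 0x610
    "00 20 00 00 54 20 00 00 00 00 00 00 00 00 00 00 "  # 0x620
    "86 20 00 00 08 20 00 00 00 00 00 00 00 00 00 00 "  # 0x630
    "00 00 00 00 00 00 00 00 00 00 00 00 5C 20 00 00 "  # 0x640
    "00 00 00 00 78 20 00 00 00 00 00 00 75 00 45 78 "  # 0x650
    "69 74 50 72 6F 63 65 73 73 00 4B 45 52 4E 45 4C "  # 0x660
    "33 32 2E 64 6C 6C 00 00 BB 01 4D 65 73 73 61 67 "  # 0x670
    "65 42 6F 78 41 00 55 53 45 52 33 32 2E 64 6C 6C "  # 0x680
    "00 00"                                             # 0x690
)

BASE_RVA = 0x2000        # assumption: RVA of the first byte of RAW
IMPORT_DIR_RVA = 0x2010  # assumption: where the five-dword entries start

def cstring(rva: int) -> str:
    """Read a NUL-terminated ASCII string located at the given RVA."""
    start = rva - BASE_RVA
    end = RAW.index(b"\x00", start)
    return RAW[start:end].decode("ascii")

def parse_imports() -> dict:
    """Map each DLL name to a list of (IAT slot RVA, hint, function name)."""
    imports = {}
    pos = IMPORT_DIR_RVA - BASE_RVA
    while True:
        # OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
        fields = struct.unpack_from("<5I", RAW, pos)
        if not any(fields):          # five null dwords end the table
            break
        oft, _stamp, _fwd_chain, name_rva, first_thunk = fields
        functions = []
        lookup = oft - BASE_RVA
        slot_rva = first_thunk
        while True:
            (entry_rva,) = struct.unpack_from("<I", RAW, lookup)
            if entry_rva == 0:       # a null dword ends this DLL's list
                break
            (hint,) = struct.unpack_from("<H", RAW, entry_rva - BASE_RVA)
            functions.append((slot_rva, hint, cstring(entry_rva + 2)))
            lookup += 4
            slot_rva += 4
        imports[cstring(name_rva)] = functions
        pos += 20                    # next five-dword descriptor
    return imports

for dll, funcs in parse_imports().items():
    for slot_rva, hint, name in funcs:
        print(f"{dll}: slot RVA {slot_rva:#x} -> {name} (hint {hint:#x})")
```

The slot RVAs it reports (0x2000 and 0x2008) are exactly the two dwords that differ between the raw and the loaded dump.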

naides

May 13th, 2006, 11:08

Welcome to RCE.

The short answer would be to ask you to read up on the PE file structure and on how the import table (IT) and the import address table (IAT) work.

But I am going to give you a little heads up:

VERSION.GetFileVersionInfoSizeA, Ord:0001h is the NAME of a function imported from a DLL (here VERSION.dll).

Your application needs to call this code, but it cannot tell in advance where GetFileVersionInfoSizeA will be in memory. In fact, depending on the OS or the version of the DLL, that address may differ between computers and perhaps every time you run your program.

So a construct like

:00477462 Jmp 7432103

or

:00477462 Call 7432103

will only work in the unlikely circumstance that GetFileVersionInfoSizeA is always loaded and located at the static address 7432103.

Enter the OS loader

When your application loads along with its DLLs, the loader learns where GetFileVersionInfoSizeA (and all other imported functions) is located. Let us assume this time it happens to be 7432103.

The loader scans the import table of your file and writes that address
into a special table in your file called the IAT,
at a particular slot, in this example [004793F4], where your application expects to find the address of the function GetFileVersionInfoSizeA.

Then whenever your app wants to call it

it runs

00477462 FF25F4934700 Jmp dword ptr [004793F4]

or

00477462 FF15F4934700 call dword ptr [004793F4]

which will always take you to the right destination because the memory at

004793F4 contains the value 7432103. and if it changes next time, the correct address will be found there.

Those are the inner workings of dynamic linking, with one or two more convolutions, but you get the idea, eh?
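A toy model of that idea in Python, not real loader code. Memory is just a dict, the function names are invented, and the second load address (0x77C01234) is made up to stand for "a different address next time". The point is only that the instruction stays the same while the loader rewrites one slot.

```python
IAT_SLOT = 0x004793F4  # slot address from the example above

def get_file_version_info_size_a():
    # Stand-in for the real API code living inside the DLL.
    return "GetFileVersionInfoSizeA called"

def load(api_address: int) -> dict:
    """Pretend loader: the DLL landed at api_address, so record that in the IAT."""
    memory = {}
    memory[api_address] = get_file_version_info_size_a
    memory[IAT_SLOT] = api_address  # the single dword the loader writes
    return memory

def jmp_dword_ptr(memory: dict, slot: int):
    """Model of `jmp dword ptr [slot]`: read the slot, then go where it points."""
    target = memory[slot]
    return memory[target]()

# Same "instruction", two different runs with two different DLL addresses.
for where in (0x07432103, 0x77C01234):
    mem = load(where)
    print(hex(mem[IAT_SLOT]), jmp_dword_ptr(mem, IAT_SLOT))
```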

disavowed

May 13th, 2006, 15:11

Technomancer, see http://spiff.tripnet.se/~iczelion/pe-tut6.html for more info.

Maximus

May 13th, 2006, 16:13

You might read hxxp://www.codebreakers-journal.com/viewarticle.php?id=74&layout=abstract .

Technomancer

May 13th, 2006, 19:55

Thanks a lot, guys.

Just some more points to clarify. I am using WinXP, so it should be 7XXXXXXX.

Let's say 00477462 FF25F4934700 Jmp dword ptr [004793F4]

004793F4 will contain 7XXXXXXX, so basically we will be jumping to that. What I don't understand is:

blabberer: What technically is a jump table, and where is the jump table? Is the 7XXXXXXX address part of a jump table? Also, as you stated, it will place a time stamp. So the timestamp is part of the "array of five dwords terminated with a null dword for each dependency thats to be resolved finally terminated with five null dwords", which is the IAT, right? How does the jump table relate to this, though?

Maximus: I think that link is broken.

LLXX

May 13th, 2006, 19:58

Also, the reason all imported calls go through the import table, instead of the loader modifying every jump/call instruction (which would achieve the same effect), is efficiency. Only one dword needs to be modified for each imported function, instead of many jumps/calls throughout the file.

This is also useful to know for API-hooking purposes.
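Why this matters for hooking can be shown with a toy Python model. It is an illustration only: the IAT is a plain dict, and every name here is invented. Because all call sites read the same slot, replacing the slot's contents once redirects all of them.

```python
# Toy IAT with a single import slot.
iat = {}

def real_message_box_a(text):
    # Stand-in for the real API.
    return f"real MessageBoxA({text!r})"

iat["MessageBoxA"] = real_message_box_a  # what the loader would fill in

# Two independent call sites, both going through the same slot.
def call_site_1():
    return iat["MessageBoxA"]("one")

def call_site_2():
    return iat["MessageBoxA"]("two")

# The hook: remember the original pointer, then overwrite the slot once.
log = []
original = iat["MessageBoxA"]

def hook(text):
    log.append(text)       # observe the call
    return original(text)  # then forward to the real function

iat["MessageBoxA"] = hook

results = [call_site_1(), call_site_2()]
print(log)      # ['one', 'two']
print(results)
```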

LLXX

May 13th, 2006, 20:03

Posted at the same time as me...

Quote:

[Originally Posted by Technomancer]What technically is a jump table, and where is the jump table? Is the 7XXXXXXX address part of a jump table? Also, as you stated, it will place a time stamp. So the timestamp is part of the "array of five dwords terminated with a null dword for each dependency thats to be resolved finally terminated with five null dwords", which is the IAT, right? How does the jump table relate to this, though?

A jump table is a table of JMP instructions pointing to the import slots, e.g.

Code:

JMP [00401600]
JMP [00401604]
JMP [00401608]
JMP [0040160C]
...

Many assemblers and compilers append such a table to the file, even though it would be a lot more efficient to call through the import slot directly.

The 7XXXXXXX address is where the actual API function code resides.

naides

May 13th, 2006, 21:29

Another point:
addresses 7XXXXXXX correspond to the system DLLs.
Your application may import functions from other app-specific DLLs, which will be found at much lower addresses.

blabberer

May 14th, 2006, 00:52

What's a jmp table?

If you scroll down the listing in OllyDbg you will notice something like what you saw, the FF25 form:

00401132 $-FF25 14204000 JMP DWORD PTR DS:[<&USER32.DialogBoxPara>
00401138 $-FF25 10204000 JMP DWORD PTR DS:[<&USER32.EndDialog>]
0040113E $-FF25 20204000 JMP DWORD PTR DS:[<&USER32.GetDlgItem>]
00401144 $-FF25 1C204000 JMP DWORD PTR DS:[<&USER32.GetDlgItemTex>
0040114A $-FF25 0C204000 JMP DWORD PTR DS:[<&USER32.MessageBoxA>]
00401150 $-FF25 24204000 JMP DWORD PTR DS:[<&USER32.SendMessageA>>
00401156 $-FF25 28204000 JMP DWORD PTR DS:[<&USER32.SetDlgItemTex>
0040115C $-FF25 18204000 JMP DWORD PTR DS:[<&USER32.SetFocus>]
00401162 .-FF25 04204000 JMP DWORD PTR DS:[<&KERNEL32.ExitProcess>
00401168 $-FF25 00204000 JMP DWORD PTR DS:[<&KERNEL32.GetModuleHa>

Or the same thing in OllyDbg without having it display the symbolic names:

00401132 $-FF25 14204000 JMP DWORD PTR DS:[402014]
00401138 $-FF25 10204000 JMP DWORD PTR DS:[402010]
0040113E $-FF25 20204000 JMP DWORD PTR DS:[402020]
00401144 $-FF25 1C204000 JMP DWORD PTR DS:[40201C]
0040114A $-FF25 0C204000 JMP DWORD PTR DS:[40200C]
00401150 $-FF25 24204000 JMP DWORD PTR DS:[402024]
00401156 $-FF25 28204000 JMP DWORD PTR DS:[402028]
0040115C $-FF25 18204000 JMP DWORD PTR DS:[402018]
00401162 .-FF25 04204000 JMP DWORD PTR DS:[402004]
00401168 $-FF25 00204000 JMP DWORD PTR DS:[402000]

That's the jump table. It was put there by the compiler/assembler;
one does not code it.

One just says MessageBox(NULL,"blah","Blah",NULL);
and the compiler will put an FF25 thingie at the end.
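The listing above can also be decoded mechanically. A small Python sketch, with the bytes copied from the listing and a made-up function name: each entry is six bytes, `FF 25` followed by the little-endian address of an IAT slot.

```python
import struct

TABLE_START = 0x00401132
# The ten FF25 entries from the listing above, back to back.
TABLE = bytes.fromhex(
    "FF2514204000" "FF2510204000" "FF2520204000" "FF251C204000"
    "FF250C204000" "FF2524204000" "FF2528204000" "FF2518204000"
    "FF2504204000" "FF2500204000"
)

def decode_jump_table(code: bytes, start: int) -> list:
    """Return (stub address, IAT slot address) for each FF 25 entry."""
    entries = []
    for off in range(0, len(code), 6):
        if code[off:off + 2] != b"\xff\x25":
            raise ValueError(f"unexpected opcode at {start + off:#x}")
        (slot,) = struct.unpack_from("<I", code, off + 2)
        entries.append((start + off, slot))
    return entries

for stub, slot in decode_jump_table(TABLE, TABLE_START):
    print(f"{stub:08X}  JMP DWORD PTR DS:[{slot:X}]")
```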

If you ask OllyDbg to resolve IP for any registers

you can see

DS:[00402014]=898A7130, (Thunk to USER32.DialogBoxParamA)
Local call from <ModuleEntryPoint>+20

So this jump table entry is being called from ModuleEntryPoint+0x20.

Double-clicking on the address tab gets you a relative referencing mode in OllyDbg:

$+20 00401020 |. E8 0D010000 CALL 00401132 ; \DialogBoxParamA
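How `E8 0D010000` at 00401020 ends up at 00401132 is plain arithmetic: the displacement is relative to the address of the next instruction. A minimal Python sketch (the helper name is made up, and only the 5-byte E8 form is handled):

```python
import struct

def call_rel32_target(address: int, code: bytes) -> int:
    """Target of a 5-byte `E8 rel32` call located at `address`."""
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("not an E8 rel32 call")
    (disp,) = struct.unpack("<i", code[1:])  # signed 32-bit displacement
    # The displacement is added to the address of the following instruction.
    return (address + 5 + disp) & 0xFFFFFFFF

target = call_rel32_target(0x00401020, bytes.fromhex("E80D010000"))
print(hex(target))  # 0x401132
```

So the CALL lands on the jump table entry at 00401132, and that entry then jumps through the slot at 402014.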

What's in 402014 after being loaded?

00402014 >30 71 8A 89 0qŠ‰

What was there when it was in raw mode?

00000610 9C 20 00 00 œ ..

What did 209c originally point to, which the loader used to resolve?

00000690 92 00 44 69 ’.Di
000006A0 61 6C 6F 67 42 6F 78 50 61 72 61 6D 41 00 alogBoxParamA.

In which DLL was this Dialog whatever?

00000630 16 21 00 00 !..

What did the pointer 2116 (716 in physical address) point to?

00000710 55 53 45 52 33 32 2E 64 6C 6C USER32.dll
00000720 00 00 ..

and so on

The above examples are based on Iczelion's tut-10-2 (dialogbox.exe).

It's absolutely simple once you grasp the basics.
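The hopping between pointers like 209c or 2116 and raw file offsets like 716 is a fixed subtraction. A minimal Python sketch, assuming (as the "2116 (716 in physical address)" remark implies) that the section at RVA 0x2000 sits at file offset 0x600, and assuming an image base of 0x400000. In a real tool these values come from the PE section table and optional header; the constant and function names are made up.

```python
IMAGE_BASE = 0x00400000       # assumption: default image base
SECTION_RVA = 0x2000          # assumption: RVA of the section with the imports
SECTION_FILE_OFFSET = 0x600   # assumption: where that section starts on disk

def rva_to_file_offset(rva: int) -> int:
    """Convert an RVA inside that one section to a raw file offset."""
    return rva - SECTION_RVA + SECTION_FILE_OFFSET

def va_to_file_offset(va: int) -> int:
    """Convert a loaded virtual address to a raw file offset."""
    return rva_to_file_offset(va - IMAGE_BASE)

print(hex(rva_to_file_offset(0x209C)))    # 0x69c, hint + "DialogBoxParamA"
print(hex(rva_to_file_offset(0x2116)))    # 0x716, "USER32.dll"
print(hex(va_to_file_offset(0x402014)))   # 0x614, the IAT slot on disk
```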

homersux

May 16th, 2006, 19:05

Quote:

[Originally Posted by Technomancer]Thanks a lot, guys.

Just some more points to clarify. I am using WinXP, so it should be 7XXXXXXX.

Let's say 00477462 FF25F4934700 Jmp dword ptr [004793F4]

004793F4 will contain 7XXXXXXX, so basically we will be jumping to that. What I don't understand is:

blabberer: What technically is a jump table, and where is the jump table? Is the 7XXXXXXX address part of a jump table? Also, as you stated, it will place a time stamp. So the timestamp is part of the "array of five dwords terminated with a null dword for each dependency thats to be resolved finally terminated with five null dwords", which is the IAT, right? How does the jump table relate to this, though?

Maximus: I think that link is broken.

A jump table introduces a level of indirection to solve the problem of dynamically loading functions in the Win32 environment. Indirection is a common idiom in computer science. It can be used to resolve problems that may seem impossible to tackle at first glance. For example, here the heart of the problem is that the virtual address of GetFileVersionInfoSize cannot be known at compile time. The compiler simply defers the problem by introducing a data structure called a jump table and lets the program loader fill in the proper virtual addresses of GetFileVersionInfoSize etc. at run time. As LLXX pointed out, this data structure is an ingenious way to engineer a solution to this problem. There are other solutions, but the jump table one is pretty neat.

Admiral

May 17th, 2006, 18:53

homersux,

Your explanation is fine, but I'm still not happy. Here's what I understand:

The code section is littered with calls to various API functions, the addresses of which are not known at compile time. Hence the calls are directed to a table which can be filled at run-time to solve everybody's problems. That's great.
Now, the IAT (which is a null-separated, double-null-terminated array of DWORDs) is one example of the solution-by-indirection in that it is filled at run-time with the addresses of functions.
Also, the 'jump table' (which is a table similar to the IAT, containing a series of JMP DWORD (PTR)s to all the APIs instanced) can solve the problem in much the same way.
What I don't understand though is why both are necessary. The problem calls for one level of indirection only. Using two seems a little stupid. Maybe I'm missing something.

Regards
Admiral

LLXX

May 17th, 2006, 22:11

It must be due to some constraint on the compiler. Many of the programs I've inspected do not use a jump table (i.e. they call directly through the IAT), while an almost equally large number do (the API function is called with a CALL xxxxxxxx, which then proceeds to a JMP [xxxxxxxx] through the IAT). Perhaps the compiler finds it easier to generate JMP [xxxxxxxx] and call through that than to generate a "hypothetical" import table and either enforce that it be in the same position after linkage, or try to generate relocations for the imports (which might be an impossible task).

Perhaps someone would experiment with different compilers and compile options to see whether or not a jump table is produced? I know for sure that MASM + LINK will generate a jump table, to the dismay of Asm programmers like me that find it a waste of space.

homersux

May 18th, 2006, 19:10

You don't have to use a jump table. One level of indirection is fine. As long as you understand the reasoning behind this hassle. Details can be engineered.

Pyrae

May 19th, 2006, 01:24

Quote:

I know for sure that MASM + LINK will generate a jump table, to the dismay of Asm programmers like me that find it a waste of space.

Actually, one reason to use JMP tables for calling imported functions is to SAVE space: a relative call (x86 opcode E8h) to a JMP table entry is 5 bytes, while a memory-indirect absolute call through the IAT (x86 opcode FF15h) requires 6 bytes of code.
Of course, since the JMP table entry itself costs another 6 bytes, this only pays off if an import is called from more than six different locations within the actual code.
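The break-even point is easy to check. A short Python sketch using only the instruction sizes mentioned in this post, plus the 6 bytes of the FF25 jump table entry itself; the function names are made up, and the counts are per imported function and ignore relocation entries.

```python
CALL_REL32 = 5     # E8 + 4-byte displacement
CALL_INDIRECT = 6  # FF 15 + 4-byte slot address
JMP_STUB = 6       # FF 25 + 4-byte slot address, one per imported function

def bytes_with_jump_table(call_sites: int) -> int:
    """Code bytes when every call site uses E8 into a shared FF25 stub."""
    return call_sites * CALL_REL32 + JMP_STUB

def bytes_with_direct_calls(call_sites: int) -> int:
    """Code bytes when every call site uses FF15 through the IAT."""
    return call_sites * CALL_INDIRECT

for n in (1, 6, 7):
    print(n, bytes_with_jump_table(n), bytes_with_direct_calls(n))
# 1 11 6
# 6 36 36
# 7 41 42
```

At six call sites the two schemes tie at 36 bytes; from the seventh call site on, the jump table version is smaller.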

blabberer

May 19th, 2006, 05:39

Well, EliCZ has some wizardry, like Kayaker would say, that doesn't use a jump table in his EliASM. Maybe you could check that out to see how he is assembling without a jump table.

Quote:

Q: How can I make my assembly language more flexible?
A: Can you do the following things?
CALL [API] ;FF15 form of CALL instead of E8 form with JMP table at the end of .code section
MOV EAX, [API]
MOV [API], EBX
Yes You can tell me: Call GetModuleHandle and GetProcAddress and It's done
Then I tell you: And when I need address of native API (e.g. KeTickCount, KeServiceDescriptorTable, ...) ?
Then You tell me: Write your program in Visual C++.
I found how to do them in Feb-6-1999. From this day you can see in my .EXEs both forms of calls.
One E8 call requires 5+6 bytes + 1 reloc item. Six E8 calls from various places of code require 6*5+6 bytes + 1 reloc item.
One FF15 call requires 6 bytes + 1 reloc item. Six FF15 calls from various places of code require 6*6 + 6 reloc items.
I will publish here the technology EliASM after I receive 1st email telling me how to play with APIs in assembly language.
So You have task! Please quickly!
Pedro was the 2nd who found the way. Applauses!

http://www.anticracking.sk/EliCZ/infos.htm