Last Revision: July 16th 1999
Email: muproject@hotmail.com
Latest versions available from:
http://members.tripod.com/~MUProject/Latest.zip
| Reasons for FAQ
[Miz] It will also serve as an introduction to the much more detailed
unpacking project which will be created shortly.
It came about after several requests for information on +Sandman's
Newbie messageboard. The topic is so broad the posts were often
fragmented. As a result, a suggestion was made to centralize many of the
questions people had, and to create a project that will offer real world
examples. Hopefully it will help to de-mystify this area of reversing, as
well as stimulate some interest in associated areas such as API hooking,
decompilation, virii, anti-anti-Sice measures, encryption, etc.
This 'open' FAQ will be updated as frequently as possible and readers
are actively encouraged to submit queries and/or contribute
answers/ideas/knowledge.
| How can we ask questions
and/or contribute to this project?
Email: muproject@hotmail.com
| How do we know when this
FAQ/Project has been updated?
[Miz]
| I have completed 'PartX'
of the mini-projects; where do I post the tutorial?
[Miz] This is to stop the mini-projects becoming a race and spoiling things
for others.
Please read the 'Intro.txt' for more information.
Thanks.
| Why would we want to
unpack an executable?
[Miz] There are several reasons, but IMO the main one is so that you remain informed about exactly what a specific program is doing. Deliberately hiding a program's code, and hiding which DLLs and API functions it uses, should set off warning lights for anyone concerned with even basic privacy and security. In a Win9x environment any program has access to anything on your machine. Anything.
I have no problem with the compression programs themselves; they provide a useful service and are excellent study material in their own right. My problem is with programs that can potentially abuse the side-effects of executable compression to hinder a target's examination and (if necessary) reversing.
'Baloney, you just want to patch the pants off it!' - purely a side effect, honest. Anyway, if a program's author has nothing to hide then he surely would not mind us checking for ourselves ;)
By unpacking an executable to the point where we can remove all traces of the original packer and repair all the relevant PE data, we end up being able to dead-list code, use API monitors, make patches, etc. We unpack so we can see.
| I've seen unpack
utilities for BrandX; why can't I simply use them?
[Miz] On top of that, it is (IMO) a very interesting area for study, and
touches on many other interesting areas. I've seen some hostile
discussions on the importance of understanding/ignoring the PE format.
Maybe studying such things as manual unpacking, virii, API hooking, etc.,
may make people understand its relevance more, and give some more insight
into what the system is up to in the background.
| What sort of utilities are
available to pack/encrypt a program?
[Miz] Programs such as Petite, Neolite, Shrinker, ASPack, etc, are widely
used schemes, having been around for some time and consequently being more
refined than others. Check out those programs. Many have evaluation
periods. Reading their documentation gives a good background to the topic.
Also check out the more basic ones from tools sites. Sometimes you find
gems, like ones that come with source code or a 'how it was done' doc.
| I'm experienced at
reversing, but this area is relatively new to me. Where's a good place to start?
[Miz] Read, read, then re-read; particularly documentation like MattP's.
Then, get hold of a small, simple executable - one you are familiar with.
Pack it with an uncomplicated packer (suggest ASPack) and make it your aim
to reverse it to being as close a copy of the original as you possibly
can. Make it an ongoing project while you continue to search and read.
If you want to dive in straight away then check out Torn@do's packed
crackme and its accompanying unpacking doc (included in the
'Library/OldStuff' directory). It skirts around the huge 'import' problem
(that will be covered in much more detail as part of the main project),
but it may be of some help. Please bear in mind the 'doc' was never
intended to be a tutorial as such. Much more information will be provided
here.
Alternatively (and the way I learnt), after reading as much info as
your brain can handle, write your own PE 'modifiers'. My first attempt was
a modifier that reversed the order of all the bytes in the code section
and swapped them back at runtime. Pointless, I know ;) But it didn't
involve delving into the PE format that much, and it's surprising how much
such a trivial task can teach you. From there you can move on to more
involved tasks like packing the code section, then packing the
import section, etc.
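The core of that first 'pointless' modifier fits in a few lines. Here is a minimal C sketch, assuming the raw bytes of the code section have already been read into a buffer (finding the section and adding the runtime stub are separate jobs). `reverse_bytes` is a hypothetical name. Reversing is its own inverse, which is why the same routine can both scramble the bytes on disk and swap them back at runtime.

```c
#include <stddef.h>

/* Reverse a buffer in place. Applying this twice restores the original
   bytes, so the same routine serves as the "encode" step in the modifier
   and the "decode" step the runtime stub performs.
   (Hypothetical helper for illustration.) */
void reverse_bytes(unsigned char *buf, size_t len)
{
    if (len < 2)
        return;
    size_t i = 0, j = len - 1;
    while (i < j) {
        unsigned char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
        i++;
        j--;
    }
}
```

Calling it twice on the same buffer gives back the original bytes.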
Sounds scary? Not with the resources available. I will keep referring
to the excellent work that Stone did long ago in this area. I'm sure
others have contributed just as much, but for me Stone's work was a
revelation. There's a simple pe-encryptor with source and docs on his site
that makes a great reference.
| Blimey! All I want to do
is patch a single byte in the uncompressed exe, do I really have to do all
this?
[Miz] Leave the executable packed but divert the final 'jump' (to the
original entry point) to some of your own code that can then apply the
patch to the now unpacked code/data. For you to be able to do this
properly however, you will need some knowledge of caving or section
construction; you'll need to write relocatable code that references the
patch address in an OS-friendly way (i.e. uses the ACTUAL base address, not
the PREFERRED base address), etc.
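The arithmetic behind 'uses the ACTUAL base address' is small but easy to get backwards. Here is a minimal C sketch, assuming the patch address was noted (e.g. in SIce) while the image sat at its preferred base. `relocate_patch_va` is a hypothetical name. Subtracting the preferred base gives an RVA, which stays the same wherever the OS maps the image, so adding the actual base gives the address to patch.

```c
#include <stdint.h>

/* Convert a patch address recorded against the PREFERRED image base into
   the address it has under the ACTUAL base. The RVA (offset from the base)
   is identical in both cases; only the base changes.
   (Hypothetical helper for illustration.) */
uint32_t relocate_patch_va(uint32_t patch_va,
                           uint32_t preferred_base,
                           uint32_t actual_base)
{
    uint32_t rva = patch_va - preferred_base;
    return actual_base + rva;
}
```

On Win32, code running inside the process can get the actual base of the main executable from GetModuleHandle(NULL).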
This project, FAQ, and the references in Further
Reading will give you the knowledge to do these things, and hopefully
much more besides.
| Is packing a program
like zipping a proggy? [Jeff]
[Miz]
| Is packing that simple
and quick? [Jeff]
[Miz]
| What is the purpose
behind packing up a proggy? [Jeff]
[Miz] Allegedly, just to make it smaller -
40-60% of original size is about normal.
| Who would use such a
packed proggy and why? [Jeff]
[Miz]
| Are the changes to the
proggy always the same? [Jeff]
[Miz] Again the OS dictates some limitations, but other than that a
packer/virus/encryptor can be as creative as its creator ;) It is because
of this that this FAQ and the resulting full project will try to avoid
direct descriptions of particular versions of packers. Instead we hope it
will provide enough background knowledge for people to be able to cope
with new 'generations' of packers as they emerge.
| HOW do we KNOW when we dl
a proggy that it is packed? [Jeff]
[Miz] NOTE: Exe checkers will only know about packers they have been
programmed to know about. They normally just check for signatures (byte
strings) in the same way that virus checkers do. If you rely on an exe
checker and it says a program is not packed, then be sure to check it
manually. It may be packed with a newer revision of an existing packer that
is not recognised yet.
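To make the 'signature' idea concrete, here is a minimal C sketch of the kind of scan a simple exe checker performs. The structure and function names are ours, and any signature bytes you feed it are placeholders, not a real packer's signature. Note how it fails silently: if a new packer revision changes even one byte of the matched string, the result is simply 'not found', which is exactly why a manual check is still needed.

```c
#include <stddef.h>
#include <string.h>

/* One known packer signature: a name and the byte string to look for.
   (Hypothetical structure for illustration.) */
struct signature {
    const char *name;
    const unsigned char *bytes;
    size_t len;
};

/* Return the index of the first signature found anywhere in buf, or -1 if
   none match. A naive scan, but it shows why a checker only recognises what
   it has been taught. */
int find_signature(const unsigned char *buf, size_t buf_len,
                   const struct signature *sigs, size_t nsigs)
{
    for (size_t s = 0; s < nsigs; s++) {
        size_t n = sigs[s].len;
        if (n == 0 || n > buf_len)
            continue;
        for (size_t i = 0; i + n <= buf_len; i++) {
            if (memcmp(buf + i, sigs[s].bytes, n) == 0)
                return (int)s;
        }
    }
    return -1;
}
```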
| What, if any, tell tale
signs tell us this proggy may be packed? [Jeff]
[Miz]
| Does something happen in
SoftIce or Win32Dasm to give us a clue? [Jeff]
[Miz]
| What tools would we need
in our unpacking arsenal? [Jeff]
[Miz] Check any decent tools site and you will find dozens of utils for
studying and modifying pe-files. Take a look at as many as you can,
sometimes they even include source. Anything you can get your hands on may
be of help.
Also (it should go without saying) try writing your own. Check out Stone's site for some great starting
points. The pe-encryptor he presents is a great foundation for this topic.
Start with simple things and gradually work your way up.
| Packers/encryptors/virii,
etc., seem to be mentioned together quite often. Are they so similar?
[Miz] Like a virus, unpackers often have to be resourceful in how they
initialise. They may use similar steps to find API addresses, memory, etc.
Unpackers often decrypt part of their unpacking code, like exe-encryptors,
to make examination more difficult. They may also employ similar
anti-debugging measures. Packers/encryptors/virii quite often have to use
fully relocatable code and, as such, use many of the same tricks (like the
call [NextInstruction]/pop pair used to get the current eip).
Links to some virii descriptions are provided in Further
Reading. Search for some more to see the similarities. Take a look at
the cabanas one, hmmm ;)
| What is a PE Header?
[Jeff]
[Miz] The important thing to grasp early on is that an executable is not just
a binary image of your code/data and nothing else. There are many pieces
of additional system information attached so that the OS knows exactly
what resources your executable will need.
It may be easier to think of an executable as a directory. In that
directory are some files and some sub-folders containing more files. The
structure of the whole directory tree will change from executable to
executable. For example if your executable doesn't have any resources then
the .RSRC section will be missing. If it contains debug information then a
.DBG section may exist etc.
A simple scheme may be:
If you're more familiar with older COM files, you'll be amazed at the
amount of information that is stored. It does indeed make unpacking more
involved than before; however, it also makes certain areas (like API
monitoring, etc.) very easy. More on that later.
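For those who prefer to see the layout from code, here is a minimal C sketch that walks a PE file already loaded into a buffer. It checks the 'MZ' DOS header, follows e_lfanew (the 32-bit value at offset 0x3C) to the 'PE\0\0' signature, reads NumberOfSections and SizeOfOptionalHeader from the 20-byte file header, and skips the optional header to reach the 40-byte section table entries. Offsets follow the WinNT.h structures; `list_sections` is a hypothetical name and bounds checking is minimal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy up to max_sections section names (8 bytes, NUL-padded) into names.
   Returns the number of sections in the file, or -1 if the buffer is not a
   recognisable PE image. */
int list_sections(const unsigned char *img, size_t len,
                  char names[][9], int max_sections)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;                               /* no DOS header */
    uint32_t pe = rd32(img + 0x3C);              /* e_lfanew */
    if ((size_t)pe + 24 > len || memcmp(img + pe, "PE\0\0", 4) != 0)
        return -1;                               /* no PE signature */
    const unsigned char *fh = img + pe + 4;      /* IMAGE_FILE_HEADER */
    uint16_t nsec = rd16(fh + 2);                /* NumberOfSections */
    uint16_t optsz = rd16(fh + 16);              /* SizeOfOptionalHeader */
    size_t sec = (size_t)pe + 24 + optsz;        /* first section header */
    if (sec + (size_t)nsec * 40 > len)
        return -1;
    for (int i = 0; i < nsec && i < max_sections; i++) {
        memcpy(names[i], img + sec + (size_t)i * 40, 8);
        names[i][8] = '\0';
    }
    return nsec;
}
```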
If this structure is pretty new to you then I recommend you get hold of
a decent PE viewer and get a feel for the structure of a PE file before
delving deeper. A lot of tools exist, but my own favorite is PEBrowse
Professional. There's also MattP's PEDUMP util, complete with source to
study; see Further
Reading below.
Look again at the above layout. [From MattP's Windows Secrets]
| Can u tell at a glance,
without tools, that the PE Header is screwy? [Jeff]
[Miz] Anyways, 'screwy' PE Headers are normally a result of incorrect
unpacking - something we will hopefully be avoiding ;)
| What do the numbers we
see in PE Header mean to us? (eg Entry point, Image Base, Virtual size, Virtual Offset, etc.) [From MattP's Windows Secrets]
Image Base
RVA (Relative Virtual Address), AKA Relative Offset: (virtual address 0x401464) - (base address 0x400000) = RVA 0x1464
Virtual Offset, AKA Virtual Address
VirtualSize
Entry Point, AKA AddressOfEntryPoint
[Miz]
| Is ProcDump the only way
to fix a PE Header?
[Miz] As mentioned earlier, the '.TEXT' section is the general name for the
section containing code. Early versions of packers and encryptors only
modified this section as it was by far the easiest to do, and 'protected'
the main code. More modern packers give the ability to pack/encrypt most
section types including imports and resources. Doing this stops people
using APISpy programs and Resource grabbers as well as complicating the
whole unpack operation. Sometimes packers pack all but the first icon
group so that your program still retains its own icons when viewed in
Explorer, etc.
Remember, by the time the first instruction of the executable is
executed most of the usual headers have been processed by the OS. So if
you choose to pack these as well (for example the import section) then the
system will have no knowledge of it and so the unpacker has to build it up
itself.
This is where most people get confused; the sub-topic of
import/reloc (etc.) rebuilding will be covered in detail in another
section.
| What happens in the
proggy when it's packed? [Jeff]
[Miz] Say you just wanted to pack the '.TEXT' (usually code)
section. A *very* simplified flow could be:
| What happens when our
packed proggy is run?
[Miz] What is meant by 'In-situ'?
| So for a simple target,
as described above, how would I go about unpacking it?
[Miz] What you would like is an unpacked image, as close as possible to the
original, and one that works with the OS correctly.
Here's an example you may be more familiar with:
Now imagine a '.TEXT' section is packed. You know that the depacker
encounters it, unpacks it and continues. *In principle* it's simply a case
of letting the unpacker unpack it, then stopping the unpacker from unpacking
it next time.
Why *in principle*?
Because there are many caveats, which we will see later, but it's the
concept that is important to grasp.
Think now about what wrapper-style packers (like Petite, Neolite,
Shrinker, ASPack etc.) can and cannot do.
'If this is the case, then (again, in principle) surely 'dumping'
the executable's image to disk at this point would be all that was needed,
yes?'
Again, yes and no ;)
'Ah, that's just because the entry point is pointing to the wrong
place, soon fix that.....'
No (patience grasshopper...) - remember, the entry point is just a
*tiny* bit of the information needed by the OS in order to
correctly process an exe. We really need to be sure we are fixing
*all* the relevant information required by the OS. This is the
(assumed) difficult part of unpacking. For us to be confident in our
ability to restore executables, we really need to be familiar with the
actions of unpackers, very familiar with the structure [and
functionality] of the PE Header, and familiar with the actions of the
Windows loader.
| Ok, so what type of
'Sections' are there? What exactly are they, what do they mean, and where do they come from?
[Miz] Sections themselves can be thought of as chunks of distinct code, data,
system resources, user resources, etc.
'But when I program I have no knowledge of these!'
It depends on what type of assembler/compiler/linker you use. The data
is organized in this way because it is an OS requirement, not a
programming one. It makes no real difference to most programs WHERE the
stuff is stored, but it is very important that everything is in the right
place and correct for the OS.
'So if I wrote a simple program like a messagebox that says
'UmBongo!' then how would that look when compiled/linked into an
executable?'
Your code would be placed in its own section. The string (data)
'UmBongo!' would be placed in its own section. You would have an import
section that described what relevant DLLs and API calls you used (User32
and MessageBox), any icons that were created would go in a resource
section, etc.
Here are some brief descriptions of commonly encountered section types:
Think of initialised data as things like strings (text), or a block
of data to decrypt; ie chunks of data with some predetermined value.
Think of uninitialised data as being variables (not predefined ones),
empty arrays, etc; ie blocks of data that will be filled by the
executable with some data at runtime, but at startup have no preset
values.
'Why are these treated separately?'
Initialised data obviously needs room in the exe to be stored,
whereas uninitialised data does not. The Windows loader will MAKE the
room for this data when a file is loaded, but it requires no storage
in the executable itself, other than the section description.
Hence the two types of data section.
The '.IDATA' section provides all the information the OS needs about
what DLLs and API calls were explicitly linked with the executable.
'explicitly linked?'
'So just by looking at this section, and without running the
program itself, I can figure out whether it uses winsock, mapi, even if
it uses a messagebox?!'
Yes. That's how programs like QuickView/DLLShow etc. work, and more
interestingly how API spy programs can monitor API calls from
executables. It is because of this that many packers choose to pack this
section. Obviously by packing it you are 'hiding' it from such programs
and (more importantly) us!
'So why is this 'painful' in terms of unpacking?'
Well, judging by the number of emails and posts, this is the thing
that confuses people the most. Understanding the concept of dumping,
etc, comes quickly, but many do not see the importance of correctly
fixing this section. It is *vital*, in order for an executable to
work correctly all the time, in all environments, that it has a correct
import section.
'So why can't I just dump it, just before control passes to the
original exe?'
Remember we said that for the original program to function correctly
all its code and data must be correct? Remember also that we said
that an executable's image contains much more information than just
that? Remember that we said the OS relies on this information to
correctly provide the resources for an executable? Also remember that by
the time control passes to the original exe, all of its OS
initialisation has already taken place?
'Well, if a packer has packed the import table then how on earth
can the OS know what's going on?'
The answer is it doesn't need to, *IF* the unpacker has done
the work that the OS would normally do for itself.
'What does the OS do with this section then?'
Take a look at an ascii-dump of any '.IDATA' section. You'll see
strings; names of DLLs and API functions. These are no use to an
executable in that format. Somewhere, somehow, these must be processed
into a format more usable for the exe.
I'm going to divert for a bit, but you'll see why.......
Imagine your messagebox program again. When you do a call to
'MessageBox' the compiler/linker does something that may at first seem
very strange. If you look at it in Softice you will see something like
Call [USER32!MessageBox].
What is strange about that? Look more closely. This explanation from
MattP's book should clarify things. If not, keep re-reading. It's an
important thing to grasp.
[Extract From MattP's Windows Secrets]
'So for an unpacker to correctly initialise this table it must
process the names stored in the '.IDATA' section. How does it do
this?'
API functions exist for this; they were mentioned previously, and are
used by programs that load and handle DLLs themselves. The more
relevant calls are GetModuleHandle, LoadLibrary, and GetProcAddress.
Full docs on these functions are supplied in separate .txt files with
this package. Read them and see how they would be applied to create the
table.
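The reason the compiler emits Call [USER32!MessageBox] (an indirect call through a table slot) can be modelled without Windows at all. The following C sketch is an analogy, not loader code: a small name-to-address table stands in for USER32's exports, a hypothetical `lookup_export` stands in for GetProcAddress, and the 'program' only ever calls through its own address table. Filling that table from the name strings is the job that either the OS loader or the unpacker has to do. `MessageBoxA` is the real ANSI export name; everything else here is invented for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef int (*api_fn)(const char *text);

/* Stand-in for a function exported by a DLL (hypothetical). */
static int fake_message_box(const char *text) { return (int)strlen(text); }

/* Stand-in for a DLL's export table: names mapped to addresses. */
struct export_entry { const char *name; api_fn addr; };
static const struct export_entry fake_user32[] = {
    { "MessageBoxA", fake_message_box },
};

/* Analogue of GetProcAddress: resolve a name to an address, or NULL. */
static api_fn lookup_export(const char *name)
{
    for (size_t i = 0; i < sizeof fake_user32 / sizeof fake_user32[0]; i++)
        if (strcmp(fake_user32[i].name, name) == 0)
            return fake_user32[i].addr;
    return NULL;
}

/* The names as they sit in the import section, and the address table the
   compiled code calls through ("call [slot]"). */
static const char *import_names[] = { "MessageBoxA" };
static api_fn iat[sizeof import_names / sizeof import_names[0]];

/* What the loader (or an unpacker) does: turn names into addresses.
   Returns 0 on success, -1 if a name cannot be resolved. */
int fill_iat(void)
{
    for (size_t i = 0; i < sizeof import_names / sizeof import_names[0]; i++) {
        iat[i] = lookup_export(import_names[i]);
        if (iat[i] == NULL)
            return -1;
    }
    return 0;
}

/* The program's code never names the function directly; it calls through
   whatever address the table slot holds. */
int program_code(void) { return iat[0]("UmBongo!"); }
```

Notice that `program_code` never changes; only the contents of the table do. That is why an unpacker can build the table itself, and why, if it also wipes the original name strings, a dump taken afterwards has a filled-in table but nothing left for the OS to rebuild it from.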
'A lot of information to absorb, so where does that leave us in
terms of unpacking files and correctly restoring their PE Headers?'
As you can hopefully see by now, if we are intending to bypass (and
remove!) the unpacker completely then we really have to make sure that
we have valid information in the PE header so the OS can do all this for
us again.
'How do we tackle this?'
There are many ways, and the way you choose will depend on how the
unpacker handles these things. Here's a simple example for a simple
unpacker's scheme:
What we can do here is let it unpack the '.idata' section, but
*importantly* stop it from generating the IAT and wiping the contents
of the original data. If we can do this then we have a valid import
table again, ready for the OS to process. All we would need to do then
is to alter the relevant 'Directory' settings in the PE Header to
point to the correct offset and size of this new (original!) import
section. Remember, we are doing all this so, eventually, we no longer
have to rely on ANY of the unpacker's code, and so can even remove the
unpacker completely. This is a simple explanation of a simple scheme. It's the thing I
'glossed over' in the original post on Torn@do's ASPack packed crackme.
I'm sure you can appreciate why now ;) The version of ASPack used made
this easy to do, but not particularly easy to explain (I didn't want to
confuse people too much!). Other packers, and I'm sure future versions
of ASPack, will not make things this simple so make sure you understand
the information here - read as many other resources as you can (see Further
Reading). The concepts will always be similar, but expect things to
get tricky soon ;)
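The 'Directory' settings in the PE Header mentioned earlier live in the DataDirectory array at the end of the optional header: 16 entries, each an RVA plus a size, at fixed indices. Here is a minimal C sketch of repointing the import entry, assuming you already know the RVA and size of the restored import table. The struct mirrors IMAGE_DATA_DIRECTORY from WinNT.h using stdint types, the index values are the WinNT.h IMAGE_DIRECTORY_ENTRY_* constants, and `set_import_directory` is a hypothetical helper.

```c
#include <stdint.h>

/* Same layout as IMAGE_DATA_DIRECTORY in WinNT.h. */
typedef struct {
    uint32_t VirtualAddress;  /* RVA of the table */
    uint32_t Size;            /* size in bytes */
} data_directory;

/* Indices into the 16-entry DataDirectory array (WinNT.h values). */
enum {
    DIR_EXPORT    = 0,   /* IMAGE_DIRECTORY_ENTRY_EXPORT */
    DIR_IMPORT    = 1,   /* IMAGE_DIRECTORY_ENTRY_IMPORT */
    DIR_RESOURCE  = 2,   /* IMAGE_DIRECTORY_ENTRY_RESOURCE */
    DIR_BASERELOC = 5,   /* IMAGE_DIRECTORY_ENTRY_BASERELOC */
    DIR_IAT       = 12,  /* IMAGE_DIRECTORY_ENTRY_IAT */
    DIR_COUNT     = 16
};

/* Point the import directory at the restored (original) import table. */
void set_import_directory(data_directory dirs[DIR_COUNT],
                          uint32_t rva, uint32_t size)
{
    dirs[DIR_IMPORT].VirtualAddress = rva;
    dirs[DIR_IMPORT].Size = size;
}
```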
Now, where were we.....oh yes, section descriptions....
Remember that 'ImageBase' was our PREFERRED loading address, but that
the OS could, in theory, put it anywhere it wants? (It is very rare for
the OS to do this, but it's the fact that it can that is important).
If a .RELOC section exists, then in unpackers, just like in the
Windows loader, you will see the code checking the preferred and actual
imagebases. If they are the same then the info in the .RELOC section
will be ignored, otherwise it will need to make some 'fixups' to
anything that assumed it would be located at the preferred image base.
The areas to 'fix' are stored in this (.RELOC) section.
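Here is a minimal C sketch of what 'making fixups' means for the common 32-bit case. It assumes one .reloc block in an image already mapped into memory. Each block starts with the page RVA and the block size (header included), followed by 16-bit entries: the top 4 bits are the type (3 = IMAGE_REL_BASED_HIGHLOW, 0 = IMAGE_REL_BASED_ABSOLUTE padding) and the low 12 bits are the offset within that page. For each HIGHLOW entry, the difference between the actual and preferred image base is added to the 32-bit value stored there. `apply_reloc_block` is a hypothetical name, and other entry types are skipped.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t get32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Apply one base-relocation block to a memory image. Returns the number of
   fixups applied (0 if the bases match, as the loader then ignores .reloc),
   or -1 on a malformed block. */
int apply_reloc_block(unsigned char *image, size_t image_len,
                      const unsigned char *block,
                      uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t page_rva = get32(block);        /* VirtualAddress */
    uint32_t block_size = get32(block + 4);  /* SizeOfBlock, header included */
    uint32_t delta = actual_base - preferred_base;
    int applied = 0;

    if (block_size < 8)
        return -1;
    if (delta == 0)
        return 0;                            /* loaded where the linker assumed */
    for (uint32_t off = 8; off + 2 <= block_size; off += 2) {
        uint16_t entry = (uint16_t)(block[off] | (block[off + 1] << 8));
        unsigned type = entry >> 12;
        uint32_t rva = page_rva + (entry & 0x0FFFu);
        if (type != 3)                       /* skip padding (0) and other types */
            continue;
        if ((size_t)rva + 4 > image_len)
            return -1;
        put32(image + rva, get32(image + rva) + delta);
        applied++;
    }
    return applied;
}
```

With the numbers from MattP's example further down (linker base 0x10000, load base 0x70000), a pointer holding 0x12004 becomes 0x72004.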
'How does this affect us?'
Well, if we have (in effect) dumped the code/data and fixed the
import issue, but NOT fixed the reloc section then we have code, etc,
that will ONLY work if it is located at the same imagebase it had when the
dump occurred. Like the import section, we need to fix this so that we have a
valid one for the OS to process should it decide later that it wants the
imagebase to be something else. Here's something that explains a real
world example:
[Extract From MattP's Windows Secrets]
It's important to note that the JMP and CALL instructions generated
by a compiler use offsets relative to the instructions, rather than
actual offsets in the 32-bit flat segment. If the image needs to be
loaded somewhere other than the location the linker assumed was a base
address, these instructions don't need to change, since they use
relative addressing. As a result, there are not as many relocations as
you might think. Relocations are usually needed only for instructions
that use a 32-bit offset to some data.
For example, let's say you had the following global variable
declarations:
int i;
int * ptr = &i;
If the linker assumed an image base of 0x10000, the address of the
variable 'i' will end up being something like 0x12004. At the
memory used to hold the pointer ptr, the linker will have written out
0x12004, since that's the address of the variable 'i'. If the loader
(for whatever reason) decided to load the file at a base address of
0x70000, the address of 'i' would then be 0x72004. However, the
pre-initialized value of the ptr variable would then be incorrect
because i is now 0x60000 bytes higher in memory. This is where the
relocation information comes into play. The .reloc section is a list
of places in the image where the difference between the linker-assumed
load address and the actual load address needs to be taken into
account.
Like the .IDATA section, the .EDATA contains a list of functions,
only this time they are the names of functions WITHIN the executable
that are EXPORTED to other modules. .EDATA sections are more frequent
in DLLs (because they obviously EXPORT the functions that the
executable IMPORTs), but they can also (rarely) occur within executables
themselves. You will probably have noticed the 'Import Functions /
Export Functions' controls within W32dasm. Load a few executables into
it and see. Then take a look at some DLLs.....
If you want to read more on this (and things like export
forwarding ;)) then take a look at MattP's stuff. He goes into some
detail there. For our purposes (ie unpacking), it's just another section
we'll have to deal with.
Everything a resource grabber grabs, and now you know how ;) Again,
just like any other section.
Note: If a program chose to pack this section, then things like
correct icons would not be displayed on the Desktop, in Explorer, etc. For
that reason many packers (if packing the .RSRC section) strip out the first
icon group and store it unpacked elsewhere. That way the executable 'looks'
correct when viewed by other programs via icons.
| It's been mentioned here
that sections are unpacked over themselves. How can this work? Surely, if some sections are packed and others are not then some data would get overwritten because the unpacked data must be bigger!
[Miz] Another slight digression now, but something you may have wondered
about before......
Imagine you wanted to patch some code in an executable and have found
the relevant location using SIce (or whatever). It may give an address
like 0x401234, and from what you have learnt from the PE header, you know
that the preferred ImageBase was (say) 0x400000. If the disk image and the
memory image were the same, then the offset in the executable's disk image
to patch would be 0x401234-0x400000 = 0x1234, right?
Wrong.
'So, what happens for these to be different?'
Well, the way the executable's MEMORY image is laid out is specified by
some information within the PE Header. The OS uses this information to
organize and create the memory for the executable. It 'maps' the sections
based on information stored in the PE header.
Remember before, when we looked at the differences between initialised
and uninitialised data? Remember we said that for uninitialised data the
PE Header only specified how much 'room' it would need but required no
actual storage in the disk image? Let's look at how this was done. It's a
simple example that can be extended when looking at other section types.
Each section description within the section table has the following
fields:
[..Extract from WinNT.h, remember to look here for the structure
definitions....]
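The WinNT.h extract did not survive here, so as a stand-in, this is IMAGE_SECTION_HEADER restated in C with stdint types in place of BYTE/WORD/DWORD (our substitution; field names and order are as in WinNT.h). Each entry in the section table is 40 bytes, which the compile-time check confirms.

```c
#include <stdint.h>

#define SECTION_NAME_LEN 8   /* IMAGE_SIZEOF_SHORT_NAME */

typedef struct {
    uint8_t  Name[SECTION_NAME_LEN];  /* e.g. ".text", NUL-padded, not always NUL-terminated */
    union {
        uint32_t PhysicalAddress;
        uint32_t VirtualSize;         /* size of the section in memory */
    } Misc;
    uint32_t VirtualAddress;          /* RVA of the section in memory */
    uint32_t SizeOfRawData;           /* size of the section on disk */
    uint32_t PointerToRawData;        /* file offset of the section's data */
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;         /* IMAGE_SCN_* flags */
} image_section_header;

_Static_assert(sizeof(image_section_header) == 40,
               "a section table entry is 40 bytes");
```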
The answer lies in VirtualSize, VirtualAddress, SizeOfRawData and
PointerToRawData.
Now go and take a look at a few section descriptions using a PE viewer
before continuing to read the rest of this answer. Get a feel for the type
of numbers in them and how they differ. Look at some unpacked files'
sections as well as packed ones. You should immediately see some big
differences.
Done that? Good. Then the following will make more sense....
VirtualAddress vs PointerToRawData:
So, can you see how we could use this information to calculate the file
offset for a patch address (given its address in the memory image) using
the PE Header?
Here's an example:
(0x1234 - 0x1000 = 0x234, so our data is 0x234 bytes 'into' this
particular section, and the DISK image of this section starts at 0x200, so
the disk image address would be 0x200 + 0x234 = 0x434!)
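The same calculation as a minimal C sketch: find the section whose memory range contains the RVA, then rebase the RVA onto that section's PointerToRawData. Only the four fields the calculation needs are kept (a cut-down version of the section header), and `rva_to_file_offset` is a hypothetical name. With a section at VirtualAddress 0x1000 and PointerToRawData 0x200, RVA 0x1234 maps to file offset 0x434, as above.

```c
#include <stddef.h>
#include <stdint.h>

/* Just the section-header fields this calculation needs. */
struct section_info {
    uint32_t VirtualAddress;   /* RVA where the section starts in memory */
    uint32_t VirtualSize;      /* size in memory */
    uint32_t PointerToRawData; /* file offset where the section starts on disk */
    uint32_t SizeOfRawData;    /* size on disk */
};

/* Convert an RVA to a file offset. Returns 0 and stores the offset in *out
   on success; returns -1 if no section contains the RVA, or if the RVA lies
   in the part of a section that has no bytes on disk (e.g. uninitialised
   data). */
int rva_to_file_offset(const struct section_info *secs, size_t nsecs,
                       uint32_t rva, uint32_t *out)
{
    for (size_t i = 0; i < nsecs; i++) {
        const struct section_info *s = &secs[i];
        uint32_t vsize = s->VirtualSize ? s->VirtualSize : s->SizeOfRawData;
        if (rva >= s->VirtualAddress && rva - s->VirtualAddress < vsize) {
            uint32_t delta = rva - s->VirtualAddress;  /* distance into the section */
            if (delta >= s->SizeOfRawData)
                return -1;                             /* memory-only part */
            *out = s->PointerToRawData + delta;
            return 0;
        }
    }
    return -1;
}
```

For the SIce address 0x401234 from earlier, subtract the ImageBase (0x400000) first to get the RVA 0x1234. An RVA that lands past SizeOfRawData has no bytes on disk at all, which is the uninitialised-data case discussed next.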
'This is all very interesting, but I don't see the relevance!'
It is very relevant if you are going to manually dump files and
sections as we will see later. But for now the important ones to
understand are the differences in the 'size' members - VirtualSize and
SizeOfRawData.
As you have some knowledge now about the differences between RVAs and
raw offsets, VirtualSize and SizeOfRawData should be quite simple.
VirtualSize is the size of the block of memory the OS will allocate for
this particular section and the SizeOfRawData is the size that section
takes up in the diskimage.
'Why would these be different?'
For a number of reasons. We have already seen one big one in how the OS
will map an uninitialised data section. In this case the section header has
SizeOfRawData set to zero and VirtualSize set to the size actually needed
by the program.
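A rough C model of how those two sizes are used when one section is mapped. The memory block is VirtualSize bytes; only SizeOfRawData of them come from the file, and the rest start out as zeros. This simplifies the real loader considerably: it ignores the rounding to SectionAlignment/FileAlignment, and `map_section` is a hypothetical name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map one section: allocate VirtualSize bytes, copy what the file provides,
   zero-fill the remainder. Returns a malloc'd block (caller frees) or NULL
   on error. */
unsigned char *map_section(const unsigned char *file, size_t file_len,
                           uint32_t pointer_to_raw_data,
                           uint32_t size_of_raw_data,
                           uint32_t virtual_size)
{
    size_t copy = size_of_raw_data < virtual_size ? size_of_raw_data : virtual_size;
    if (virtual_size == 0)
        return NULL;
    if (copy > 0 && (size_t)pointer_to_raw_data + copy > file_len)
        return NULL;                                 /* raw data runs past end of file */
    unsigned char *mem = calloc(virtual_size, 1);    /* zero-filled block */
    if (mem != NULL && copy > 0)
        memcpy(mem, file + pointer_to_raw_data, copy);
    return mem;
}
```

An uninitialised data section is the extreme case (SizeOfRawData of zero, so nothing is copied). The same mechanism is what gives an in-situ unpacker room to work: the OS reserves VirtualSize bytes for a section however few bytes it occupies on disk.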
'But how does it apply to packing/unpacking?'
Well, imagine the following scenario:
Read it again if the answer is not immediately obvious.
Got it? Cool.
'Is it really that simple?'
Well, these changes obviously have knock-on effects for the other
sections. They cannot be changed in isolation. If you change the sizes or
addresses for a section, then all the others would need to have THEIR
addresses/offsets moved up or down to compensate. Remember that we said
some packers packed in-situ and others used blocks of memory
allocated by the unpacker? The reason they choose different methods
is normally determined by the compression algorithm used. But you should
now be able to see how an in-situ packer can work without overwriting
other sections.
| How did u KNOW to change
a C0000040 to an E0000020 (<-in your initial post?) What is the significance of those numbers?
[Miz] If you refer to the original post and the WinNT.h header, the
significance should be clear. We are basically changing the section
characteristics back to what they really represent. For example, packers
may change the system characteristics to make sections writable (because
they write the unpacked code back to the section's memory, as allocated by
the OS) and to fool SIce and disassemblers, etc.
I would have thought this would screw up some OSes, particularly NT with
its stricter policy on memory management, but it apparently
works......most packers seem to have adopted it.
Anyway, by changing it back we may even be fixing bugs in the packers
;)
[From MattP's Windows Secrets] Some of the more important flags.....
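The flag table itself did not survive here, so as a stand-in, here are the commonly used IMAGE_SCN_* values from WinNT.h with a small C decoder (the decoder and its name are ours). Fed the two values from the question, it reports 0xC0000040 as initialised data that is readable and writable, and 0xE0000020 as code that is executable, readable and writable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A selection of section characteristics flags (values from WinNT.h,
   where each is named IMAGE_SCN_<name>). */
static const struct {
    uint32_t bit;
    const char *name;
} scn_flags[] = {
    { 0x00000020u, "CNT_CODE" },
    { 0x00000040u, "CNT_INITIALIZED_DATA" },
    { 0x00000080u, "CNT_UNINITIALIZED_DATA" },
    { 0x02000000u, "MEM_DISCARDABLE" },
    { 0x10000000u, "MEM_SHARED" },
    { 0x20000000u, "MEM_EXECUTE" },
    { 0x40000000u, "MEM_READ" },
    { 0x80000000u, "MEM_WRITE" },
};

/* Write the names of the recognised flags set in ch into out, separated by
   '|'. Stops writing (but keeps counting) if out is too small. Returns the
   number of recognised flags that are set. */
int decode_characteristics(uint32_t ch, char *out, size_t out_len)
{
    int count = 0;
    size_t used = 0;

    if (out_len > 0)
        out[0] = '\0';
    for (size_t i = 0; i < sizeof scn_flags / sizeof scn_flags[0]; i++) {
        if (!(ch & scn_flags[i].bit))
            continue;
        count++;
        if (used < out_len) {
            int n = snprintf(out + used, out_len - used, "%s%s",
                             count > 1 ? "|" : "", scn_flags[i].name);
            if (n < 0 || (size_t)n >= out_len - used)
                used = out_len;          /* truncated: stop appending */
            else
                used += (size_t)n;
        }
    }
    return count;
}
```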
| 'I have applied the
things described here and in 'Further Reading' and have successfully
reversed a 'BrandX' packed file! I am indestructible! I am 'master
reverser', hear me roar!'
[Miz] Unpackers/encryptors etc have been adopted as 'protections' by many
developers too lazy to do their own. As such, when shown to have
weaknesses, they will improve. Remember, they are now commercial ventures,
and it makes no commercial sense for them to remain trivial to bypass.
Expect new versions, new ideas and techniques. More fun for us. At last,
something interesting we can use to pass those long lunchtimes and late
night sessions that isn't all over in 3 seconds ;)
| Your
Questions/Answers/Comments/Corrections/Contributions
| Further Reading
If you only ever read one book then read this (ok, and Brave New
World...). There's an online version...search...just until your
bookstore gets it back in stock of course.......
(Note: obviously the online version is not there(!), but an updated
source disk is, with a new improved PEDUMP etc.)
As +Fravia points out, searching is the first tool. Seek and ye shall
find.