Examining GCC generated ELF files

by _dose
ceci n'est pas une stack

intro

This document is meant as an introduction to messing around with executables (programs) on the linux platform. A basic knowledge of linux, C and IA32 assembler is assumed. If you're new to this field, I recommend that you read some of the following.
Most programs on linux are written in either C or C++, so I'll take C as our high level language. I'll present several simple C programs and examine the compiled results using several tools. Your system should already have the main ones on it. If this isn't the case, why are you reading this? The main ones are gcc (the Gnu C Compiler), objdump (an object file dumping utility) and gdb (the Gnu DeBugger). You'll also want a hex-editor (pick any one) and DASM, SiuL+Hacky's wonderful objdump output parser. The HCU Linux page has DASM, several tutorials and any links you may require. It's over here.
DASM is written in perl. Even if you're not familiar with perl, I recommend you read the source - just so you know what's going on. If you're new to linux/assembler/C, read closely and reread and try to form a mental picture of what's going on at a low level. If DASM doesn't suit you, try REAP (Reverse Engineers Assembly Producer) by the Grugq. It's an X variant of DASM written in perl/tk.
Also, I write assembler code in AT&T syntax as opposed to the Intel syntax that most people seem to prefer. In other words, I write code like this..
AT&T
opcode source, destination
instead of this..
Intel
opcode destination, source
Which is a pretty wierd way to write anyway. Plus, all the standard unix tools use it, so compliance is just a lot easier.

context

As you are no doubt aware, in C the basic control units of the program are functions. The main() function is the entry point and (usually) exit point of the program. From main() other functions are called (all functions can call other functions). When a function reaches its end, control is returned to the parent function. Functions are usually called with arguments that are operated on and when the function ends it returns a status to the parent function.
At the lowest level, the computer is still a linear device. The term 'multi-tasking' can be misleading because it implies that the computer is working on multiple tasks simultaneously. This is usually not the case. The operating system runs and freezes each program so that multiple programs can execute (usually not to completion) within a given time frame. Process scheduling falls outside the work we want to cover here, but I just mentioned it for perspective.
When a function is called, the caller function pushes the arguments onto the stack. The stack is an area of memory unique to each process and its sole function is to provide a context for each function to work with and temporary storage for variables manipulated inside the function. Stack theory follows later on. When a function ends, it puts its return value in the eax register where its parent function can access it. The return value in eax is just a meaningless string of bits (32, to be exact) and can be any number of things depending on the context. This is why C functions have a data type. A function declared as
	int multiply (int a, int b)
accepts two arguments (int a and int b) that are pushed onto the stack by the caller function and returns a value of type int in eax when finished.

theory

Before I get started I'd like to say that what we are looking at is how GCC generates binaries from C code. One of the most important structures is the stack, closely followed by C's idea of functions and Intel's idea of processor architecture. A big pitfall when working on the binaries we're looking at is that you may start to think to this is the way all computers work. This is not the case. To name one example, arguments could also be passed to functions via various registers. This is not a very efficient model, but one that may work in different circumstances.

The stack is a region of memory where variable arguments (data) are stored for use by routines that fall outside the direct control of a function (i.e. other functions). In our current model, the stack grows downwards. This means that the bottom of the stack starts at the high address in memory. As arguments are pushed onto it, the size decreases. One of the registers (esp) is the Stack Pointer. This register contains a pointer to the current memory address of the the stack. Another register contains the Frame Pointer (ebp - also known as the Base Pointer).
Each function has its own stack portion (or Frame). The Frame Pointer is used for efficiency purposes. When a new function is called, a new Frame is created on top of the stack and contains the Return Instruction Pointer (address of code to execute when finished) and the Stack Frame Pointer (which contains the Frame Pointer of the previous function). After this follow the local variables (which are declared inside the current function) and before the Return Instruction Pointer we have the arguments to the current function. They are before the current Frame Pointer because they are part of the stack frame of the caller function. Remember that the caller function pushes these arguments onto its Stack Frame.
So it looks something like this...


- [local variables] - [previous stack frame] - [return IP] - [arguments] -

                    ^                                      ^
                    |- current_stack_pointer               |- top_of_stack_frame

ASCII representations are not one of my stronger points, so just bear with me.
Of course, this doesn't all happen magically. When compiled, each function has some code stuck onto its beginning and end to make this work. These parts are known as the function prolog and function epilog. At machine code level they look like this...
Procedure prolog
	pushl  %ebp
	movl   %esp,%ebp
	subl   $0x4,%esp
The Return IP is pushed onto the stack by the 'call' instruction. The 'pushl %ebp' is the instruction that pushes the current Base Pointer onto the stack and after that 'movl %esp,%ebp' makes the current stack pointer the new Base Pointer. The instrunction 'subl $0x4,%esp' creates room on the stack for the local variables. The subtraction looks strange if you don't remember that the stack grows downwards. The $0x4 part is variable and depends on the amount and size of variables used inside this function.
Procedure epilog
	movl   %ebp,%esp
	popl   %ebp
	ret
Here we see the reverse of the prolog. The Base Pointer is moved into the Stack Pointer register and the current Base Pointer is pop'ed off of the stack after which a 'ret' is executed which returns to the Instruction Pointer where the parent function left off (the Return IP is stored on the stack as well..) So basically, the state is restored to the state before the call, except that the eax register now holds the return value.
These two parts are very important and you will run into them all the time, so make sure you understand what's happening. Another thing that I find very interesting about them is that you have no control over them when coding in C. This is all purely compiler generated code, the glue that makes the functions work.

practice

Now for some hands-on work. We'll start by looking at a simple C program.
/* no1.c */
int main() {
        return(0);
}
Run the following
~$ gcc -o no1 no1.c
~$ ./no1
~$ echo $?
0
~$
With the first command we compile the code, with the second we run it and with the third we echo the contents of the special variable $?, which contains the exit code of the last command. In this case, that's 0, which should make sense to you, as this program does nothing but exit with status 0 (return(0);). We could also use this command
~$ gcc -S -o no1.S no1.c
which tells GCC to produce assembler code in the file no1.S. By all means, try it and have a look at the file. Most of the time we won't be looking at our own code and will have to disassemble the program we want to examine. So that's our next step. One way is to use objdump directly (read the man page for more info on objdump), but I prefer using DASM. So,
~$ dasm ./no1 ./no1.dasm
This creates the file no1.dasm. I use less or vim for browsing it, but any viewer/editor will do. The built-in one of Midnight Commander is also a good choice if you've worked with Norton Commander under DOS.
What we're looking for is the main() function. So, a search for main gives us
08048460 <main> pushl  %ebp
08048461 <main+0x1> movl   %esp,%ebp
08048463 <main+0x3> xorl   %eax,%eax
08048465 <main+0x5> jmp    08048470 <main+0x10>
08048467 <main+0x7> movl   %esi,%esi
08048469 <main+0x9> leal   0x0(%edi,1),%edi
08048470 <main+0x10> movl   %ebp,%esp
08048472 <main+0x12> popl   %ebp
08048473 <main+0x13> ret
Here we see the function prolog, followed by eax being set to 0. xorl %eax, %eax does a logical XOR on eax with itself. If you look at your logic tables you'll see that XORing a value with itself sets all bits to zero. The compiler could have written 'movl $0, %eax', instead of 'xorl %eax, %eax', but this requires more CPU cycles. Using xor to set a location to zero is used very often and is a good point to remember.
After that the flow jmp's to main+0x10 which is the function epilog. The instructions at 0x7 and 0x9 are never reached and we'll ignore them as compiler optimization.

In our next example, we'll complicate matters slightly and create a function that returns double the value that it is called with and return that as our return value from main().

/* no2.c */

int double_ret(int value) {
        return(2*value);
}

int main() {
        int tmp;
        tmp = double_ret(5);
        return(tmp);
}
Yes. I know there are much tighter ways of writing this. Now compile and run it and look at the exit code. That it's 10 should be obvious. Now disassemble it and look at the main() function.
08048480 <main> pushl  %ebp
08048481 <main+0x1> movl   %esp,%ebp
08048483 <main+0x3> subl   $0x4,%esp
08048486 <main+0x6> pushl  $0x5
08048488 <main+0x8> call   08048460 <double_ret>
0804848d <main+0xd> addl   $0x4,%esp
08048490 <main+0x10> movl   %eax,%eax
08048492 <main+0x12> movl   %eax,0xfffffffc(%ebp)
08048495 <main+0x15> movl   0xfffffffc(%ebp),%eax
08048498 <main+0x18> jmp    080484a0 <main+0x20>
0804849a <main+0x1a> leal   0x0(%esi),%esi
080484a0 <main+0x20> movl   %ebp,%esp
080484a2 <main+0x22> popl   %ebp
080484a3 <main+0x23> ret
As you can see, this is a bit less straightforward. At 0x3 we see a subtraction from %esp. %esp contains the Stack Pointer and the stack grows downward. Here 4 bytes are added to the stack (or subtracted, depending on your viewpoint). This is our 'int tmp;'. The int is 32 bits and 4 bytes equals 32 bits. Next, a value of 5 is pushed onto the stack and then the function double_ret() is called. We'll look at double_ret() later. But we know that when it returns, the return value is held in eax. The first instruction after the call is an addition. This is an adjustment to the stack (its size is decreased) to compensate for the increase in stack size when we pushed 0x5 onto the stack as the parameter to double_ret(). Notice the similar size of int tmp; and the int argument passed to double_ret().
Next we run into more senseless compiler optimization at 0x10, 0x12 and 0x15. What is interesting is the construction of 0xfffffffc(%ebp). This is the address of our first local variable. The stack (still) grows downwards, so we find our first entries at the top. In this case the top is 0xffffffff and the first entry begins at 0xfffffffc and is four bytes long. (0xfffffffc to 0xffffffff).
After this a jmp past code that's never executed and we're at the function epilog again.

Now for a look at the double_ret() function.

08048460 <double_ret> pushl  %ebp
08048461 <double_ret+0x1> movl   %esp,%ebp
08048463 <double_ret+0x3> movl   0x8(%ebp),%edx
08048466 <double_ret+0x6> movl   %edx,%eax
08048468 <double_ret+0x8> addl   %eax,%edx
0804846a <double_ret+0xa> movl   %edx,%eax
0804846c <double_ret+0xc> jmp    08048470 <double_ret+0x10>
0804846e <double_ret+0xe> movl   %esi,%esi
08048470 <double_ret+0x10> movl   %ebp,%esp
08048472 <double_ret+0x12> popl   %ebp
08048473 <double_ret+0x13> ret
After the function prolog we see 'movl 0x8(%ebp),%edx'. The 32 bits at address 0x8(%ebp) are moved into %edx. At this address we find a value of 0x5. The value we pushed onto the stack in main() right before calling double_ret(). Then %edx is mov'd to %eax and then the value of %eax is added to %edx. This doubles the value in %edx, which now contains 0x10 and then %edx is mov'd to %eax, which is where we want it when we return from this function. If you were doing this by hand, you'd probably use less lines of code, but wether or not that is efficient is another matter. Another senseless jmp later we're at the function epilog and %eax contains the value of 0x10.
Now for our third example. We'll write a program that returns 1 or 0 based on a test function that tests a hard-coded variable.
/* no3.c */

int testint(int value) {
        if (value != 1)
                return(1);
        else
                return(0);
}

int main() {
        int tmp, value;
        value = 0;
        tmp = testint(value);
        return(tmp);
}
Compile and run it and you'll see that it returns 1. Let's say we want to make it return 0 instead, but don't have the source handy. (I can't think of any situation that someone would want to do this, but there are some wierd people out there). What now? Well, we disassemble the program. Time for another dump. The main() function should be self-explanatory, so we'll look at the testint() function.
08048460 <testint> pushl  %ebp
08048461 <testint+0x1> movl   %esp,%ebp
08048463 <testint+0x3> cmpl   $0x1,0x8(%ebp)
08048467 <testint+0x7> je     08048480 <testint+0x20>
08048469 <testint+0x9> movl   $0x1,%eax
0804846e <testint+0xe> jmp    08048490 <testint+0x30>
08048470 <testint+0x10> jmp    08048490 <testint+0x30>
08048472 <testint+0x12> leal   0x0(%esi,1),%esi
08048479 <testint+0x19> leal   0x0(%edi,1),%edi
08048480 <testint+0x20> xorl   %eax,%eax
08048482 <testint+0x22> jmp    08048490 <testint+0x30>
08048484 <testint+0x24> leal   0x0(%esi),%esi
0804848a <testint+0x2a> leal   0x0(%edi),%edi
08048490 <testint+0x30> movl   %ebp,%esp
08048492 <testint+0x32> popl   %ebp
08048493 <testint+0x33> ret
At 0x3 we see the test. $0x1 is compared to 0x8(%ebp), which contains 0x0. The test fails and 0x1 is mov'd into eax after which we jmp to the function epilog. We can see at 0x7 that if the test were to succeed that we'd jmp to 0x20, where we see %eax being xor'd with itself - setting it to 0x0 after which we jmp to 0x30, the function epilog. This looks like a desirable situation, so we'll have to work on that. One way to do this would be to invert the je at 0x7 by turning it into a jne. Another way would be to change 0x9 - 'movl $0x1, %eax' into 'movl $0x0, %eax'. How do we do that? First we have to find the hexcodes that correspond to this. I find that the easiest way to do this is simply to use inline asm in C. Like this.
/* asm1.c */
int main() {
        __asm__("movl $0x0, %eax");
        return(0);
}
Compile and run it if you feel the need. Like all our programs here, it doesn't seem to do much. Now you may be thinking 'why use mov after saying that xor works better when setting registers to 0x0?'. Well, the hexcode we're looking for has to fit in 5 bytes. The xor instruction uses 2. We could just nop the other three, but I'm writing this and I'm using mov.
Next we disassemble it. Only now we use objdump directly, like this.
~$ objdump -d --show-raw-insn ./asm1 > asm1.dump
Search through asm1.dump for the 'movl $0x0, %eax' part. It isn't that hard as it's in main().
08048460 <main>:
8048460:       55              pushl  %ebp
8048461:       89 e5           movl   %esp,%ebp
8048463:       b8 00 00 00 00  movl   $0x0,%eax
8048468:       31 c0           xorl   %eax,%eax
Next we want the corresponding part in no3. Dump no3 with objdump just like you just did to asm1 and look in the no3.dump file. In our case this is address 0x08048469.
08048460 <testint>
8048460:       55              pushl  %ebp
8048461:       89 e5           movl   %esp,%ebp
8048463:       83 7d 08 01     cmpl   $0x1,0x8(%ebp)
8048467:       74 17           je     8048480 <testint+0x20>
8048469:       b8 01 00 00 00  movl   $0x1,%eax
804846e:       eb 20           jmp    8048490 <testint+0x30>
So all we have to change is the value 0x01 at 0x0804846a in no3 to 0x00. Grab your favorite hex editor and open no3. Search for the hex pattern surrounding and including the part we want to change (there might be more movl $0x1,%eax instructions in there..) Once found, change it and run it.
~$ ./no3;echo $?
0
~$
Success.
Enough for now, I'm going to sleep.

finale

Ok, so I lied about needing GDB for any of this. I wrote this in one go and so it's probably full of mistakes, inaccuracies, omissions and bad grammar. I'll go over it at a later date.
Feedback appreciated, use the forum linked from the HCU linux page for this. A big 'Hi there!' to the following people for work they've done and documents they've written, in no particular order.
SiuL+Hacky, Mammon_, the Grugq, +ReZiDeNt, bastardAI and aleph1.

    _dose
    02/2000