poking around with GDB

by _dose
arf arf arf

intro

In this document I'm building on what I discussed in the first document. That document, like this one, is aimed at people starting off with GCC generated ELF binaries under linux.
I can't be bothered to write a good introduction for this. Nothing advanced, blablabla.. Tools used and knowledge required were all handled in the first document, so if you haven't read it - you're more on your own than before. Have fun.

'ere we go

Here's the program we'll work with.
/* first.c */
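#include <stdio.h>
#include <unistd.h>	/* sleep() */

int function2();	/* function1() calls this before it is defined */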

int function1() {
	
	printf("\tIn function 1.\n");
	sleep(2);
	printf("\tCalling function 2.\n");
	function2();
	printf("\tReturned from funtion 2.\n");
	sleep(2);
	printf("\tReturning from function 1.\n");
	return(0);
}

int function2() {

	printf("\t\tIn function 2.\n");
	sleep(2);
	printf("\t\tReturning from function 2.\n");
	sleep(2);
	return(0);
}

int main() {

	printf("In main()..\n");
	printf("Calling function 1.\n");
	function1();
	printf("Returned from function 1.\n");
	sleep(2);
	return(0);
}
Why this silly and pointless program? Well, for illustration. Compile it the usual way and dasm it. We'll also start using gdb, and usually you'll want to compile your code with '-g' so the compiler adds debugging symbols to your object. However, we like to make life difficult for ourselves, so skip that.
Let's look at the first printf() function, here's the dump..
Possible reference to string:
"In main().."

080484d5 <main+0x9> push   $0x80485ff

Reference to function :
     printf@@GLIBC_2.0

080484da <main+0xe> call   08048340 <_init+0x84>
080484df <main+0x13> add    $0x10,%esp
Now obviously, the number being push'd at main+0x9 is not the string "In main().." itself. It's the address where the string is stored. And the call at main+0xe goes to 0x8048340, which is how our program reaches the printf function in libc. Look at the dynamic symbol table in the dasm file...
DYNAMIC SYMBOL TABLE:
08048300  w   DF *UND*  0000007d  GLIBC_2.0   __register_frame_info
08048310  w   DF *UND*  000000a9  GLIBC_2.0   __deregister_frame_info
08048320      DF *UND*  0000016e  GLIBC_2.0   sleep
08048330      DF *UND*  00000118  GLIBC_2.0   __libc_start_main
08048340      DF *UND*  0000002f  GLIBC_2.0   printf
0804856c g    DO .rodata        00000004  Base        _IO_stdin_used
00000000  w   D  *UND*  00000000              __gmon_start__
This shows us the 'off-shore' routines called from within our program. We could instead link the program statically (gcc's '-static' option); then the code for the sleep() and printf() routines would be copied into the program itself and they would no longer appear in the dynamic symbol table.
Now we're going to fire up gdb. I alias gdb as 'gdb -silent' so I don't get the whole copyright message every time. We're going to look at the string whose address is being pushed onto the stack, so we take the address from dasm (see the partial main() dump before) and break on it. You can, of course, do all this in a graphical debugger (usually a frontend to gdb), but that wouldn't display as well in a pure text file..
$ gdb ./first
(no debugging symbols found)...
(gdb) br *0x080484d5
Breakpoint 1 at 0x80484d5
(gdb) run
Starting program: /home/dose/work/start/first
(no debugging symbols found)...
(gdb) disassemble 0x80484d5 0x80484df
Dump of assembler code from 0x80484d5 to 0x80484df:
0x80484d5 <main+9>:     pushl  $0x80485ff
0x80484da <main+14>:    call   0x8048340 <printf>
End of assembler dump.
(gdb) x/s 0x80485ff
0x80485ff <_IO_stdin_used+147>:  "In main()..\n"
Here we've set a breakpoint at address 0x80484d5. When run, the program breaks when the instruction flow reaches this address and we get a gdb prompt again. Disassembling the next two instructions shows us the same as in the dasm dump. The command x/s address literally means, 'examine the string at address'. We can use a repeat count to show other strings.
(gdb) x/5s 0x80485ff
0x80485ff <_IO_stdin_used+147>:  "In main()..\n"
0x804860c <_IO_stdin_used+160>:  "Calling function 1.\n"
0x8048621 <_IO_stdin_used+181>:  "Returned from function 1.\n"
0x804863c:       ""
0x804863d:       ""
(gdb)
The strings are all NUL terminated (each one ends in a zero byte), which we can see like this..
(gdb) x/14xb 0x80485ff
0x80485ff <_IO_stdin_used+147>: 0x49    0x6e    0x20    0x6d    0x61    0x69    0x6e    0x28
0x8048607 <_IO_stdin_used+155>: 0x29    0x2e    0x2e    0x0a    0x00    0x43
(gdb)
x/14xb 0x80485ff means 'examine 14 bytes in hex from address 0x80485ff'. The string "In main()..\n" is 12 characters long ('\n' is one character), so at position 13 we expect a 0x00 (NUL) and there it is. At position 14 we see the first character of the second string, whose address is 0x804860c. (Yes, 0x43 is ASCII for 'C'.) Try 'help x' for more ways to use it.
Next, we'll have a look at the stack by setting a few more breakpoints. We can also set breakpoints by simply using the function name instead of using the address. If the program was compiled with debugging symbols, you can also break on source line number. We're still in the same gdb session, by the way..
(gdb) br function1
Breakpoint 2 at 0x8048416
(gdb) br function2
Breakpoint 3 at 0x804848a
(gdb) stop
(gdb) info breakpoints
Num Type           Disp Enb Address    What
1   breakpoint     keep y   0x080484d5  <main+9>
        breakpoint already hit 1 time
2   breakpoint     keep y   0x08048416  <function1+6>
3   breakpoint     keep y   0x0804848a  <function2+6>
(gdb) run
Starting program: /home/dose/work/start/first

Breakpoint 1, 0x80484d5 in main ()
(gdb) bt
#0  0x80484d5 in main ()
#1  0x40031a12 in   ()
'br' is short for 'break', the command that sets a breakpoint. As you can see, a breakpoint on a function is set straight after the function prolog, which here is 3 instructions or 6 bytes long. (See 'function1+6' and 'function2+6' in the breakpoint list.)
'bt' is short for 'backtrace'. It shows the current stack frame as '0' and previous stack frame as '1', etc. The function name, where known, is also shown and the address shown is the Instruction Pointer of that Stack Frame (i.e. the instruction right after the 'call' for most previous stack frames). In this case it's the address of the breakpoint we've set. We can verify this by x'ing the address displayed. Or x'ing $eip.
(gdb) x/2i 0x80484d5
0x80484d5 <main+9>:     pushl  $0x80485ff
0x80484da <main+14>:    call   0x8048340 <printf>
(gdb) c
Continuing.
In main()..
Calling function 1.

Breakpoint 2, 0x8048416 in function1 ()
(gdb) bt
#0  0x8048416 in function1 ()
#1  0x80484f7 in main ()
#2  0x40031a12 in   ()
(gdb) x/2i 0x80484f7
0x80484f7 <main+43>:    addl   $0xfffffff4,%esp
0x80484fa <main+46>:    pushl  $0x8048621
(gdb) x $eip
0x8048416 <function1+6>:        addl   $0xfffffff4,%esp
If you remember some of the stack building stuff from the previous document, you'll know that the %esp register holds the Stack Pointer, which points to the current top of the stack. The %ebp register holds the Base Pointer or Frame Pointer. It's also been called the Frame Base Pointer. I use all of these terms.. The frame pointer holds the address of the beginning of the current stack frame. It is primarily used to reference local variables (for which room is made on the stack first) and arguments to the function (which are push'd onto the stack before the call). The compiler could of course reference each of these relative to the current stack pointer, but %esp moves with every push and pop, so the offset of each variable would keep changing, while offsets from %ebp stay fixed for the whole function. Is this for real, you may ask. Well, see for yourself..
(gdb) bt
#0  0x8048416 in function1 ()
#1  0x80484f7 in main ()
#2  0x40031a12 in   ()
(gdb) x/a $ebp+4
0xbffffd00:     0x80484f7 <main+43>
As you can see, we're in function1(), and the word at %ebp+4, i.e. 4 bytes above the address held in our Frame Base Pointer (%ebp), contains our return address: the place in main() where execution will continue once we return from function1(). Of course, we're not that far yet. First function1() is going to call function2(), which will break after the function prolog.
(gdb) c
Continuing.
        In function 1.
        Calling function 2.

Breakpoint 3, 0x804848a in function2 ()
(gdb) bt
#0  0x804848a in function2 ()
#1  0x8048448 in function1 ()
#2  0x80484f7 in main ()
#3  0x40031a12 in   ()
(gdb) x/a $ebp+4
0xbffffcf0:     0x8048448 <function1+56>
This shouldn't surprise you. It's the same result as at breakpoint 2, only now we're in yet another stack frame - the one belonging to function2(). If a function is recursive, a separate stack frame is created for each instance of that function, by the way. After this, the program will exit without breaking any more. Even though we set 2 breakpoints on functions, these breakpoints are set by gdb on exact memory addresses (right after the prolog...) and these addresses won't be reached again.
(gdb) c
Continuing.
                In function 2.
                Returning from function 2.
        Returned from funtion 2.
        Returning from function 1.
Returned from function 1.

Program exited normally.
(gdb)

finale

Well, there you have it. You did need gdb for this one. Nothing very special was discussed - but I warned you about that in the first paragraph. Hope you enjoyed it and a big 'Hi there!' to the same people as in part I.

    _dose
    02/2000