Deobfuscation

by Kousu  (kousue@gmail.com)

Boilerplate:  I don't officially condone any of these activities, of course.  Use your own judgment.

Introduction

Compiled languages let you distribute binaries which, although all the machine code is there, are generally extremely time-consuming to reverse-engineer.

Scripting languages do not have such a luxury.  They operate at a high level, and running code at that level means shipping high-level constructs.  Compare compiled languages, where the output is very low-level and the security rests on two facts:

  1. Information - names, indentation, etc. - is lost in the compilation.
  2. Not many people have the skills to do the reverse operation.

In the scripting language world, there are a great many idiots and/or liars who scam even bigger idiots by promising that no one will be able to "steal" their source code.

It should send up a warning flag if you ever consider using someone else's code, especially if it's obfuscated.

In principle, this is as bad as binary blobs, which have, for example, made every system using Wi-Fi rootkittable.  In the fine tradition of paranoia of this great zine, consider that no one knows what the script is up to.  Is it full of bugs?  Is it phoning home and giving confidential information like credit card numbers to the original author?

Luckily, obfuscation in a scripting language is very hard to actually make secure.

No generic program can apply a completely irreversible encryption to such code, for the same reason DRM is fundamentally flawed: the code has to be decrypted somewhere in order to run.

You'd need some sort of self-generating code to do it, and even then it would have to rely on the very thing that makes interpreted languages so flexible: the eval() function/statement.  With some effort, eval() can be intercepted, so eventually you find the original code.

Other tricks, such as relying on external libraries, are unlikely: they add complexity for the user (the one who wants to obfuscate their code) and raise security concerns, especially in web development.

SourceCop

We're going to use as our case study SourceCop, available from www.sourcecop.com for only $30 (regular price $45!) with the nice guarantee that SourceCop'd code runs on all of UNIX/Linux/BSD/Mac/Windows (which is nothing more than the list of platforms for PHP...).

So, first of all we install PHP if it isn't already installed (from php.net, or from your package manager if you're on a *NIX), and then we get to work.

Looking at a SourceCop'd script we see:

dhcart.php          # actual obfuscated script
scopbin/911006.php  # support code

From what we know of CGI scripts in general (PHP scripts behave the same way), a request for http://example.org/path/to/script/dhcart.php will cause PHP to load and run dhcart.php.

PHP, being a scripting language, just runs from the top, so we can start tracing the code immediately and looking for ways to get at the actual code:

The contents of dhcart.php:

<?php
if (!function_exists("findsysfolder")) {
    function findsysfolder($fld)
    {
        $fld1 = dirname($fld);
        $fld = $fld1 . "/scopbin";
        clearstatcache();
        if (!is_dir($fld)) {
            return findsysfolder($fld1);
        } else {
            return $fld;
        }
    }
}
require_once findsysfolder(__FILE__) . "/911006.php";

$REXISTHEDOG4FBI = "FE50E574D754E76AC679F242F450F768FB5DCB77F34DE341";
/*[...-snip a lot of Hex...]*/
$REXISTHEDOG4FBI = "94CD76CD371C5A7BC70C186E779C293B9B49BACA5A781A6";
eval(y0666f0acdeed38d4cd9084ade1739498("311B3C4449F31071C0", $REXISTHEDOG4FBI));
?>

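So we see that it defines a function findsysfolder() if it doesn't already exist.  It walks up the directory tree from the script's own location until it finds a directory named scopbin, and the script then loads 911006.php from there.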
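One thing worth noticing: as written, findsysfolder() never stops if no scopbin directory exists, because dirname("/") is "/" again, so it recurses until PHP runs out of stack.  Here is a minimal sketch of the same lookup with a termination check.  The name find_sys_folder_bounded() is hypothetical, not part of SourceCop.

```php
<?php
// Same lookup as findsysfolder(): start in the directory containing $file and
// walk upward looking for a "scopbin" subdirectory.
// Unlike the original, stop and return null once we reach the root.
function find_sys_folder_bounded(string $file): ?string
{
    $dir = dirname($file);
    while (true) {
        $candidate = $dir . "/scopbin";
        clearstatcache();
        if (is_dir($candidate)) {
            return $candidate;
        }
        $parent = dirname($dir);
        if ($parent === $dir) {
            return null; // dirname() can go no higher: no scopbin anywhere
        }
        $dir = $parent;
    }
}

var_dump(find_sys_folder_bounded(__FILE__));
```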

At the end, it calls a function that itself has an obfuscated name ("y0666f0acdeed38d4cd9084ade1739498") with two arguments: a short string of hex (probably more obfuscation?) and the variable $REXISTHEDOG4FBI, which holds a big block of hex that is almost certainly the obfuscated code.  The result is passed straight into eval().  Incidentally, SourceCop always uses the same stupid variable name.

This eval() is our attack vector, the weakness I spoke of earlier.

In fact, SourceCop appears to be overly simplistic (and it probably is).  There is only one eval() call in the entire block, so whatever that eval() does is the entire function of this script, and what is passed into it must, by definition of eval(), be the plaintext code.

So simply replacing eval() with a print() will give us the code!
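To see why this works, here is a toy example.  The payload is hidden with str_rot13(), which is only a stand-in for SourceCop's real cipher, and reveal() is a hypothetical decoder.  Whatever string eval() would execute, print() shows you instead.

```php
<?php
// Hypothetical decoder: str_rot13() stands in for the real cipher.
function reveal(string $payload): string
{
    return str_rot13($payload);
}

$hidden = 'rpub "uryyb";';

// What the "protected" script does: decode, then execute.
eval(reveal($hidden));   // runs the hidden code, which prints: hello
echo PHP_EOL;

// The attack: same argument, but print() instead of eval().
print(reveal($hidden));  // prints the hidden code itself: echo "hello";
echo PHP_EOL;
```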

Sure, the code could be obfuscated several times over, so that this just gives us another obfuscated block of source code, but then you simply repeat the process until you reach the final plaintext.
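The "repeat until plaintext" loop can be automated.  This is a sketch under a made-up layer format, eval(str_rot13('...'));, so the pattern and the helpers unwrap_layer(), wrap_layer() and peel_all() are all hypothetical and would need adapting to a real obfuscator's output.

```php
<?php
// Hypothetical layer format for this sketch: eval(str_rot13('...'));
const LAYER_PATTERN = '/^eval\(str_rot13\(\'(.*)\'\)\);$/s';

// Return the code inside one layer, or null if $code is not a layer.
function unwrap_layer(string $code): ?string
{
    if (preg_match(LAYER_PATTERN, trim($code), $m) === 1) {
        return str_rot13($m[1]); // greedy (.*) keeps nested layers intact
    }
    return null;
}

// Wrap code in one layer; used here only to build a test input.
function wrap_layer(string $code): string
{
    return "eval(str_rot13('" . str_rot13($code) . "'));";
}

// "Replace eval() with print()" over and over until no layer is left.
function peel_all(string $code, int $maxLayers = 100): string
{
    for ($n = 0; $n < $maxLayers; $n++) {
        $inner = unwrap_layer($code);
        if ($inner === null) {
            return $code; // not a layer any more: assume plaintext
        }
        $code = $inner;
    }
    throw new RuntimeException("Gave up after $maxLayers layers");
}

$layered = wrap_layer(wrap_layer(wrap_layer('echo "hi";')));
echo peel_all($layered), PHP_EOL; // prints: echo "hi";
```

The layer cap is a design choice: a malicious sample could otherwise keep us looping forever.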

And that is why obfuscation is useless and why anyone who has the gall to sell a shitty "product" that does it deserves to lose his balls.

Back to the code.

So we replace this eval() with print() and then hop to the command line:

$ cd ~/dhcart/
$ php dhcart.php
$

What?  Very strangely we got no output!

Perhaps it's time to check out what's in that mysterious scopbin/911006.php file (incidentally this same file is used for every SourceCopping):

<?php
 
ini_set("include_path", dirname(__FILE__));

[ ... ]

function g0666f0acdeed38d4cd9084ade1739498($s)
{
    return strstr($s, "echo") == false
        ? (strstr($s, "print") == false
            ? (strstr($s, "sprint") == false
                ? (strstr($s, "sprintf") == false
                    ? false
                    : exit())
                : exit())
            : exit())
        : exit();
}

[ ... ] 

function yrdhhdacdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
ini_set("include_path", "."); 

?>

It seems to be more of the same, except that, helpfully, PHP requires variable names to start with a $ sign, so we can spot that most of this is not obfuscated logic but just awkwardly named variables.

So underneath, this is just an ordinary program.

Also, PHP delimits blocks with { ... }, so we can work out what the indentation should look like.

When I first did this, I put newlines in all the right places, used the magic of find-and-replace to shorten all the names, and traced through it trying to understand it.  But the quick fix is simpler than that, so I will cut to the chase.

Near the middle we see strstr($s, "print"), among others, in a chain of nested ternary (?:) operators where every final else clause is exit().

It's a good bet that this function looks inside our source file for any use of echo(), print(), sprint() or sprintf() (i.e., any attempt to do exactly what we're doing) and, if it finds one, simply kills the program.  (PHP doesn't even have a function called sprint().)

Simply removing this check should make it work, so long as there are no other blocks.  There are multiple ways of removing it: the quick-and-dirtiest by far is to just rename what it's searching for.

Most reliably, replace all the exit() calls with some benign return value, such as false.  Or, even better, blank the function body entirely and replace it with just: return false;
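Here is a sketch of both halves.  first_forbidden_word() is a hypothetical helper that runs the same strstr() tests in the same order as the original check but reports the match instead of calling exit(); the second function is the neutered drop-in replacement.  Note that since "print" is tested first and is a substring of both "sprint" and "sprintf", the last two tests can never fire.

```php
<?php
// Same needles, same order as g0666f0acdeed38d4cd9084ade1739498(),
// but returns the word that would trigger exit() instead of exiting.
function first_forbidden_word(string $s): ?string
{
    foreach (["echo", "print", "sprint", "sprintf"] as $needle) {
        if (strstr($s, $needle) !== false) {
            return $needle;
        }
    }
    return null;
}

// The neutered replacement: body blanked, just return false.
function g0666f0acdeed38d4cd9084ade1739498($s)
{
    return false;
}

var_dump(first_forbidden_word('print(decrypt($k, $c));')); // string(5) "print"
var_dump(first_forbidden_word('sprintf("%s", $x);'));       // string(5) "print"
var_dump(g0666f0acdeed38d4cd9084ade1739498("echo 1;"));     // bool(false)
```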

$ cd ~/dhcart/
$ php dhcart.php
<?php

include "phpmailer/class.phpmailer.php";
include "whois_servers.php";
include "language.php";
if (!empty($HTTP_GET_VARS)) while(list($name, $value) = each($HTTP_GET_VARS)) $$name = $value;
if (!isset($HTTP_SESSION_VARS['numberofitems']))
        $HTTP_SESSION_VARS['numberofitems']=0;
if (!isset($HTTP_SESSION_VARS['numberremoved']))
        $HTTP_SESSION_VARS['numberremoved']=0;

$numdomreg=count($register);
^C
$

Hooray, it works!  We stop it before it finishes.  Now to save the results to a file:

$ php dhcart.php > dhcart.decrypted.php

Discussion

SourceCop is a particularly weak obfuscation.

All it does is use a cypher function to hide the code and then make it difficult for a human to follow the decryption code by using long meaningless variable names.  But the basic technique is the same for any of these systems.

These systems are just downright stupid.  Friends Don't Let Friends Use Obfuscators.

The method presented here - letting unknown code run on your system - is potentially dangerous.

It's not implausible that an obfuscator could try to detect that it's being run in an unintended way and cause damage of unknown magnitude.  Sure, if that booby trap ever went off by mistake it would be very bad for the obfuscator's business, but given the shortsightedness blatantly displayed here, it's a real possibility.

It would be wise to set up a jail system to test these things out on.  If running a *NIX you can make a chroot jail to do this.

Another method is to trace the code manually, try to figure out what it's up to, and then write a program implementing the decryption scheme.

Let's see that now.  But first, a preface.

In digging through SourceCop, I feel like vomiting.  It's disgusting, disgusting code and just wasting CPU cycles letting it run is nauseating.

Reverse Engineering

But anyway, here is the scopbin/911006.php file indented properly:

<?php

ini_set("include_path", dirname(__FILE__));
function A4540acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function b5434f0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function c43dsd0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function Xdsf0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function y0666f0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    $x0b43c25ccf2340e23492d4d3141479dc = "";
    $x71510c08e23d2083eda280afa650b045 = 0;
    $x16754c94f2e48aae0d6f34280507be58 = strlen(
        $x897356954c2cd3d41b221e3f24f99bba
    );
    $x7a86c157ee9713c34fbd7a1ee40f0c5a = hexdec(
        "&H" . substr($x276e79316561733d64abdf00f8e8ae48, 0, 2)
    );
    for (
        $x1b90e1035d4d268e0d8b1377f3dc85a2 = 2;
        $x1b90e1035d4d268e0d8b1377f3dc85a2 <
        strlen($x276e79316561733d64abdf00f8e8ae48);
        $x1b90e1035d4d268e0d8b1377f3dc85a2 += 2
    ) {
        $xe594cc261a3b25a9c99ec79da9c91ba5 = hexdec(
            trim(
                substr(
                    $x276e79316561733d64abdf00f8e8ae48,
                    $x1b90e1035d4d268e0d8b1377f3dc85a2,
                    2
                )
            )
        );
        $x71510c08e23d2083eda280afa650b045 =
            $x71510c08e23d2083eda280afa650b045 <
            $x16754c94f2e48aae0d6f34280507be58
                ? $x71510c08e23d2083eda280afa650b045 + 1
                : 1;
        $xab6389e47b1edcf1a5267d9cfb513ce5 =
            $xe594cc261a3b25a9c99ec79da9c91ba5 ^
            ord(
                substr(
                    $x897356954c2cd3d41b221e3f24f99bba,
                    $x71510c08e23d2083eda280afa650b045 - 1,
                    1
                )
            );
        if (
            $xab6389e47b1edcf1a5267d9cfb513ce5 <=
            $x7a86c157ee9713c34fbd7a1ee40f0c5a
        ) {
            $xab6389e47b1edcf1a5267d9cfb513ce5 =
                255 +
                $xab6389e47b1edcf1a5267d9cfb513ce5 -
                $x7a86c157ee9713c34fbd7a1ee40f0c5a;
        } else {
            $xab6389e47b1edcf1a5267d9cfb513ce5 =
                $xab6389e47b1edcf1a5267d9cfb513ce5 -
                $x7a86c157ee9713c34fbd7a1ee40f0c5a;
        }
        $x0b43c25ccf2340e23492d4d3141479dc =
            $x0b43c25ccf2340e23492d4d3141479dc .
            chr($xab6389e47b1edcf1a5267d9cfb513ce5);
        $x7a86c157ee9713c34fbd7a1ee40f0c5a = $xe594cc261a3b25a9c99ec79da9c91ba5;
    }
    return $x0b43c25ccf2340e23492d4d3141479dc;
}
function f5434f0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function j43dsd0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function hdsf0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function tr5434f0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function f0666f0acdeed38d4cd9084ade1739498($x)
{
    return implode("", file($x));
}
function g0666f0acdeed38d4cd9084ade1739498($s)
{
    return strstr($s, "echo") == false
        ? (strstr($s, "print") == false
            ? (strstr($s, "sprint") == false
                ? (strstr($s, "sprintf") == false
                    ? false
                    : exit())
                : exit())
            : exit())
        : exit();
}
function hyr3dsd0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function uygf0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function drfg34f0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function jhkgvdsd0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
function yrdhhdacdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    if (file_exists($x456e79316561733d64abdf00f8e8ae48)) {
        unlink($x456e79316561733d64abdf00f8e8ae48);
    }
    return $Xew6e79316561733d64abdf00f8e8ae48;
}
ini_set("include_path", ".");

?>

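First, notice the many isomorphic (structurally identical) functions, which are probably there to throw us off; a stupid trick, since they are so easy to remove.  Their bodies also refer to variables such as $x456e79316561733d64abdf00f8e8ae48 and $Xew6e79316561733d64abdf00f8e8ae48 that are never assigned anywhere, which makes us suspicious that they are dead code.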
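Spotting the copies by eye works here, but a small sketch can do it mechanically.  It assumes you have already pulled each function's body out into an array of name => body source (that extraction step is not shown); identical_groups() is a hypothetical helper.

```php
<?php
// Group function names whose bodies are identical once runs of whitespace
// are collapsed. $bodies maps function name => body source text.
function identical_groups(array $bodies): array
{
    $groups = [];
    foreach ($bodies as $name => $body) {
        $normalised = preg_replace('/\s+/', ' ', trim($body));
        $groups[$normalised][] = $name;
    }
    // Only groups with more than one member are suspicious copies.
    return array_values(array_filter($groups, fn(array $g) => count($g) > 1));
}

print_r(identical_groups([
    "a" => "return 1;",
    "b" => "return   1;\n",
    "c" => "return 2;",
]));
```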

Let's check dhcart.php for function calls (roughly approximated by searching for occurrences of "(").

It turns out that only three non-built-in functions are actually called:

f0666f0acdeed38d4cd9084ade1739498()
g0666f0acdeed38d4cd9084ade1739498()
y0666f0acdeed38d4cd9084ade1739498()
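Searching for "(" also hits if, while, string contents and so on.  A more precise sketch uses PHP's own tokenizer (token_get_all(), from the tokenizer extension, which is normally enabled).  The helpers next_significant(), called_functions() and user_functions() are hypothetical names; method calls are counted too, while definitions ("function foo(") and "new Foo(" are skipped.

```php
<?php
// Index of the next (step 1) or previous (step -1) token that is not
// whitespace or a comment, or null if there is none.
function next_significant(array $tokens, int $i, int $step): ?int
{
    $skip = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    for ($i += $step; $i >= 0 && $i < count($tokens); $i += $step) {
        if (!is_array($tokens[$i]) || !in_array($tokens[$i][0], $skip, true)) {
            return $i;
        }
    }
    return null;
}

// Names of functions called in a PHP source string.
function called_functions(string $source): array
{
    $tokens = token_get_all($source);
    $calls = [];
    foreach ($tokens as $i => $tok) {
        if (!is_array($tok) || $tok[0] !== T_STRING) {
            continue;
        }
        $next = next_significant($tokens, $i, 1);
        $prev = next_significant($tokens, $i, -1);
        $isCall = $next !== null && $tokens[$next] === "(";
        $isDefinition = $prev !== null && is_array($tokens[$prev])
            && in_array($tokens[$prev][0], [T_FUNCTION, T_NEW], true);
        if ($isCall && !$isDefinition) {
            $calls[] = $tok[1];
        }
    }
    return array_values(array_unique($calls));
}

// Drop anything that is built into PHP.
function user_functions(array $names): array
{
    $internal = get_defined_functions()["internal"];
    return array_values(array_filter(
        $names,
        fn(string $name) => !in_array(strtolower($name), $internal, true)
    ));
}

$src = '<?php function foo($x) { return strlen($x); } echo foo("ab");';
print_r(user_functions(called_functions($src)));
```

Run against dhcart.php (for example via file_get_contents()), this should list the obfuscated function names without the built-ins.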

The first is a simple wrapper, the second is the one that dies if it decides we're being naughty (oh la la...), the third is the one with the loop and "255+" (suggestive of some encryption scheme).

Thus the only active code in 911006.php that we know of is these three functions; tracing them will reveal any other active functions, and doing this recursively will tell us which code is live and which we can dump.

f0666f0acdeed38d4cd9084ade1739498() and g0666f0acdeed38d4cd9084ade1739498() call nothing but built-in functions, so we ignore them.

y0666f0acdeed38d4cd9084ade1739498() is more complex, so with the aid of searching for "(", we discover... that it calls nothing but built-ins.
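The "trace recursively" step is plain graph reachability.  This sketch assumes you have already written down a call graph (each function => the non-built-in functions it calls); live_functions() is a hypothetical helper, and whatever it does not return is dead code.

```php
<?php
// Breadth-first walk of a call graph from the entry points.
function live_functions(array $callGraph, array $entryPoints): array
{
    $live = [];
    $queue = $entryPoints;
    while ($queue !== []) {
        $f = array_shift($queue);
        if (isset($live[$f])) {
            continue; // already visited
        }
        $live[$f] = true;
        foreach ($callGraph[$f] ?? [] as $callee) {
            $queue[] = $callee;
        }
    }
    return array_keys($live);
}

// Per the tracing above, none of the three calls anything but built-ins.
$graph = [
    "f0666f0acdeed38d4cd9084ade1739498" => [],
    "g0666f0acdeed38d4cd9084ade1739498" => [],
    "y0666f0acdeed38d4cd9084ade1739498" => [],
    "A4540acdeed38d4cd9084ade1739498" => [], // one of the decoys
];
$entry = array_slice(array_keys($graph), 0, 3);
// Prints the three entry points; the decoy is never reached.
echo implode(PHP_EOL, live_functions($graph, $entry)), PHP_EOL;
```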

So surprise sur-f*cking-prise, the entire rest of the code is claptrap.  To /dev/null you go!

Now to make the names more readable.

The functions and their arguments can be renamed according to what they seem to be doing (and then re-aliased under their original names, if you wish, so that the obfuscated script will still run).

To rename, we use the wondrous find-and-replace feature that your text editor should have.
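If your editor is feeling less wondrous, PHP's strtr() does the same job in one pass.  The mapping below is read off by comparing the obfuscated y0666f0acdeed38d4cd9084ade1739498() with the cleaned-up decrypt() at the end of this article; rename_all() is a hypothetical helper.

```php
<?php
// With an array argument, strtr() tries the longest keys first and never
// rescans text it has already replaced.
$renames = [
    '$x897356954c2cd3d41b221e3f24f99bba' => '$key',
    '$x276e79316561733d64abdf00f8e8ae48' => '$cyphertext',
    '$x0b43c25ccf2340e23492d4d3141479dc' => '$s',
    '$x71510c08e23d2083eda280afa650b045' => '$i',
    '$x16754c94f2e48aae0d6f34280507be58' => '$keylen',
    '$x7a86c157ee9713c34fbd7a1ee40f0c5a' => '$char',
    '$x1b90e1035d4d268e0d8b1377f3dc85a2' => '$j',
    '$xe594cc261a3b25a9c99ec79da9c91ba5' => '$cypherbyte',
    '$xab6389e47b1edcf1a5267d9cfb513ce5' => '$plainbyte',
];

function rename_all(string $source, array $renames): string
{
    return strtr($source, $renames);
}

$line = '$x16754c94f2e48aae0d6f34280507be58 = strlen($x897356954c2cd3d41b221e3f24f99bba);';
echo rename_all($line, $renames), PHP_EOL; // $keylen = strlen($key);
```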

Here is the code.  In the interest of leaving some small amount of mystery for you to puzzle over, I'm not going to explain it:

SourceCop-deobfuscation-code.php:

<?php
ini_set("include_path", dirname(__FILE__));

// y0666f0acdeed38d4cd9084ade1739498(), renamed.
function decrypt($key, $cyphertext)
{
    $s = "";
    $i = 0;
    $keylen = strlen($key);
    // The original prefixed this with a VB-style "&H", which hexdec() ignores
    // (newer PHP versions also emit a deprecation notice), so it is dropped.
    $char = hexdec(substr($cyphertext, 0, 2));
    for ($j = 2; $j < strlen($cyphertext); $j += 2) {
        $cypherbyte = hexdec(trim(substr($cyphertext, $j, 2)));
        $i = $i < $keylen ? $i + 1 : 1;
        $plainbyte = $cypherbyte ^ ord(substr($key, $i - 1, 1));
        if ($plainbyte <= $char) {
            $plainbyte = 255 + $plainbyte - $char;
        } else {
            $plainbyte = $plainbyte - $char;
        }
        $s = $s . chr($plainbyte);
        $char = $cypherbyte;
    }
    return $s;
}
// Alias under the obfuscated name so dhcart.php still runs.
function y0666f0acdeed38d4cd9084ade1739498(
    $x897356954c2cd3d41b221e3f24f99bba,
    $x276e79316561733d64abdf00f8e8ae48
) {
    return decrypt(
        $x897356954c2cd3d41b221e3f24f99bba,
        $x276e79316561733d64abdf00f8e8ae48
    );
}

// f0666f0acdeed38d4cd9084ade1739498(), renamed: read a whole file.
function loadFile($x)
{
    return implode("", file($x));
}
function f0666f0acdeed38d4cd9084ade1739498($x)
{
    return loadFile($x);
}

// g0666f0acdeed38d4cd9084ade1739498(), renamed: exit() if $s mentions
// echo/print. Make it just return false before swapping eval() for print().
function checkFile($s)
{
    return strstr($s, "echo") == false
        ? (strstr($s, "print") == false
            ? (strstr($s, "sprint") == false
                ? (strstr($s, "sprintf") == false
                    ? false
                    : exit())
                : exit())
            : exit())
        : exit();
}
function g0666f0acdeed38d4cd9084ade1739498($s)
{
    return checkFile($s);
}
ini_set("include_path", ".");
?>
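For anyone who wants to check their solution to the puzzle without spoiling it for others: a round trip is a good test.  The encrypt() below is hypothetical, derived by hand as the inverse of decrypt(); it is not SourceCop's encoder, and the seed byte and key used here are arbitrary.  Under that derivation each plaintext byte is offset by the previous cyphertext byte (wrapping over the range 1..255), then XORed with the repeating key, so a NUL byte cannot be encoded.

```php
<?php
// decrypt() as in the listing above, without the "&H" prefix.
function decrypt($key, $cyphertext)
{
    $s = "";
    $i = 0;
    $keylen = strlen($key);
    $char = hexdec(substr($cyphertext, 0, 2));
    for ($j = 2; $j < strlen($cyphertext); $j += 2) {
        $cypherbyte = hexdec(trim(substr($cyphertext, $j, 2)));
        $i = $i < $keylen ? $i + 1 : 1;
        $plainbyte = $cypherbyte ^ ord(substr($key, $i - 1, 1));
        if ($plainbyte <= $char) {
            $plainbyte = 255 + $plainbyte - $char;
        } else {
            $plainbyte = $plainbyte - $char;
        }
        $s = $s . chr($plainbyte);
        $char = $cypherbyte;
    }
    return $s;
}

// HYPOTHETICAL inverse of decrypt(), derived by hand; not SourceCop's code.
function encrypt(string $key, string $plaintext, int $seed): string
{
    $out = sprintf("%02X", $seed); // first hex byte seeds "previous byte"
    $prev = $seed;
    $keylen = strlen($key);
    $i = 0;
    for ($n = 0; $n < strlen($plaintext); $n++) {
        $p = ord($plaintext[$n]);
        if ($p === 0) {
            throw new InvalidArgumentException("NUL bytes cannot be encoded");
        }
        $i = $i < $keylen ? $i + 1 : 1;
        $t = $p + $prev;           // undo "minus previous byte"...
        if ($t > 255) {
            $t -= 255;             // ...wrapping the way decrypt() does
        }
        $c = $t ^ ord($key[$i - 1]); // undo the XOR with the key
        $out .= sprintf("%02X", $c);
        $prev = $c;                // decrypt() chains on cyphertext bytes
    }
    return $out;
}

$key = "311B3C4449F31071C0";
$ct = encrypt($key, 'echo "hi";', 0x5A);
echo decrypt($key, $ct), PHP_EOL; // echo "hi";
```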

Conclusion

Obfuscation is inefficient.  Obfuscation is underhanded.

Obfuscation is written by people who assume others are really stupid and intend to exploit that.  It is as close to evil as ASCII can get.

I wrote this guide both to raise consciousness of this particular idiocy in the world today, and to guide newbies along the path to hackerdom.

I hope you found it enlightening.  Now excuse me while I flick this switch.
