Learning from Stratfor: Extracting a Salt from an MD5 Hash

by Acrobatic

In December of 2011, members of activist group 'Anonymous' released a slew (over 860,000 records) of private data stolen from think-tank Strategic Forecasting Inc. (Stratfor).

While I don't condone the theft, I do condone two things:

1.)  The attention it brings to a firm that prides itself on being both intelligent and secure, as a way of showing the public that no data is entirely secure.
2.)  The way it points out these insecurities, in the hope that it will make such firms more intelligent and more secure with our data.

I went through the list to see if my own information was compromised.  It was not (at least not here, though it was in the recent Zappos breach), but I can't say the same for almost a million other people.  The list contains mostly inconsequential information - but it does have a hashed password (along with the email address and username) for each person.  After a cursory run through several thousand random hashed passwords, I was not able to crack any using the method I published a few years back.

Salting

These passwords are at least salted.  (Salting is the process of taking a password and adding extra characters to it to make it more difficult to crack.)  If your password was submarine and it was hashed with MD5 (strictly speaking a hash, not encryption, and what many websites use to protect stored passwords), it would be stored as: a9bdfa76aa6d76f7bde66e470cf98553

In an effort to make your data more secure, a programmer might salt your data with another word, like kangaroo, by adding it to your password before storing it.  So, instead of storing the MD5 hash of submarine, which might be easy for a hacker to guess if they accessed the user database, the password is stored as a hash of submarinekangaroo, which would be much harder for someone to guess.  A smarter salt would be something random, like tH7rWslwj6, so that brute-force attacks on passwords with a word-list for salts would be rendered mostly useless.
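Here's a rough sketch of the same idea in Ruby, using Digest::MD5 from the standard library.  The helper name salted_md5 is my own invention for illustration; it simply appends the salt, as in the kangaroo example.

```ruby
require 'digest'

# Hypothetical helper: hash a password with the salt appended,
# as in the submarine + kangaroo example.
def salted_md5(password, salt)
  Digest::MD5.hexdigest(password + salt)
end

puts Digest::MD5.hexdigest('submarine')    # digest of the bare password
puts salted_md5('submarine', 'kangaroo')   # digest of "submarinekangaroo"
puts salted_md5('submarine', 'tH7rWslwj6') # digest with a random-looking salt
```

All three lines print a 32-character hex string, and none of them resembles the others, even though the password never changed.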

Try it yourself if you want.  If you're on a Mac, go into Terminal, type the following, then hit Enter:

$ md5 -s 'whatever-you-want'

On Linux, the equivalent is:

$ echo -n submarine | md5sum
a9bdfa76aa6d76f7bde66e470cf98553  -

What you'll see is the hashed value of your string of text.  Now try to add some characters to it - your own salt - and see how the results change.  It's important to realize that there's no "unhash" method, per se.

You can't type something like:

$ unmd5 'a9bdfa76aa6d76f7bde66e470cf98553'

and get submarine in response.

But - if you go to Yandex and search for a9bdfa76aa6d76f7bde66e470cf98553, you'll find plenty of posts telling you the answer is submarine.

Salt submarine with your own new word (md5 -s 'submarineastroturf'), then search for that.  Chances are, your search will come up empty.  That's the importance of a salt.

How Does My Website Know My Password Then?

In most cases, they don't.

They keep the hashed version of your password, but they have no way of knowing what it actually is in "plain-text."

To see if the password you enter when you log in matches what they've stored in their database, they have to hash it and compare the result to what's on file.

So if your hashed password was stored as 8833f74b9da9cf81d33f6c6a79ac9985 and you entered telescope as your password, a program quickly converts your plain-text password to 8833f74b9da9cf81d33f6c6a79ac9985 and compares it to what's stored.

In this case, there's a match - and you're granted access to your account.  If they happened to salt your password before storing it by adding the word pineapple to the beginning, then your stored password would be: 0cf7664d30e8a72b6b423148578ddfba

(Again, you can confirm by typing: md5 -s 'pineappletelescope' in your terminal).

So, when you enter telescope into your website's login box, before it's hashed, the website will add pineapple to your password, then hash it to compare with what's stored in the database.  You can see not only the importance of salting, but also of knowing exactly what the salt is.

Without it (without knowing pineapple, in this example), it would be impossible to match the password you entered with what was stored.
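As a minimal sketch, assuming the salt is prepended as in the pineapple example, a login check might look something like this in Ruby.  The method name password_matches? is hypothetical, not taken from any real site's code.

```ruby
require 'digest'

# Hypothetical login check: the site keeps only the salt and the digest,
# never the plain-text password.
def password_matches?(entered, salt, stored_hash)
  # Prepend the salt, hash the result, and compare digests.
  Digest::MD5.hexdigest(salt + entered) == stored_hash
end

# What the site would have saved when the account was created.
stored = Digest::MD5.hexdigest('pineapple' + 'telescope')

puts password_matches?('telescope', 'pineapple', stored)   # true
puts password_matches?('telescope', 'coconut', stored)     # false: wrong salt
puts password_matches?('microscope', 'pineapple', stored)  # false: wrong password
```

The right password with the wrong salt fails just as surely as the wrong password does, which is why the site has to be able to reproduce the salt every time.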

Looking for Patterns

So, we can assume that Stratfor is at least smart enough to salt their passwords.  The question is, can we take 800,000+ hashed salted-passwords, and find any patterns or similarities in them?

From that, could we build a frequency of the most common hashed passwords, then assume that those passwords are the same - and try to derive an algorithm that produces a salt?  Can we get lucky and hope that Stratfor salted their passwords with either the username or email address of each user?  Or did they use the same salt for every user?  I would assume they wouldn't use an email address - especially since a user can change their email address - so we'll take that one out of the mix.  I will, however, try the username as a salt, as that is typically something a user isn't allowed to change.

The First Clue - No Duplicate Hashes

To begin, I sorted the 860,160 hashed passwords alphabetically, and interestingly (at least in the few thousand I quickly scanned), there were no duplicates.

What does this mean?  It means that a different salt is being used for each person.

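Why?  Because in a list of 860,160 passwords, the chances of no two people choosing the same password are infinitesimally small, and identical passwords hashed with the same salt would produce identical hashes.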

Let's say two people used the phrase opensesame as their password.  The hash of this is: e6078b9b1aac915d11b9fd59791030bf

Let's now say that Stratfor salted all passwords when they stored them, and salted them with the phrase fishbowl123 by appending it to the end of a user's password.

So, opensesame becomes opensesamefishbowl123, which is hashed as: 8feb9db2775f81e3b152803bb9704fad

So, theoretically, if even two out of 860,160 people had the password of opensesame, we should see the hash 8feb9db2775f81e3b152803bb9704fad show up at least twice.

But there are no duplicates - and that indicates that the same salt isn't being used for each person.  This is too large a sample size to not have at least two people with the same password - any password.
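To make the reasoning concrete, here's a small Ruby sketch with toy data (three made-up users, not records from the leak).  It compares a single shared salt against a salt that differs per user; the id values and the duplicate_count helper are assumptions of mine.

```ruby
require 'digest'

# Toy data: two of the three users picked the same password.
USERS = [
  { id: 101, password: 'opensesame' },
  { id: 102, password: 'opensesame' },
  { id: 103, password: 'letmein' }
].freeze

# Count how many distinct digests appear more than once in a list.
def duplicate_count(hashes)
  hashes.tally.count { |_digest, n| n > 1 }
end

# One shared salt appended to every password.
shared = USERS.map { |u| Digest::MD5.hexdigest(u[:password] + 'fishbowl123') }

# A salt unique to each user (here, the made-up id) prepended instead.
per_user = USERS.map { |u| Digest::MD5.hexdigest(u[:id].to_s + u[:password]) }

puts duplicate_count(shared)    # 1: the two opensesame users collide
puts duplicate_count(per_user)  # 0: every digest is distinct
```

With a shared salt, identical passwords still produce identical hashes, so a sorted list would show repeats.  With a per-user salt, the repeats vanish.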

Since we learned above that the salt must be known in order for a website to check your password, we'll assume that Stratfor made their salt based on something unique to the user.

The User Record

The user records for the Stratfor file include information like name, Stratfor ID, user ID, user email address, time zone, picture, signature, theme, last login date, account creation date, and a few trivial ones.

We know that the salt most likely comes from one of these fields of information, and we know the salt needs to be unique to each user, so we can start eliminating some of these.  The dates are interesting, but there is a good possibility that there are plenty of users with the same login date, or account creation date, even down to the hour or minute, so we can't assume that is unique.  We also know that there will be plenty of duplications of the time zone, so that one could be eliminated as well.  The theme (which I assume was some sort of color theme or account theme for each user) can also fall under the "duplicate" category, but it falls under another greater category, which is that of a field where the value could change.

For the salted password to work, the salt must always stay the same.  We can also consider user email address as something changeable, as well as the user's name, so we'll eliminate those from our list of possible salt options.

That leaves us with 2 good options:

1.)  The user ID
2.)  The Stratfor ID

Because we know that the salt is unique to a user, we have a good starting point for our attack, using the two options above as our primary salt tests.  We know that Stratfor isn't using a single random string as the salt for everyone - something that they've locked away in some file - because if they were, we would almost certainly have duplicate hashes - and we have none.

We have candidates for our salt, now what?

To do all the password crunching and text analysis, I'll be using my new friend, Ruby on Rails.  Rails makes it really easy to spin up a quick database and start throwing data in it and doing text manipulation.  The first step is to clean up the list and throw it into a database table.  I took the huge Stratfor file, removed the extraneous columns and imported the user records into a database.

Next, I created a model for attempts.  The attempts are based on the premise that at least one user out of the 860k will have one of the "10 most common passwords" (which, incidentally, were taken from the leak of 32 million passwords from RockYou.com's compromised systems.)

The ten passwords we'll start with are:

123456
12345
123456789
password
iloveyou
princess
1234567
12345678
abc123
monkey

What we'll do is take each of the ten passwords, and add the user ID to the beginning, test it, then add the user ID to the end, and test it.

For example, let's say the user's password hash is 3d50169ccfe06ecf1bdf4c63fb199bd9, their user ID is 20, and their Stratfor ID is 23087.

I'll take our first password, 123456, prepend 20 to it, to get 20123456, then get the hash (md5 -s '20123456'): 11720f3fa65c0fe57212ba6f12af1af1

No match.

So now I'll try 123456 and append 20 to it, to get 12345620, then get the hash (md5 -s '12345620'): 594111f029cbea462f70398257ac0e7f

No match.

Now I'll try it with their Stratfor ID.  No match?

Now I'll move to the next of our top ten passwords, 12345, and continue the test.  For each password in our list, we have to try four different combinations.  That's 40 combinations for our ten passwords, tried across 860,160 rows, which means over 34 million tries (40 x 860,160 = 34,406,400).
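The loop I'm describing can be sketched in plain Ruby roughly as follows.  This is not the exact code I ran: the User struct, its field names, and the crack method are stand-ins for illustration, and the demo user at the bottom is made up.

```ruby
require 'digest'

COMMON_PASSWORDS = %w[123456 12345 123456789 password iloveyou
                      princess 1234567 12345678 abc123 monkey].freeze

# Hypothetical record layout; the real column names may differ.
User = Struct.new(:user_id, :stratfor_id, :password_hash)

# Try every common password with each id prepended and appended.
# Returns [password, salt_field, position] for the first match, or nil.
def crack(user)
  COMMON_PASSWORDS.each do |pw|
    { user_id: user.user_id, stratfor_id: user.stratfor_id }.each do |field, id|
      return [pw, field, :prepend] if Digest::MD5.hexdigest("#{id}#{pw}") == user.password_hash
      return [pw, field, :append]  if Digest::MD5.hexdigest("#{pw}#{id}") == user.password_hash
    end
  end
  nil
end

# Made-up user whose hash was built as password + user ID.
demo = User.new(20, 23087, Digest::MD5.hexdigest('abc12320'))
p crack(demo)  # ["abc123", :user_id, :append]
```

Running crack over every row and collecting the non-nil results shows both which passwords were used and which field and position the salt came from.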

If none of these works, the odds of the salt being based off one of our test columns seem slim, at which point we might consider that the hash is built off of more than one column (for example, prepending the Stratfor ID to the password and appending the user ID to the end).  If that's the case, our number of brute-force attempts increases dramatically - and that's bad news for this exercise, but better news for those whose data is at risk.

The Results

Armed with my list of ten common passwords and the Stratfor hash, I put Ruby to the test.  Less than 20 minutes later (even running on an under-powered MacBook Air), the experiment was a success, and the results are stunning:

Of the 860,160 user accounts from the Stratfor file, 986 of the users had one of the ten common passwords.

The salt, as it turns out, is the Stratfor ID, prepended to a user's password.

So, if your password happened to be monkey, and your Stratfor ID was 187519, your password is based off the MD5 hash of 187519monkey.

(Incidentally, 14 people of 860,160 had the password monkey.  The most common, sadly, were 123456 (483 occurrences) and password (285 occurrences).)

What Does This Mean?

It means someone nefarious, knowing the salt column, could take it and run each of the users' passwords against a brute-force dictionary - and there is no doubt that the 986 number would greatly increase, giving the hacker access to thousands of accounts.
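To show how little work that is once the scheme is known, here's a minimal sketch of a dictionary check against one account, assuming the Stratfor ID is prepended to the password.  The method name, the four-word list, and the target hash are all made up for the demonstration.

```ruby
require 'digest'

# Return the first word whose salted digest matches the stored hash,
# or nil if nothing in the list matches.
def dictionary_attack(stratfor_id, password_hash, wordlist)
  wordlist.find { |word| Digest::MD5.hexdigest("#{stratfor_id}#{word}") == password_hash }
end

# Tiny stand-in for a real word list.
WORDLIST = %w[dragon sunshine monkey letmein].freeze

# Demo hash built locally, not taken from the leaked file.
target = Digest::MD5.hexdigest('187519monkey')
puts dictionary_attack(187519, target, WORDLIST)  # monkey
```

Swap the four words for a list of a few million and the same handful of lines applies to every row in the file.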

It also means that it only takes two people with a bad password to crack a salt.  If no one in the 800k test had used one of those top ten passwords, there's a good chance I would've gone on to another method, having found no matches.

What does it mean to Stratfor, and companies like them?

You have to do a better job of protecting our data.

Salting is a good step towards protecting data, but if you don't use it right, it's only a minor stumbling block to someone with relatively little skill.  Perhaps salting with data from multiple columns, or column data in reverse (maybe the username backwards), or a column on each end of the password (maybe a username and the account-created date), like usernamemonkey01-25-2012 would be better.

The insecurity of our personal data is troublesome, and breaches happen almost every day.

I can only hope this will help those who keep our data become more responsible in their protection of it.
