Converting the Voter Database and Facebook into a Google for Criminals

by Anthony Russell

Disclaimer:  I'm in no way advocating criminal use of United States voter databases and/or of Facebook.  If you use this research in a criminal manner, I'll do whatever I can to support law enforcement and help bring you to justice.  Don't be a dick; you've been warned.  Also, I've redacted some of the secret sauce that makes this work.  Sorry skiddies.

Summary

I was able to create a proof-of-concept application that scrubs a recreation of the Ohio voter database (which includes first name, last name, date of birth, and home address) and confidently links each entry to its real owner's Facebook page.

By doing this, I have created a method by which you can use the Ohio voter database to seed you with name, address, and DOB - and Facebook to hydrate that data with personal information.

There's a lot of danger in being able to link these two items in this fashion.  If put together correctly, it's essentially a Google for criminals.  Enter the target filters and get a list back of who they are and exactly where they live.

My application was able to positively link a voter record to a Facebook account approximately 45 percent of the time.  Extrapolate that out over the 6.5 million records in my database and you get roughly 2.9 million Facebook records.
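
As a sanity check on that extrapolation, here is the arithmetic as a short Python sketch.  It assumes the match rate holds uniformly across the whole database, which a sample can't guarantee, and the function name is only for illustration.

```python
RECORDS = 6_500_000  # voter records in the database, per the article


def matched_profiles(match_rate_percent):
    """Expected number of linked profiles at a whole-percent match rate."""
    # Integer arithmetic avoids floating-point rounding noise.
    return RECORDS * match_rate_percent // 100


for rate in (44, 45):
    print(f"{rate}% of {RECORDS:,} records = {matched_profiles(rate):,} profiles")
```

A match rate of 44 to 45 percent works out to somewhere between 2.86 and 2.93 million linked profiles.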

How I Found This

I was attempting to discover how Internet databases were getting my home address and personal information.  Most of them have opt-out policies, so for every one I opted out of, I had to figure out where it was seeded so I could opt out of that as well.  Eventually I hit a wall.  It was clear that the last companies I found were getting seeded from public data and then scrubbing the web in an attempt to link your data for sale.  Like any good hacker, I said, if they can do it, so can I.  <insert evil smile>

Getting Your Seed Data

To start, I had to see what public data was available.  In short, there's a ton.

No wonder we get marketed to non-stop by mail.  The government takes our personal information and puts it on the web for free.  Write a couple of scripts and you can tap it anytime.

Unfortunately, I don't have lawyers who can litigate on my behalf if some state doesn't like me scripting its records search site, so I opted to find a downloadable database instead.  Then the great state of Ohio dropped a giant golden egg in my lap.

Two CSV files that have 6.5 million unique voter records in them.  No hacking to be done here.  Just a publicly available download that contains about 57 percent of Ohio residents.

It can be found here: www6.sos.state.oh.us/ords/f?p=111:1:0::NO:RP:P1_TYPE:STATE

Just download the files and upload them into your favorite database.  Because of the size, I chose to put it on Azure for my application.

Getting Information Out of Facebook

I'm going to do a little hand waving here because I don't want people using this in a malicious manner.

If you wanted to recreate it, you could do it with this article and some work on your own end, but you're not getting a complete answer here.

I essentially used two Facebook queries over and over.  For a simple example, let's say I wanted to find people on my street.  I would query the voter database something like this:

select LAST_NAME, FIRST_NAME, DATE_OF_BIRTH, RESIDENTIAL_ADDRESS1, RESIDENTIAL_CITY from dbo.ohio where RESIDENTIAL_CITY = 'myCity' AND RESIDENTIAL_ADDRESS1 LIKE '%myStreet%'

With these results, I can now start searching for potential Facebook candidates.  To get my list of possible profiles I would run this query:

https://www.facebook.com/search/poeople/?q=FIRST+LAST+STATE

Once this comes back, I cache the source and run a regex on it to extract the user profile IDs.  To get the profile IDs out, you can use this regex:

(?<=_gll\'94><div><a href=\").*?(?=\" data-testid=\"serp_result)

Now that you have your list of potential profiles, you can start scrubbing them to find the one you want.  Before we can scrub them though, we need to pull key data off of each profile.  To do this, I used a series of regexes.

Get name from profile page:
(?<=fb-timeline-cover-name\">).*?(?=</span)

Get profile photo from profile page:
(?<=href=\")https://www.facebook.com/photo.php?.*?(?=\')

Get intro from profile page:

(?<=data-profile-intro-car).*?(?=</div>)

Get details from intro block:

(?<=href=\")(.+?)(?=\")

Get links from details:

(?<=href=\").*?(?=<?>)

Get text from details:

(?<=data-hovercard-prefer-more-content-show=\").*?(?=</a>)

If implemented correctly, the above regexes will give you a plethora of information on each individual that you can then use to start generating confidence scores for each profile.

Generating the Confidence Scores

This, surprisingly, is the tough part.  There are a bunch of gotchas here.  I used four main things for my confidence scores: does the first name exist, does the last name exist, does the city exist, and does the state exist.

Simple enough, but even this can be a problem.  People change names.  People list the state 50 times on their profile and the city once.  It varies a lot.  I did, however, come up with a scoring combination that I think is very accurate.

Scoring the Name:

Scoring the Text:

With the above scoring, I am able to produce an output similar to this:

DANIEL <redacted>:

Username: https://www.facebook.com/daniel.<redacted>?ref=br_rs
Confidence: 1.02
Username: https://www.facebook.com/daniel.<redacted>?ref=br_rs
Confidence: 0.26
Username: https://www.facebook.com/jacob.<redacted>?ref=br_rs
Confidence: 0.26
Username: https://www.facebook.com/diesel.<redacted>?ref=br_rs
Confidence: 0.43
Username: https://www.facebook.com/daniel.<redacted>?ref=br_rs
Confidence: 0.41
Username: https://www.facebook.com/daniel.<redacted>?ref=br_rs
Confidence: 0.27
Username: https://www.facebook.com/daniel.<redacted>?ref=br_rs
Confidence: 0.41
Username: https://www.facebook.com/daniel.<redacted>?ref=br_rs
Confidence: 0.26
Username: https://www.facebook.com/Dan.<redacted>?ref=br_rs
Confidence: 0.27

As you can see, there's one profile that clearly stands out.  Sure enough, if you click into this profile, it's the person that lives on my street.  I was able to run this script over thousands of people without getting rate limited by Facebook.  Conceivably, I could run this nonstop and eventually build a giant database.

Why is This a Giant Problem?

Well, if you need me to tell you why this is a problem, then you're not thinking hard enough.  Here are just a few of the things we can leverage the above process for.

Profiling:

Creating spear phishing campaigns:

Creating wordlists for password cracking.

Accurately predicting when people are and aren't home based on check-ins.

How Can This Be Fixed?

If I had the ear of the state IT rep, I would start there.

I'd tell them that allowing anyone to download the entire voter database is probably a dumb idea.  I understand why voter records are public, and it's for a good reason.  That said, we need to rethink how this is implemented.  The government just enabled me to build a Google for criminal enterprises.  Facebook should also probably be rate limiting the above queries.

Currently, under certain conditions, I can script queries forever without CAPTCHAs.
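
For the defenders reading this, here is a minimal sketch of the kind of per-client rate limiting I mean, using a token bucket.  The class name, the default limits, and the idea of keying on a client ID are all my own assumptions for illustration; this is not how Facebook or the state actually does anything.

```python
import time


class TokenBucketLimiter:
    """Hypothetical per-client token bucket.

    Each client may burst up to `capacity` requests, then is held to
    `refill_rate` requests per second.
    """

    def __init__(self, capacity=10, refill_rate=0.5, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        # client_id -> (tokens remaining, time of last update)
        self.buckets = {}

    def allow(self, client_id):
        now = self.clock()
        tokens, last = self.buckets.get(client_id, (self.capacity, now))
        # Refill for the elapsed time, never exceeding capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens >= 1:
            self.buckets[client_id] = (tokens - 1, now)
            return True
        # Out of tokens: the caller should throttle or show a CAPTCHA.
        self.buckets[client_id] = (tokens, now)
        return False
```

A search endpoint would call allow() with whatever identifies the requester (session, account, or IP) before serving each query, and fall back to a CAPTCHA or an error when it returns False.  The point is that a script hammering the same query pattern runs dry quickly, while a normal user never notices.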

If you belong to either of the above-mentioned parties and would like more detailed information, a POC demo, or my opinion on what to do to fix the issues, please feel free to reach out (reach out, not sue).
