Open-Source Repository Abuse

by Terrible Doe

Open-Source Software (OSS) has been firmly established as a viable software development and licensing model.

Developers love the collaboration and the ability to reuse existing code while users appreciate free software that can compete with commercial applications.  Most people tend to see OSS as a complete win-win situation.

Unfortunately, from a security perspective this isn't always the case.  There have been several recent examples where open-source software (or, more specifically, open-source software repositories) has been at the center of major security breaches.  Uber got into bother when a developer accidentally stored a sensitive database key on a publicly accessible GitHub page.  The iOS "goto fail" bug was discovered by a security researcher after Apple made the code publicly available.

Developers, whether intentionally or not, sometimes store things they shouldn't on public source code repositories.  Some developers believe that a repository is secure by default or that no one would be looking at their code.  Of course, the more people working in a repository, the harder it is to maintain control and the higher the likelihood that some sensitive information ends up stored there.  Even if specific, sensitive data isn't available from the repository, understanding the source code of any application can help in understanding how to attack it.

In this article, I will show what things can be found by digging around in source code repositories.  I'll show where to look and how to do the searches.  Finally, I'll cover how this information can be used by the intrepid hacker and how to secure it as a developer.

The most obvious source code repository, GitHub, can be a good starting point.

Many of the search strings provided later will return results from GitHub.  GitHub is looking at improving the security of the site by implementing scrubbers to remove sensitive files.  If you find something on GitHub, copy it out or it may be gone the next time you look for it.  Other dedicated repo hosting sites exist as well.  SourceForge, Bitbucket, and more can be found by performing a quick search.

Increasingly, tech companies are creating their own source code repositories.

Microsoft's CodePlex is a great resource for Windows OSS code.  Google has their own Google Code OSS project hosting service as well, but they plan to discontinue that in January 2016 (get at it while you can!).

Apart from these dedicated repositories, many open-source projects will host their own public repository.  Google's Chromium codebase (which Chrome and ChromeOS are derived from) has a publicly accessible repository, as do many other sponsored projects.  Smaller companies and individuals will often do the same.  Many individual software developers will make their repositories public as well (at times accidentally).

Using Google to find the hidden repositories is as simple as understanding how the repos are built.

Git, a popular version control system, uses repository URLs that will usually end in: .git

A Google search for filetype:git will give you about 1.4 million repositories (as of this writing).  Subversion, another popular version control system, uses .svn directories to store metadata about the source code.  Another Google search will help find those as well.
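
The same knowledge works in the other direction for a developer who wants to check what is about to be published.  The following is a minimal sketch in Python, assuming a local directory tree (for example, a web root) that will be made public.  The function name find_repo_metadata and the constant REPO_METADATA_DIRS are illustrative and not part of any tool.  It walks the tree and reports any .git or .svn metadata directories that would be exposed along with the content.

```python
import os

# Metadata directory names used by Git and Subversion working copies.
REPO_METADATA_DIRS = {".git", ".svn"}


def find_repo_metadata(root):
    """Return sorted paths of .git/.svn directories found under root."""
    found = []
    for current, dirnames, _filenames in os.walk(root):
        for name in list(dirnames):
            if name in REPO_METADATA_DIRS:
                found.append(os.path.join(current, name))
                # Do not descend into the metadata directory itself.
                dirnames.remove(name)
    return sorted(found)
```

Calling find_repo_metadata on a deployment directory before it goes live gives a list of paths to review or remove.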

O.K., so now you know where to look.

What kinds of things can you expect to find in these repositories?  Pretty much anything!

You can find the private encryption keys for a user or an application.  There may be information in the code comments, such as test user accounts (they tend to live forever) or the developer's notes on which lines of code are buggy (useful for writing exploits).  Configuration files often contain user credentials that the application uses for access (known as functional accounts) or may have URLs to other systems.  Since the repositories version the code, digging into the history could reveal things that the developers had included but then deleted, such as test data or proof-of-concept code.  There can also be hard-coded values in the code files themselves (sometimes called magic numbers).
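
A developer can look for the same categories of material before pushing code.  What follows is a rough sketch in Python, not a complete scanner: the three regular expressions, the labels, and the function name scan_text are all assumptions chosen for illustration, and real scanning tools use far larger rule sets.  It flags lines that look like private key headers, hard-coded credentials, or developer notes.

```python
import re

# Illustrative patterns only; these are assumptions, not an exhaustive rule set.
PATTERNS = {
    "private key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    "hard-coded credential": re.compile(
        r"(?i)(?:password|passwd|secret|api_key|token)\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
    "developer note": re.compile(r"(?i)\b(?:TODO|FIXME|HACK|XXX)\b"),
}


def scan_text(text):
    """Return (line_number, label) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

Because the function only takes text, it could also be run over the output of git log -p to check content that was committed and later deleted.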

If the purpose of accessing the source code is to get a better understanding of how the application works, simply browsing through the accessible repository can be enough.

To get even more in-depth, you could load the codebase into your development environment and build it yourself.  This can tell you where the weak parts of the system may be and how it could be exploited.  By compiling it yourself, you can debug the code and step through it to see how various operations are performed.  However, if you're not a programmer, then you're probably just interested in what secrets you can find in the code.

Using standard Google advanced search operators (inurl, site, filetype, user) in various configurations will generally provide as much info as needed.  Here are some example search queries that will yield interesting results (the search string is after the "=").

Also, try changing the target site to other repo sites.  This is not a comprehensive list, but should give a good idea of what could be found:

By now, most of you are thinking about other things that you may be able to uncover.

As with all things, due care and discretion should be exercised before diving in.

For example, Uber issued a subpoena to GitHub to force them to provide all of the IP addresses that accessed their secret key.

Be smart, be safe, and be informed.
