Exploring Protocols - Part 1 [Archive] - RCE Messageboard's Regroupment

Matasano

December 2nd, 2007, 17:40

In the process of doing software security analysis, it is pretty common to encounter unknown network protocols or file formats that are part of the attack surface you’re investigating.
Not too long ago, we wrote a post entitled Reversing a ZLib-obfuscated Network Protocol ("http://www.matasano.com/log/862/reversing-a-zlib-obfuscated-network-protocol/") where we talked about reversing an undocumented protocol to look for security weaknesses. We got several good questions about some of the deductions we made as we picked the protocol apart. I’d like to take the opportunity to talk more about protocol reversing in general, explain how that deduction process works, and cover the subject more broadly along the way.

This will be the first of at least two blog posts. I’m going to start by discussing building blocks and see where that takes us. In the early phases of talking about this process, I’m not making a distinction between whether a protocol is “unknown” because of lack of documentation or because it’s simply “unknown to you/me” because we’re unfamiliar with it. Of course an undocumented protocol is going to be trickier to reverse. If there’s a point to these initial posts, it’s that working with documented protocols helps us understand the undocumented ones.

To illustrate some basic protocol dissection ideas, I’m going to talk about iSCSI. I mostly picked iSCSI since I happen to be working with it at the moment and it makes a pretty good case study.

In this post we’ll:
1. Talk a little bit about what iSCSI is and what it’s for.
2. Use Wireshark to find an iSCSI PDU and isolate it.
3. Compare the raw PDU to the specification.
4. Talk a bit about how this all relates to protocol reversing.

In a nutshell iSCSI is:

… SCSI over IP. It’s designed as a low cost solution for network attached storage. A storage server (say a NAS appliance) exports storage as “targets” on any TCP/IP network, to which clients (aka “initiators”) connect. Once attached by connecting and logging on, the initiator’s OS sees the target as a hard drive and treats it as a block device. Filesystem drivers ride on top of the device as they would any other SCSI device. Besides file access, an initiator can arbitrarily partition and format the target using its allocated space.

Sounds a bit crazy from a security perspective, right? Well, just bear in mind that iSCSI is not intended as a replacement for CIFS or NFS at all. iSCSI is first and foremost designed as an alternative to more expensive Fibre Channel NAS solutions by using cheaper gig-ethernet and possibly leveraging a company’s existing network infrastructure. The iSCSI spec is also apparently designed to be used over other transports besides TCP/IP.

We’re interested in what iSCSI looks like on the wire. This is not undocumented or new territory. Wireshark ("http://www.wireshark.org") has iSCSI decoding capabilities ("http://wiki.wireshark.org/iSCSI") way above and beyond the simple dissection tools we’re going to get into for iSCSI. We’re not going to use those decodes much for this discussion, though. Building our own tools gives us more intimate knowledge than relying on Wireshark will. We also want to have some building blocks for doing things later like fault injection if our exploration leads us that way.

iSCSI’s a good case study for protocol exploration since it’s not exactly a “common” network protocol, but has pretty decent documentation and specifications available in RFCs. Picking it apart with some guidance helps illustrate some common network protocol concepts and we can double-check things against the actual specification to make sure we’re getting them right.

Here’s a hexdump of an isolated iSCSI PDU as it appears on the wire:

http://www.matasano.com/log/wp-content/uploads/2007/06/pdu-hd.png
I isolated this using Wireshark and saved it as a file to work with. iSCSI uses TCP/3260 as its transport. The pcap filter for this is “tcp port 3260”. Here’s how I did that:
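Once the PDU is saved as a raw file, a few lines of Python are enough to reproduce a hexdump like the one above. This is only a minimal sketch: the function names and the `pdu.bin` file name are placeholders I made up, not anything from the capture itself.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset / hex / ASCII columns, `width` bytes per row."""
    pad = width * 3 - 1  # width of a full hex column, used to align short rows
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the ASCII column.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  |{ascii_part}|")
    return "\n".join(rows)


def dump_file(path: str) -> None:
    """Print a hexdump of a saved PDU, e.g. dump_file("pdu.bin") (hypothetical name)."""
    with open(path, "rb") as fh:
        print(hexdump(fh.read()))
```

Having the dump in our own code rather than in a screenshot means we can slice, annotate, and re-dump pieces of the PDU as we go.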

http://www.matasano.com/log/wp-content/uploads/2007/06/get-pdu-ws.png

Now that we’ve isolated a sample, the next step is making sense of the raw PDU. If this were an undocumented protocol, this would be the part where we opened it in a hex editor and started trying to split it into fields based on educated guesswork, assisted by good conversion tools. That’s just one way to go about it, and probably the most basic one.
It is not always a straightforward process. We’re still talking about the guesswork, not doing it (yet).
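To give a flavor of what that guesswork can look like, here is a rough sketch of one common heuristic: hunting for a length field. It tries each of the first few offsets as a 1 to 4 byte integer in both byte orders and flags any value that equals either the total message length or the number of bytes following the field. The function name and the sample message are invented for illustration, and a hit is only a hint, never proof.

```python
def find_length_candidates(msg: bytes, max_offset: int = 16):
    """Return (offset, width, byteorder, value) tuples that look like length fields."""
    hits = []
    for offset in range(min(max_offset, len(msg))):
        for width in (1, 2, 3, 4):
            field = msg[offset:offset + width]
            if len(field) < width:
                continue  # field would run past the end of the message
            remaining = len(msg) - (offset + width)
            # Byte order is meaningless for a single byte, so try it only once.
            orders = ("big",) if width == 1 else ("big", "little")
            for order in orders:
                value = int.from_bytes(field, order)
                if value and value in (remaining, len(msg)):
                    hits.append((offset, width, order, value))
    return hits


# Made-up message: one type byte, a 2-byte big-endian length, then 5 payload bytes.
sample = b"\xaa\x00\x05hello"
for hit in find_length_candidates(sample):
    print(hit)
```

Real protocols often count from somewhere else (after a fixed header, for instance), so in practice you would run this over many samples and look for offsets that keep matching.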
Here’s the basic header syntax of an iSCSI PDU as defined in RFC 3720 ("http://www.ietf.org/rfc/rfc3720.txt") (yep, there it is… we could stop now, but where’s the fun in that?).

Code:

http://www.matasano.com/log/wp-content/uploads/2007/06/iscsi-bhs.png

This type of breakout basically represents how we’d like to be able to understand a network protocol. It’s very rare, even at best, that you’ll actually figure out what every field is for in an undocumented protocol. Just getting fields broken up so you can make sense out of most of them is what you’re usually going after initially. As you start to make sense of other things later, the things you may have originally passed over can gain context.
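As a hedged sketch of what that breakout looks like in code, here is the generic 48-byte Basic Header Segment from RFC 3720 pulled apart in Python. The field offsets follow the RFC's generic BHS layout; the class and function names are my own, and `pdu_length` assumes no header or data digests were negotiated, which may not hold for a given capture.

```python
import struct
from typing import NamedTuple

BHS_LEN = 48  # the Basic Header Segment is a fixed 48 bytes


class BasicHeader(NamedTuple):
    immediate: bool            # I bit (byte 0, mask 0x40)
    opcode: int                # low 6 bits of byte 0
    final: bool                # F bit (byte 1, mask 0x80)
    total_ahs_length: int      # byte 4, counted in 4-byte words
    data_segment_length: int   # bytes 5-7, in bytes, padding excluded
    lun: bytes                 # bytes 8-15, LUN or opcode-specific
    initiator_task_tag: int    # bytes 16-19
    opcode_specific: bytes     # bytes 20-47


def parse_bhs(pdu: bytes) -> BasicHeader:
    """Split the generic BHS fields out of a raw PDU (bytes 2-3 are opcode-specific and skipped)."""
    if len(pdu) < BHS_LEN:
        raise ValueError(f"need at least {BHS_LEN} bytes, got {len(pdu)}")
    return BasicHeader(
        immediate=bool(pdu[0] & 0x40),
        opcode=pdu[0] & 0x3F,
        final=bool(pdu[1] & 0x80),
        total_ahs_length=pdu[4],
        data_segment_length=int.from_bytes(pdu[5:8], "big"),
        lun=pdu[8:16],
        initiator_task_tag=struct.unpack(">I", pdu[16:20])[0],
        opcode_specific=pdu[20:48],
    )


def pdu_length(h: BasicHeader) -> int:
    """Total PDU size, ASSUMING no header/data digests are in use."""
    padded_data = (h.data_segment_length + 3) & ~3  # data segment is padded to 4 bytes
    return BHS_LEN + h.total_ahs_length * 4 + padded_data
```

Everything past the generic fields depends on the opcode, which is why `opcode_specific` is left as raw bytes here rather than guessed at.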

This RFC ("http://www.ietf.org/rfc/rfc3720.txt") explains the various fields pretty well and covers much more than just that. There’s more information in there than we are even likely to need. This raises a good point. Before you start “reversing” anything, always make sure it isn’t documented somewhere or implemented in something you can pull apart.
Using the spec to guide us, we’re going to try to understand this header and see what our captured PDU says. We’ll need to write a tool for this.

In the next post, we’ll:
1. Write a C dissector to emulate Wireshark decodes.
2. Write a Ruby dissector to approximate the C version.
3. Discuss some pros and cons of each.
4. Discuss some of the general things we can learn and how they can be applied to reversing truly unknown protocols.

http://www.matasano.com/log/885/exploring-protocols-part-1/