The Best Way to Share a Treasure Map

by Mike

Recently, I spent a fair amount of time researching Forward Error Correcting codes, also known as FEC.

FEC are a class of algorithms and math constructs that allow you to reconstruct a damaged message in a one-way communication channel that doesn't allow you to ask the sender to send it again.  This is opposed to a two-way communication channel, such as TCP/IP, that allows you to detect an error and ask for a resend.

These codes are actually quite prevalent in modern technology.

They make CDs, DVDs, and other optical data storage methods work.  They are integral in interplanetary communications due to the extreme time lags.  You also see one every time you look at a QR code.  Those pixelated barcodes are designed to still be readable even if there is significant damage to the original code.  It is for an optical data encoding method much like a QR code that I started investigating FEC.

Along the way I discovered there are two very distinct classes of forward error codes.  There are forward error correction codes and there are forward erasure correction codes.

A forward error correction code can take a message of a given length, detect mangled bytes, and replace them to reconstruct the original message.  A forward erasure correction code will break a message into a number of blocks and ensure that if any combination of a given number of those blocks make it to the receiver undamaged, the original message can be reconstructed.  Both of them work by appending a set of parity bytes to the original data.

Both of these rather amazing algorithms are dependent on a class of finite field mathematics called Évariste Galois (pronounced "Galwah" ... he was French) fields.

Galois invented this math before he died at 21 years of age from injuries sustained in a duel.  Personally - though I am much older and highly unlikely to engage in a duel - I found myself more than a little flummoxed by the complexity of this math, despite the more than a few tutorials available on this specific topic.  But I still need to use them for my project.

I've got a communication channel that can reliably send data, one-way, with nearly 99 percent reliability.  But those errors express themselves as random, unpredictable bit flips.  The correct number of bits will arrive, but they might not be right.  This is actually a perfect application for forward error correction and I began my hunt for an available library accordingly.

Long story short, there aren't any good open-source forward error correction libraries.

There are some really promising contenders, but I could not get them to work and, due to my limited understanding of the underlying math, I could not fix them.

But, there are quite a few very excellent forward erasure correction libraries out there.  After some experiments, I settled on zfec, which has a native Python library with the underlying C readily available.

You can get zfec at: github.com/tahoe-lafs/zfec or your Linux distribution may have a package available.  I had better results with the package on python.org than I did with the native Ubuntu package.

Once you have zfec installed you can issue a command such as:

$ zfec -m 5 -k 3 myfile.txt

This will break your document into five files (called "shares" in zfec parlance), any three of which may be recombined by the command:

$ zunfec -o myfile.txt myfile.txt.0_5.fec myfile.txt.2_5.fec  myfile.txt.4_5.fec

This will yield your original file again, even though two of the pieces are missing (myfile.txt.1_5.fec and myfile.txt.3_5.fec).

Although it was completely unrelated to my original project, I got to thinking.

What a wonderful way to share some critical file with your friends, but prevent any one of them from accessing the file on their own.

In the case above, if someone wants to read myfile.txt, they have to obtain at least three of the portions to do so.  It is a mathematical way to enforce collaboration (or, if you are a CISSP, "collusion").

So, say you were with a crew of pirates - like "Arrrrgh!" pirates, not the political party - and you buried some treasure.

You want to share the treasure map out, but you don't want any one pirate to be able to find it on their own.  If you had nine pirates, and you knew they socially formed three distinct groups of three pirates each, you could protect your map by using:

$ zfec -m 9 -k 7 treasuremap.jpg

Then, assuming the pirates were able to keep their individual files safe, no individual or single group of pirates could recreate the treasure map on their own.  They would have to come to an agreement across all of the groups to go after the treasure.

Why not just split it into nine files and make all nine required?  Because, then, one person could prevent anyone from getting to the treasure.  That would be a tyranny of the minority.

It's parliamentary politics enforced by 19th century math!

But there is a problem.

Inside of each share there is a portion of the file that is unmodified by zfec.  If the original file was ASCII text, you would be able to read a portion of that file in each share and that would be unacceptable.  The solution I came up with was to encrypt the file before splitting it.

Using AES-256 in Cipher Block Chaining mode means each block cannot be decrypted without knowledge of what the previous block was, even if you know the encryption key.

So, I AES-256-encrypt the file, store the initialization vector at the front of the file and the randomly-generated key at the back of the file, and zfec-split the encrypted file.

The owner of the first share will not be able to decrypt her block because she doesn't have the key.  The owner of the last share can't decrypt his block, even though he has the key, because he doesn't have the initialization vector nor the ciphertext in the previous share.

I became quite taken with this idea.

Seems like a great way to disperse a copy of your will to your beneficiaries.  They don't know what it says, but they know they have to come together to hear it!  There might be other great applications of this technique.

To make this easy, I wrote a Python script called "tmap.py", short for Treasure Map.

The code is included below.

You need to install the zfec package and the PyCrypto package in order to use tmap.py.

Both of these are readily available in the Python package library.

Treasure Map will encrypt and split a file into your designated number of shares, storing the necessary metadata to recreate and decrypt the file in each block.  Treasure Map will then recreate a file when the proper number of shares are found.

To encrypt and split a file (assuming you have made tmap.py executable):

m = The number of total shares.
k = Minimum number required to recover the file.

$ ./tmap.py --make m k filename.ext

For instance:

$ ./tmap.py --make 3 2 rickroll.mp4

will create three files named: rickroll.mp4-1.tmap, rickroll.mp4-2.tmap, rickroll.mp4-3.tmap.

You keep one for yourself and give the other two to your friends.

Friend #2 comes over and really wants to see Rick, so he gives you a USB stick with his share on it.  You then use:

$ ./tmap.py --recover rickroll.mp4-1.tmap rickroll.mp4-3.tmap

Treasure Map will recreate rickroll.mp4 and you can sit back and enjoy the show.

I tried to keep the code for tmap.py as short as possible to assist print publication, so there is a lot of error checking missing.  The code will also overwrite files willfully, so it is best to make and recover in a fresh folder.

Now if I could only find a math concept that would force Congress to compromise and legislate!

Code follows:

#!/usr/bin/python

import zfec
from Crypto.Cipher import AES
from Crypto import Random
import sys

def tmapzfecsplit(data, m, k):
    splitter = zfec.easyfec.Encoder(k, m)
    splitdata = splitter.encode(data)
    # calc pad
    padlen = len(splitdata[0])*k - len(data)
    # prepend k,m sharenum, and padlen to each block - can't recreate without it
    splitdata = [str(k)+","+str(m)+","+str(snum)+","+str(padlen)+":"+sdata for snum, sdata in enumerate(splitdata)]
    return splitdata

def tmaprecreate(datablocks):
    sharenums = []
    cleanblocks = []
    k = 0
    m = 0
    padlen = 0
    for block in datablocks:
        splitdata = block.split(":", 1)
        cleanblocks.append(splitdata[1])
        metadata = splitdata[0].split(",")
        k = int(metadata[0])
        m = int(metadata[1])
        sharenums.append(int(metadata[2]))
        padlen = int(metadata[3])

    if len(cleanblocks) < k:
        # not enough blocks provided
        raise Exception("Can't recreate file. Need {} parts, only have {}.".format(k, len(cleanblocks)))

    decoder = zfec.easyfec.Decoder(k,m)
    origdata = decoder.decode(cleanblocks, sharenums, padlen)
    return origdata

def tmap_encryptdata(data):
    # AES-256 encrypts the data, pre-pending the IV and appending the key
    # the IV is the stringified padding value to get to 16 byte blocksize
    # 32 byte key is random
    padlen = 16 - (len(data) % 16)
    iv = "0" * (16- len(str(padlen))) + str(padlen)
    data = data + "0" * padlen
    key = Random.new().read(32)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    encryptedData = iv + cipher.encrypt(data) + key
    return encryptedData

def tmap_decryptdata(encryptedData):
    # AES-256 decrypts data
    # expects first 16 bytes is IV (and padlen)
    # and last 32 is the key
    iv = encryptedData[:16]
    key = encryptedData[-32:]
    cipher = AES.new(key, AES.MODE_CBC, iv)
    msg = encryptedData[16:-32]
    data = cipher.decrypt(msg)
    padlen = int(iv)
    data = data[:-padlen]
    return data

def tmap_split(filename, m, k):
    # encrypts data in filename
    # splits the data into m parts, k of which can recreate the file
    # prepends the filename to each file
    # saves as filename-X.tmap
    f = open(filename)
    data = f.read()
    f.close()
    enc_data = tmap_encryptdata(data)
    datablocks = tmapzfecsplit(enc_data, m, k)

    for filenum, data in enumerate(datablocks):
        f = open("{}-{}.tmap".format(filename, filenum+1), "wb")
        data = filename+":"+data
        f.write(data)
        f.close()

    return

def tmap_rebuild(filenames):
    # rebuilds a tmap file from the parts in filenames
    savefilename = ""
    datablocks = []
    for name in filenames:
        fr = open(name, "rb")
        data_raw = fr.read()
        fr.close()
        data_split = data_raw.split(":", 1)
        if savefilename == "":
            savefilename = data_split[0]
        elif not savefilename == data_split[0]:
            raise Exception("Not all data is from the same file. Cannot recreate")
        datablocks.append(data_split[1])

    enc_data = tmaprecreate(datablocks)
    data = tmap_decryptdata(enc_data)
    f = open(savefilename, "wb")
    f.write(data)

def tmap_cmdline(args):
    # not using getopt to save code space, but harder to detect errors
    help_text = 'Usage:\ntmap --make m k filename\n\tSplit filename into m parts, k of which are needed to recover\n'
    help_text = help_text + 'tmap --recover file1 file2 file3 ... filen\n\tAttempt to recover the file from the listed parts.'
    help_text = help_text + '\nArguments must be in order!!'

    if len(args) < 3:
        print(help_text)
        sys.exit(2)

    try:
        if args[1] == "--make":
            m = int(args[2])
            k = int(args[3])
            tmap_split(args[4], m, k)
            print("{} split into {} parts, {} needed to recover.".format(args[4], m, k))
        elif args[1] == "--recover":
            tmap_rebuild(args[2:])
            print("File recovered!")
        else:
            print(help_text)
    except Exception as e:
        print("Failed to work:")
        print(e.args[0])
        sys.exit(2)

tmap_cmdline(sys.argv)

Code: tmap.py

Return to $2600 Index