A tale of procrastination, the NSA, crypto-backdoors and symbolic execution Prologue Yours truly was meant to do paid work, porting come crypto protocol to browsers, this implied getting dirty with JavaScript, the insanity of fast changing and incompatible browser interfaces and other nasty beasts. Instead, yours truly remembered an exiting device he saw at a Crypto Museum exhibition. Behold the incredible PX1000cr: https://cryptomuseum.com/crypto/philips/px1000/img/301282/000/full.jpg This diabolical pocket telex (an antique peer-to-peer messaging thingy) from 1983 had a unique feature, it came with DES encryption and was marketed at small companies and journalists. According to some rumors even the dutch government used some. This freaked out the the NSA, which sent emissary to buy all the stock from the market and pressured Philips to suspend sales of any such infernal devices. In '84 the NSA provided Philips with an alternative encryption algorithm, which they were happy with to be available on the market. The astute reader - being knowledgeable about the NSA's backdooring efforts - might immediately suspect that the new firmware might be "weird" in some ways. Yours truly certainly suspected mischief. ACT I - Exposition Luckily the fine people of the Crypto Museum, have not only dedicated a couple of pages to this device, they also published ROM dumps of the original DES-enabled and the agency-tainted device. Thus started my journey. The fine people of the Crypto Museum published also the bachelors thesis of a student reverse engineering this ROM himself, although Ben - the student - focused on the DES variant. http://www.cs.ru.nl/bachelorscripties/2014/Ben_Brucker___0413291___Government_intervention_on_consumer_crypto_hardware.pdf Although the thesis did not contain much source code, it was enough to have a head start that allowed me to dive directly into the encryption code. The first steps were a mistake. I could not resist the irony of using an Agency tool to break an Agency backdoor, hence I loaded the ROM into ghidra and started annotating memory addresses. After a spending hours on this I tried to disassemble the code, and only then I found out that ghidra does not support this particular CPU. Speaking of which, the CPU in question is a Hitachi HD6303 a derivative of a Motorola 6503, a very simple but for its time powerful 8-bit processor. It only has four 16 bit registers: a stack pointer, an instruction pointer, an index register to address memory, and an accumulator. The latter can be accessed as a 16 bit register or as two 8 bit registers. The instruction set is simple, but certainly capable of doing great things. Turns out the same CPU was also used in the venerable PSION II Personal Digital Assistant. This lucky coincidence means there are fans of this device, who document for example the instruction set: https://www.jaapsch.net/psion/mcmnemal.htm The fine people of the Crypto Museum also published an undated photocopied and scanned version of the CPU datasheet: https://cryptomuseum.com/crypto/philips/px1000/files/hd6303rp.pdf It took me a few days to first realize, that some pages are missing, to realizing that only the even numbered pages are missing, to realizing that the even-numbered pages start half-way into the document in decreasing order. All hints that the pages have been scanned while holding discordian principles high. Hail Eris, indeed! Having all this supporting information available, made it an easy effort to go from binary dump to an equivalent algorithm written in C. However there were some weird things. Like for example the encryption function starts with decreasing the pointer to the plaintext by one. Why? And where does that that preceding byte come from, and will people be offended if I index a C char array with -1? Questions which meant I had to either reverse engineer other parts of the ROM that were unrelated to the cryptographic algorithm, which I am too lazy to do. Or I had to find another way. Turns out the Psion II fans also had an emulator: https://github.com/dg1yfe/sim68xx One of the most active contributors to this fine piece of software is someone with a Hungarian name. Yours truly having lived there himself, and having founded a hackerspace there started to strongly suspsect that this particular Hungarian contributor must be known to me. I first suspected someone else, but that person put me on the right track, and it turned out indeed this contributor is a regular in these circles. After a friendly chat on IRC he also became interested in the ROM dumps, but - just like Ben the student- he focused on the DES version. I complained about this plaintext array that is indexed by -1, and that I had to rather get dirty with JavaScript and browsers instead of reverse engineering the whole of the ROM to figure out what on that pesky -1 index resides. I postulated it would be easy to figure out if the emulator would indeed also emulate the keyboard and the display. One day later: https://github.com/iddq/sim68xx/tree/px1000 There was it, an emulator with the display and keyboard. It turned out it is possible to set the text-width - not sure why this makes sense, but it is possible. The -1st character is indeed encoding the text-width, which is limited by the size of the display to 40. It also turned out that the plaintext to be encrypted is also post-fixed with another character: 0x8d. A peculiar detail: on this system, since it's 7-bit ASCII only the eighth bit is reserved to signal the end of the string. Thus 0x8d encodes both a newline character and the EOS. With the working emulator I was able to verify my C interpretation of the encryption algorithm, and thus I could start into phase 2, breaking the crypto. Dramatis Persoanae The algorithm itself can be shown in a simplified block diagram: https://cryptomuseum.com/crypto/philips/px1000/svg/px1000_nsa_flow.svg The mysterious key Remember this device is a 7-bit ASCII input device, how can someone enter an encryption key without much hassle? The engineers came up with a nice idea, take an arbitrary 16 byte string, zero out the top nibble of each byte, and only use the lower (and slightly higher entropy) nibble, providing with a 64-bit key, which is stronger than the measly 56 bit key of DES. Let's introduce our other main characters. In the schema, on the top left, the 16 byte block denoted 'L' is supposed to be a set of four linear feedback shift registers. This is the bad guy, the end level boss, he is elusive and changes like a chameleon. To the right we have two blue blocks - denoted Va and Vb - of four bytes each which contain some transformation of the encryption key. This is a supporting character, mostly stays in the background, has little character development. Right of Va we have the 4 byte C block, which is a FIFO initially containing a transformation of parts of the encryption key, but it later becomes a cipher-feedback buffer containing the last 4 bytes of ciphertext. Another supporting character, this guy looks strange in the beginning, but later on becomes a familiar face we know and recognize. The block denoted by P is really just a transformation which replaces each 4 bit nibble with another 4 bit nibble based on a lookup-table. This young lady is the sister of F, but is mostly staying predictable. The big yellow block F in the middle is eight non-linear transforms that converts 6 input bits into one output bit, more on this later. This lady is another trouble maker she's the femme fatale of this play, she's working with the evil guy, making things difficult. And last the small block K is a transformation of the keystream byte, that rotates the keystream byte left by the number of byte being currently encrypted modulo 8. Just another supporting character without much depth. Act I It is very important to see how these blocks are initialized, this is the part where the alarm bells start getting louder. During initialization one operation comes up everywhere: the low nibble gets complemented and set as the high nibble. unsigned char invertLoNibble2Hi(unsigned char x) { return ((~x) << 4) | x; } In fact this is how 15 of the 16 bytes of the LFSR are initialized, each low nibble of the key is taken and inflated into a byte. The last byte is set to 0xff. As code: for(i=0;i<15;i++) { lfsr[i] = invertLoNibble2Hi(key[i]); } lfsr[15]=0xff; Now, if you happen to know somehow the internal state of the LFSR and know how to reverse it, then it becomes trivial to check if any state has the special structure of the initial state from which the key can be trivially recovered. Not sure if that actually helps, but it's ugly anyway. Blocks Va, Vb and C are similarly initialized: for(i=0;i<4;i++) { V[i] = invertLoNibble2Hi(key[i] ^ key[i+4]); V[i+4] = invertLoNibble2Hi(key[i+8] ^ key[i+12]); C[i] = V[i] ^ V[i+4] ^ 0xf0; } At first it's not obvious, but if you expand V[i] and V[i+4] when setting C[i] and you do the math, you will come to the conclusion that the values of C can only be of these 16 values: {0x0f, 0x1e, 0x2d, 0x3c, 0x4b, 0x5a, 0x69, 0x78, 0x87, 0x96, 0xa5, 0xb4, 0xc3, 0xd2, 0xe1, 0xf0} this can be easily verified by running this python code import iterools def ilth(x): return (x^0xf) << 4 | x sorted([f"{x:02x}" for x in {ilth(a)^ilth(b)^0xf0 for a,b in itertools.product(range(16),range(16))}]) My alarm bells are kinda deafening by now, how's yours? After the initialization, the stream cipher is ready to be used. For each key-stream byte the LFSR is mutated, then combined with the V and C blocks fed into the F function and then xored into the plaintext. Let's have a look first at the mutation of the LFSR block: for(round_counter = 0x1f;round_counter>=0;round_counter--) { acc = 0; for(i=0;i<16;i++) { // FAC7 in the code this loop is unrolled acc ^= lfsr[i] & lookupTable[(round_counter+i)%16]; } // FB43 acc = ((acc >> 1) ^ acc) & 0x55; // FB45..FB4A // (round_counter ^ 0xff) & 0xf == is twice the sequence 15..0 tmp=(round_counter ^ 0xff) & 0xf; lfsr[tmp] = ((lfsr[tmp] << 1) & 0xAA) | acc; } // FB63 Doesn't really look like a traditional - or set of - LFSR to me. But if the Crypto Museum people say so, I'm going with their insights. Nota bene: Those 16-bit hex numbers in the comments mark the addresses, where in the ROM this code can be found. Normally an LFSR emits a bit after each advancement. In this code this is not obvious how it is done. The following snippet shows how 4 bytes are being extracted from the LFSR after it has been mutated: for(i=0;i<4;i++) { tmp = lfsr[i+7]; // FB68..FB6C tmp = (tmp << 2) | (tmp >> 6); // 2x rotate left FB6E..FB72 lfsr_out[i] = tmp ^ lfsr[i]; // FB74 .. FB7A } If you squint you might be imagining four LFSRs there, but as you will see, for our final attack this doesn't matter much. This concludes the left side of the schema before being fed into the non-linear function F. On the right side of the schema you can see how Va and the ciphertext FIFO are being xored and mapped through P, in code this looks as such: for(i=0;i<4;i++) { tmp = V[i] ^ CiphertextFifo[i]; acc = map4to4bit[i][tmp >> 4] << 4; acc |= map4to4bit[i][tmp & 0xf]; pbuf[i] = acc ^ V[i+4]; } Looks straightforward, but if we unpack this in the context of encrypting the very first character (which is probably '(' but this is irrelevant here), then we can unpack: tmp = V[0] ^ CiphertextFifo[0] where CiphertextFifo[0] = V[0] ^ V[4] ^ 0xf0 which drops out V[0] and thus: tmp == V[4] ^ 0xf0 and we know that all values of V are values where the high nibble is just the inversion of the low nibble, and if we xor that with 0xf0, we get that tmp can only be one of these 16 values: {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} Strange, huh? This loop runs 4 times, with similar results for each output byte, just saying. Later when the Ciphertext FIFO is filled with real ciphertext this doesn't apply anymore, but then the contents of this buffer are known - since its the ciphertext, huh. The mapping itself of 4 bits to other 4 bits was relatively uninteresting, at least I couldn't immediately see anything wrong with it. And also xoring that with Vb was also much less exciting. Now that we have the inputs to the F function, we can analyze what happens there. The code is a bit dense, we'll unpack it later: for(i=8, acc=0;i>0;i--) { // FBB9 for(j=1,tmp=0;j<4;j++) { tmp = (tmp << 1) | (lfsr_out[j] >> 7); lfsr_out[j]<<=1; tmp = (tmp << 1) | (pbuf[j] >> 7); pbuf[j]<<=1; } tmp=lookupTable6To1bit[tmp]; acc=(acc<<1) + ((tmp>>(i-1)) & 1); } // 0xfbd9 The outer loop takes care that all 8 bits of each input of the 6 input bytes get used in F() and that the output of F() is being assembled back into one byte. The inner loop interleaves the 6 input bits from lfsr[1], pbuf[1], lfsr[2], pbuf[2], lfsr[3] and finally pbuf[3]. The lookup table produces one bit, which in the last line is put into the correct bit-position of the accumulator. It's a pretty straightforward bit-sliced 6-byte-to-1-byte mapping. The lookup table is neat, it's 64 bytes, which is indexed by the 6 bit interleaved value, and from the resulting byte the i-th bit is extracted. Very compact, neat. The next steps are unspectacular (keeping in mind that curChar starts with -1): acc ^= pbuf[0] ^ lfsr_out[0]; // FBDF tmp = (curChar + 1) & 7; acc = (acc << tmp) | (acc >> (8-tmp)); // rotate left by tmp ciphertext[curChar] = plaintext[curChar] ^ acc Note that for decryption only this last line needs to swapped be around. One last step is needed before we can loop back to mutating the LFSR, and that is advancing the ciphertext FIFO, now that there is a ciphertext byte. Again this is pretty straightforward, and after 4 ciphertext bytes the peculiar structure noted above of the initial 4 bytes in this FIFO is lost: // FC05 CiphertextFifo[4] = ciphertext[curChar]; for(i=0;i<4;i++) { // rot left CiphertextFifo[i] = (CiphertextFifo[i+1] << 1) | (CiphertextFifo[i+1] >> 7); } // FC15 A small optimization is that the array holding the FIFO is actually 5 bytes, and the newest ciphertext byte is always added to the 5 position, which enables this compact loop updating the 4 effective items in this FIFO. If there is more plaintext bytes to encrypt, then the algorithm loops back to mutating the LFSR, otherwise everything's done. ACT II - Climax So this all looks kind of fishy, but how do one actually break this scheme? Well for a long time I focused on somehow figuring out the LFSR and how it can be decomposed in 4 LFSRs of 32, 31, 29, and 27 bit length as indicated on the Crypto Museum schema. Many hours were wasted into slicing and dicing the LFSR, mutating it, slicing and dicing it again, writing bit level differs, staring at colored bits, throwing Berlekamp-Massey at it, trying to write my own 32/31/29/27 bit LFSRs and seeing if I can somehow slice-n-dice a state from the big one into the ones I implemented. It was a nightmare of dead ends, failure, despair. Boredom started to set in, I started to ask friends, maybe they can figure out how this works. They said it's easy, but they have no time now for this. Anyway, maybe this is an LFSR or even four, but I was unable to figure out how. I also started to consult the bible of cryptanalysis, Antoine Joux' masterpiece: Algorithmic Cryptanalysis. It has a chapter: "Attacks on stream ciphers" and this chapter is about LFSRs hidden behind a non-linear function F, Antoine calls these filtered-generators: > The filtered generator tries to hide the linearity of a core LFSR by > using a complicated non-linear output function on a few bits. At > each step, the output function takes as input t bits from the inner > state of the LFSR. These bits are usually neither consecutive, nor > evenly spaced within the register. Bingo! Exactly what I'm supposedly staring at for days now, the big guy and the femme fatale. The chapter mostly covers correlation attacks, but at the end there is also mention of algebraic attacks, the latter gives me a warm fuzzy feeling. Algebra is elementary school stuff, I can do that! Antoine goes on: > The function f is usually described either as a table of values or > as a polynomial. Note that, using the techniques of Section 9.2, f > can always be expressed as a multivariate polynomial over F2. The technique in section 9.2 is is called the Möbius transform which is used to calculate the ANF - the algebraic normal form - of a boolean function. I tried to implement the Möbius transform as given in algorithm 9.6 in Joux' masterpiece, but the results were not providing the expected outputs as the lookup table. After reading a bunch of papers on algebraic normal forms, I learned that different disciplines call this different names, such as: - ANF Transform (ANFT), - fast Möbius (or Moebius) Transform, - Zhegalkin Transform, - Positive Polarity Reed–Muller Transform (PPRMT) Valentin Bakoevs excellent paper "Fast Bitwise Implementation Of The Algebraic Normal Form Transform"" went into much more detail than Joux on this topic, and an implementation of his Algorithm 1, gave the expected results. void moebius(uint8_t *f, int n) { int blocksize=1; int step; for(step=1; step<=n;step++) { int source=0; while(source < (1<