Hash Week! (Part 3)

Over the last two days we've talked about hash functions and their uses in cryptography and elsewhere. Remember that an ideal hash function is basically what cryptographers call a random oracle - given an input, it produces a random number in some range. (In practice this range is always [0,2^(2^n)], with n usually equal to 5 or 6 for non-cryptographic hashes or n equal to 7, 8, or 9 for cryptographic hashes.) This random number is deterministic, in that the same input always produces the same output. But the output is otherwise unpredictable. Given an output, it should not be possible to find a corresponding input except by brute-force calculation of every possible input. (And the other conditions we discussed yesterday.)

So how does a hash function actually work? One possibility might be just to add up the numbers in the input, divide by some chosen constant, and output the remainder as the hash value. In ASCII code, the letters of my first name, added together, give

"Matthew" = 77 + 97 + 116 + 116 + 104 + 101 + 119 = 730

If I want the hash to output a value between 0 and 32, I can divide by 32 and take the remainder as my hash. 32 goes into 730 a total of 22 times, with 26 left over. So the hash of my name with this very simple system is 26.

Some of the non-cryptographic hashes like CRC-32 are actually almost this simple. They're very useful for error-checking and the like, but they fail the "random oracle" criteria completely. It's easy to engineer a string with any given hash output.

The winner of NIST's competition to develop the new SHA-3 hash standard is an algorithm named Keccak. Internally it's pretty complicated, but conceptually it's simple. In cryptographic terms it follows the sponge construction, which describes the way that it "soaks up" the input and "squeezes out" the output. It works like this: Keccak breaks up the input into blocks of about a hundred bytes. It takes the first block and dumps it into the hash's internal memory. It scrambles the internal memory around with what amounts to a complicated shuffle. Then it takes the next block and combines it with the internal memory (via an XOR operation) and shuffles the internal memory again. Then it takes the next block, combines it with the internal memory, and shuffles again. This repeats until the hash has processed the entire input. When that happens, the output of the hash is just the internal state at the end.

That glosses over a lot of subtitles and completely ignore the details of the shuffle. I'm not going to try to explain those subtitles or those details because I don't understand them very well - I'm not at all a cryptographer, just an interested amateur. Still, those details are interesting and I encourage you to take a look at the Wikipedia article.


More like this

Last week NIST anounced the winner of its Cryptographic Hash Function Competition. After five years of review and many rounds of discussion and elimination, the winner is a hash function called Keccak, and its developers deserve many congratulations. It's a shame hash functions aren't better known…
Yesterday we looked at hash functions. As you recall, they're functions which take an input and generate a random-seeming output. As a quick example, here's the output of the SHA-256 hash function for the name of the Scottish physicist James Maxwell and a misspelling thereof: SHA256("James Clerk…
As promised, now we're going to look at the first major block cipher: the DES. DES stands for "data encryption standard"; DES was the first encryption system standardized by the US government for official use. It's an excellent example of a strong encryption system; to this day, while there are…
You're a member of the French Resistance in the height of WWII. You're part of a network of resistance members who have to work with other resistance members they've never met before. For instance, an agent from Paris might have to meet up with an agent in Normandy to work together on sabotage…