You're a member of the French Resistance in the height of WWII. You're part of a network of resistance members who have to work with other resistance members they've never met before. For instance, an agent from Paris might have to meet up with an agent in Normandy to work together on sabotage before the invasion. But resistance is not something the Nazis take lightly, and they've deployed double agents who attempt to infiltrate the resistance. The agent in Normandy needs some way to verify that this person from Paris is in fact a resistance member rather than an enemy double agent.
One solution would be the use of passwords. The resistance HQ distributes a list which contains the resistance agents' code names and their passwords. If the Paris agent (let's call her Alice) goes to meet the Normandy agent (let's call him Bob), Bob can say "If you're really Alice, tell me your password so I can verify it against my list." If she knows the password, this demonstrates to Bob that Alice is really Alice, since the Nazi double agents won't have been given passwords.
But this method is woefully insecure. If any resistance member gets captured, the Nazis have all the passwords and can can infiltrate the ranks of the resistance with impunity. The resistance needs a better authentication scheme, one that's not so vulnerable to capture of an agent's list. In short, the resistance needs a method of authenticating a password without knowing what the password is.
This can be done with a hash function. A hash function is designed in a way that's downright perverse to aficionados of calculus and continuous functions - it's designed so that its output varies wildly for inputs that vary only slightly. Let me give a contrived example of a hash function before describing how the resistance would use it.
Take an integer - one of at least several digits - and multiply it by itself twenty times. The result is going to be some really gargantuan number. Take the last 10 digits of that number. That's the output of our function, which we'll call h(n).
So if your password is 1234, the hash of your password is h(1234) = 9696306176. If your password is 1235, the hash of your password is h(1235) = 5244140625. It takes a little bit of math to get the hash, but it's not intractable. Notice that the hash function is one-way. There's no way to recover the password from the hash because you can't invert the operation "throw away all but the last 10 digits". A side effect of this is the fact that many different numbers could have the same hash, but in practice with 10^10 possible hashes a "collision" of two passwords having the same hash by accident is unlikely. (Although watch out for birthday attacks!)
The resistance can use this property of hash functions to make their resistance network more secure. Instead of distributing a list of all the agent's passwords, the resistance can distribute a list of the hashes of their passwords. Thus if Bob knows that Alice's hash is 7001140801, Alice can verify her identity by saying that her password is 314159, which has that as its hash. But if a Nazi double agent (let's call her Eve) has managed to steal the list of hashes, she still can't impersonate Alice. Eve doesn't know what password to use to generate that hash. She could try thousands or millions of guesses and hope that eventually she found one with a hash that matches Alice's hash, but with all the possible hashes that would be a herculean task.
I don't know if any resistance movements in the pre-computer era ever actually did anything like this. But it's an illustrative example of one of the fundamental concepts of modern computer security. When you select a password for a computer system, the computer doesn't store your password. It stores a hash of your password. When you input your password later on, the computer hashes your input and checks it against the stored hash. If for some reason the system is compromised in some other way, the attacker can't steal your password because it was never on the computer in the first place.
Now the hash functions used by modern computers are worlds more sophisticated than the toy example I gave above. They also give much more output - while my example had a 10-digit output, the outputs of modern hashes are often hundreds of digits long. This means an attacker has to search a much larger space of possibilities to find an input that gives a pre-selected hash.
The details of hashes and how they might be attacked quickly become pretty complicated, but are usually pretty interesting. It's an example of how some quite esoteric number theory can have a tremendously practical application. You can play around with hashes here if you'd like to see real hashes in action. Wikipedia as usual has a good writeup which is an excellent place to start if you're interested in learning more.
Is this the first Sunday Function that doesn't get a graph?
Excellent post, these 'Sunday Functions' are really interesting.
Bad hash function.
First, be sure your password does not end in zero...
The hash of any odd password is odd. The hash of any even passwords is divisble by 1024 (easy to show). Therefore, 1023 out of 1024 even numbers are not the hash of ANY password. So finding an inverse for an even hash, even by brute force, is 1000x less work than you think it is.
Similarly, all passwords divisible by 5 have hashes divisible by 5^10, which is nearly 10 million. And only one out of 10 million hashes divisible by 5 ever occur.
So, it is probably best to avoid passwords divisible by 2 or by 5.
Even then, I suspect this particular hash is extraordinarily weak. I bet someone well-versed in number theory could invert it in general without even using a computer. ("Invert" as in "find an inverse". Which is all you need to fool someone trying to authenticate you.)
Inventing strong hash functions is not simple.
Yep, it's a terrible terrible cryptographic hash. It's even bad for non-cryptographic purposes such as checksum-style applications. But it's about the simplest hash I could invent off the top of my head that would get the pedagogical point across.
Real hashes, as you point out, are very hard to do right. The current NIST competition ongoing to select a new cryptographic hash standard has already punched holes in some very good professionally-made hash functions. Thanks for punching the holes in mine - that's a good lesson in the perils of trying to do this sort of thing. It's why there's just so much bad crypto out there in general.
Now is this the first SF without a graph? Actually it might be. I probably should have graphed it for a few hundred sequential values to see if any obvious patterns stood out. If so, that would be another indication of weakness.
Well, I was also getting at this. When you say "with 10^10 possible hashes...", that is not true. Not all of the 10^10 numbers can occur as the actual hash of a password.
In fact, I believe at most 1/4 of them can occur. (A negligible fraction of the even numbers, plus at most 1/2 of the odd numbers. There just are not that many perfect squares modulo 10^10, never mind perfect 20th powers...) It might be even worse than that; I lack the number theory knowledge to be sure. But collisions are a great deal more likely than you would expect with a naÃ¯ve analysis.
I tried to do some research to actually break this hash. I got as far as "Hensel's Lemma" then gave up :-). But I still suspect it is possible to invert it even without a computer. Taking 20-th roots modulo 2^10*5^10 just cannot be all that hard...
And all because 10 is a composite number. So instead they do a modulus by a large prime number p, which is the same as throwing away all but the last digit in base p arithmetic.
Or you can use one of those wolfram fsms.
I believe at most 1/4 of them can occur
I can make this bound a little stricter with some mental arithmetic: 0.1 + ε. It may be possible to make this bound stronger by doing some more case checking, but I don't have time for that at the moment.
The situation you gave above is even worse than you described for multiples of 2 and 5. If you raise an even number to the 20th power, it must be divisible by 2^20 = 1048576. So all of the even integers map to not quite 10000 possible hashes. Multiples of 5 are even worse: 5^20 is about 9.5*10^13, so all numbers ending in 5 map to the same hash, and all numbers ending in 0 map to the same hash. As for odd numbers, the square of any odd number must end in 1, 5, or 9 (easy to verify via the binomial theorem). Since the 20th power is the square of a square, the hash of any odd number which is not divisible by 5 must end in 1.
Accessible failure opens a different door to security. Alice proffers a weak hash that does *not* map to her password. Any correct hash is then an imposter. Alas, this returns to keeping a broad secret.
"When you select a password for a computer system, the computer doesn't store your password. It stores a hash of your password. When you input your password later on, the computer hashes your input and checks it against the stored hash. If for some reason the system is compromised in some other way, the attacker can't steal your password because it was never on the computer in the first place."
Sadly, the word "doesn't" needs to be "shouldn't," particularly for websites. Easy test if it's probably stored securely or not: can the system email you your password if you forgot it? (If it can, it's either a very very poor hash and they're reversing it, or much much more likely, it's not stored hashed.)
Then there's the fact that passwords should never go in cleartext over the wire. Rather, should challenge/response with hashed passwords, but now we're getting even further afield.
Any chance you could walk us through a real hash function (even e.g. the classic unix one?) That'd be cool. :)
@Eric Lund --
I believe your analysis is flawed. It is true that raising an even number to the 20th power results in something divisible by 2^20, but then when you look at the last 10 digits, all you can be sure of is that those last 10 digits are divisible by 2^10 (as I wrote originally).
That said, I do agree my "1/4" bound is pretty weak. Just not for the reasons you give :-).
@Paul Murray --
Actually, using a prime as the modulus is even worse. Extracting n_th roots modulo a prime p is a well-known easy problem. (And quite trivial if the exponent is relatively prime to p-1.)
If you really want to use a strategy like this, you should probably have your modulus be the product of two primes that nobody knows. And just use the square of the password.
Pick two primes p and q.
Pick a password x.
Use x^2 (mod p*q) as your hash.
Of course, this requires that you pick a password with roughly as many digits as p*q.
This works because extracting square roots modulo n is provably equivalent to factoring n, and factoring is believed to be hard (cf. "Rabin cryptosystem").
Or you can use x^e (mod p*q) for some big number e. Then breaking the hash becomes equivalent to breaking RSA...
The number of different hashes is 25,390,690
Loved the post. Not going to argue about the hashes, even if I felt like I could.
If you want the added feature that Bob can't turn around and pretend to someone else that he is Alice, they could employ a zero knowledge protocol.