Читать книгу Security Engineering - Ross Anderson - Страница 154
5.3.1.2 The birthday theorem
ОглавлениеThe birthday theorem gets its name from the following problem. A maths teacher asks a class of 30 pupils what they think is the probability that two of them have the same birthday. Most pupils intuitively think it's unlikely, and the maths teacher then asks the pupils to state their birthdays one after another. The odds of a match exceed 50% once 23 pupils have been called. As this surprises most people, it's also known as the ‘birthday paradox’.
The birthday theorem was first used in the 1930’s to count fish, so it's also known as capture-recapture statistics [1668]. Suppose there are fish in a lake and you catch of them, ring them and throw them back, then when you first catch a fish you've ringed already, should be ‘about’ the square root of . The intuitive reason why this holds is that once you have samples, each could potentially match any of the others, so the number of possible matches is about x or , which is what you need3.
This theorem has many applications for the security engineer. For example, if we have a biometric system that can authenticate a person's claim to identity with a probability of only one in a million that two randomly selected subjects will be falsely identified as the same person, this doesn't mean that we can use it as a reliable means of identification in a university with a user population of twenty thousand staff and students. This is because there will be almost two hundred million possible pairs. In fact, you expect to find the first collision – the first pair of people who can be mistaken for each other by the system – once you have somewhat over a thousand people enrolled. It may well, however, be OK to use it to verify a claimed identity (though many other things can go wrong; see the chapter on Biometrics in Part 2 for a discussion).
There are some applications where collision-search attacks aren't a problem, such as in challenge-response protocols where an attacker has to find the answer to the challenge just issued, and where you can prevent challenges repeating. In identify-friend-or-foe (IFF) systems, for example, common equipment has a response length of 48 to 80 bits. You can't afford much more than that, as it costs radar accuracy.
But there are other applications in which collisions are unacceptable. When we design digital signature systems, we typically pass the message through a cryptographic hash function first, and then sign the hash , for a number of reasons we'll discuss later. In such an application, if it were possible to find collisions with but , then a Mafia owned bookstore's web site might precalculate suitable pairs , get you to sign an saying something like “I hereby order a copy of Rubber Fetish volume 7 for $32.95” and then present the signature together with an saying something like “I hereby mortgage my house for $75,000 and please send the funds to Mafia Holdings Inc., Bermuda.”
For this reason, hash functions used with digital signature schemes have large enough to make them collision-free. Historically, the two most common hash functions have been MD5, which has a 128-bit output and will thus require at most computations to break, and SHA1 with a 160-bit output and a work factor for the cryptanalyst of at most . However, collision search gives at best an upper bound on the strength of a hash function, and both these particular functions have turned out to be disappointing, with cryptanalytic attacks that I'll describe later in section 5.6.2.
To sum up: if you need a cryptographic hash function to be collision resistant, then you'd better choose a function with an output of at least 256 bits, such as SHA-2 or SHA-3. However if you only need to be sure that nobody will find a second preimage for an existing, externally given hash, then you can perhaps make do with less.