5.7.1 Cryptography based on factoring
The prime numbers are the positive whole numbers with no proper divisors: the only numbers that divide a prime number are 1 and the number itself. By definition, 1 is not prime; so the primes are {2, 3, 5, 7, 11, …}. The fundamental theorem of arithmetic states that each natural number greater than 1 factors into prime numbers in a way that is unique up to the order of the factors. It is easy to find prime numbers and multiply them together to give a composite number, but much harder to resolve a composite number into its factors. And lots of smart people have tried really hard since we started using cryptography based on factoring. The largest composite product of two large random primes to have been factorized in 2020 was RSA-250, an 829-bit number (250 decimal digits). This took the equivalent of 2700 years' work on a single 2.2GHz core; the previous record, RSA-240 in 2019, had taken the equivalent of 900 years [302]. It is possible for factoring to be done surreptitiously, perhaps using a botnet; in 2001, when the state of the art was factoring 512-bit numbers, such a challenge was set in Simon Singh's 'Code Book' and solved by five Swedish students using several hundred computers to which they had access [44]. As for 1024-bit numbers, I expect the NSA can factor them already, and I noted in the second edition that 'an extrapolation of the history of factoring records suggests the first factorization will be published in 2018.' Moore's law is slowing down, and we're two years late. Anyway, organisations that want keys to remain secure for many years are already using 2048-bit numbers at least.
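To make the asymmetry concrete, here is a minimal Python sketch (toy sizes, and a deliberately naive trial-division factorizer of my own) contrasting the easy direction, multiplication, with the hard one:

```python
# A naive contrast of the two directions (toy sizes only): multiplying two
# primes is one operation; recovering them by trial division takes about
# sqrt(n) steps, already ~10**6 here and utterly infeasible at the
# 1024-bit scale discussed above.

def trial_factor(n):
    """Return the smallest prime factor of n and its cofactor."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    return n, 1  # n itself is prime

p, q = 1_000_003, 1_000_033   # two small primes
n = p * q                     # the easy direction: one multiplication
print(trial_factor(n))        # the hard direction: (1000003, 1000033)
```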
The algorithm commonly used to do public-key encryption and digital signatures based on factoring is RSA, named after its inventors Ron Rivest, Adi Shamir and Len Adleman. It uses Fermat's little theorem, which states that for all primes $p$ not dividing $a$, $a^{p-1} \equiv 1 \pmod{p}$ (proof: take the set $\{1, 2, \ldots, p-1\}$ and multiply each of them modulo $p$ by $a$, then cancel out $(p-1)!$ each side). For a general integer $n$, $a^{\phi(n)} \equiv 1 \pmod{n}$ where Euler's function $\phi(n)$ is the number of positive integers less than $n$ with which it has no divisor in common (the proof is similar). So if $n$ is the product of two primes $p$ and $q$ then $\phi(n) = (p-1)(q-1)$.
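Both theorems are easy to check numerically; here is a quick sketch using Python's built-in three-argument pow(), with toy values chosen purely for illustration:

```python
# Checking Fermat's little theorem and Euler's theorem numerically.
from math import gcd

p, a = 13, 7
assert pow(a, p - 1, p) == 1                 # Fermat: a^(p-1) = 1 (mod p)

p, q = 11, 13
n = p * q
phi = (p - 1) * (q - 1)                      # phi(n) for n = pq
# phi(n) counts the integers below n with no common factor with n:
assert phi == sum(1 for k in range(1, n) if gcd(k, n) == 1)
assert pow(a, phi, n) == 1                   # Euler: a^phi(n) = 1 (mod n)
```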
In RSA, the encryption key is a modulus $N$ which is hard to factor (take $N = pq$ for two large randomly chosen primes $p$ and $q$, say of 1024 bits each) plus a public exponent $e$ that has no common factors with either $p-1$ or $q-1$. The private key is the factors $p$ and $q$, which are kept secret. Where $M$ is the message and $C$ is the ciphertext, encryption is defined by

$$C \equiv M^e \pmod{N}$$
Decryption is the reverse operation:

$$M \equiv \sqrt[e]{C} \pmod{N}$$
Whoever knows the private key – the factors $p$ and $q$ of $N$ – can easily calculate $\sqrt[e]{C} \pmod{N}$. As $\phi(N) = (p-1)(q-1)$ and $e$ has no common factors with $\phi(N)$, the key's owner can find a number $d$ such that $de \equiv 1 \pmod{\phi(N)}$ – she finds the value of $d$ separately modulo $p-1$ and $q-1$, and combines the answers. $\sqrt[e]{C} \pmod{N}$ is now computed as $C^d \pmod{N}$, and decryption works because of Fermat's theorem:

$$C^d \equiv M^{ed} \equiv M^{1+k\phi(N)} \equiv M \cdot M^{k\phi(N)} \equiv M \cdot 1 \equiv M \pmod{N}$$
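Putting these definitions together gives a textbook-RSA sketch; the toy primes and messages below are mine, and, as the rest of this section explains, raw RSA like this is not safe to deploy:

```python
# Textbook RSA with toy primes, following the definitions above.
# Needs Python 3.8+ for pow(e, -1, phi). For illustration only.

p, q = 61, 53                  # in practice: random primes of ~1024 bits
N = p * q                      # public modulus, N = 3233
phi = (p - 1) * (q - 1)        # phi(N) = 3120
e = 17                         # public exponent, coprime to phi(N)
d = pow(e, -1, phi)            # private exponent: d*e = 1 (mod phi(N))

M = 42                         # a message, encoded as a number < N
C = pow(M, e, N)               # encryption: C = M^e (mod N)
assert pow(C, d, N) == M       # decryption: C^d = M (mod N)
```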
Similarly, the owner of a private key can operate on a message $M$ with it to produce a signature

$$\mathrm{Sig}_d(M) \equiv M^d \pmod{N}$$
and this signature can be verified by raising it to the power $e$ mod $N$ (thus, using $e$ and $N$ as the public signature verification key) and checking that the message is recovered:

$$M \equiv (\mathrm{Sig}_d(M))^e \pmod{N}$$
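With the same toy key as in the sketch above ($N = 3233$, $e = 17$, $d = 2753$), signing and verification look like this:

```python
# Raw signing and verifying with the toy key from the previous sketch.
N, e, d = 3233, 17, 2753
M = 65
sig = pow(M, d, N)             # Sig_d(M) = M^d (mod N)
assert pow(sig, e, N) == M     # anyone can verify with the public (e, N)
```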
Neither RSA encryption nor signature is safe to use on its own. The reason is that, as encryption is an algebraic process, it preserves certain algebraic properties. For example, if we have a relation such as $M_1 M_2 = M_3$ that holds among plaintexts, then the same relationship will hold among ciphertexts $C_1 C_2 = C_3$ and signatures $\mathrm{Sig}_1 \cdot \mathrm{Sig}_2 = \mathrm{Sig}_3$. This property is known as a multiplicative homomorphism; a homomorphism is a function that preserves some mathematical structure. The homomorphic nature of raw RSA means that it doesn't meet the random oracle model definitions of public key encryption or signature.
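The homomorphism is easy to demonstrate with the toy public key used above: multiplying two ciphertexts yields the encryption of the product of the plaintexts.

```python
# The multiplicative homomorphism of raw RSA, with the toy key from above.
N, e = 3233, 17
M1, M2 = 5, 7
C1, C2 = pow(M1, e, N), pow(M2, e, N)
assert (C1 * C2) % N == pow((M1 * M2) % N, e, N)   # E(M1)E(M2) = E(M1*M2)
```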
Another general problem with public-key encryption is that if the plaintexts are drawn from a small set, such as 'attack' or 'retreat', and the encryption process is deterministic (as RSA is), then an attacker might just precompute the possible ciphertexts and recognise them when they appear. With RSA, it's also dangerous to use a small exponent $e$ to encrypt the same message to multiple recipients, as this can lead to an algebraic attack. To stop the guessing attack, the low-exponent attack and attacks based on homomorphism, it's sensible to add some randomness, and some redundancy, into a plaintext block before encrypting it. Every time we encrypt the same short message, say 'attack', we want to get a completely different ciphertext, and for these to be indistinguishable from each other as well as from the ciphertexts for 'retreat'. And there are good ways and bad ways of doing this.
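Here is a sketch of the guessing attack on a deterministic scheme; the two-word message space and its numeric encoding are hypothetical, and the toy key is the same one as above:

```python
# The guessing attack on deterministic encryption: the attacker encrypts
# every candidate plaintext under the public key and matches what appears
# on the wire. No private key is needed.
N, e = 3233, 17
candidates = {"attack": 100, "retreat": 200}   # hypothetical encoding
table = {pow(m, e, N): word for word, m in candidates.items()}

observed = pow(100, e, N)      # a ciphertext the attacker intercepts
print(table[observed])        # -> 'attack'
```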
Crypto theoreticians have wrestled for decades to analyse all the things that can go wrong with asymmetric cryptography, and to find ways to tidy it up. Shafi Goldwasser and Silvio Micali came up with formal models of probabilistic encryption, in which we add randomness to the encryption process, and semantic security, which we mentioned already; in this context it means that an attacker cannot get any information at all about a plaintext $M$ that was encrypted to a ciphertext $C$, even if he is allowed to request the decryption of any other ciphertext $C'$ not equal to $C$ [778]. In other words, we want the encryption to resist chosen-ciphertext attack as well as chosen-plaintext attack. There are a number of constructions that give semantic security, but they tend to be too ungainly for practical use.
The usual real-world solution is optimal asymmetric encryption padding (OAEP), where we concatenate the message $M$ with a random nonce $N$, and use a hash function $h$ to combine them:

$$C_1 = M \oplus h(N)$$
$$C_2 = N \oplus h(C_1)$$
In effect, this is a two-round Feistel cipher that uses $h$ as its round function. The result, the combination $C_1, C_2$, is then encrypted with RSA and sent. The recipient then computes $N$ as $C_2 \oplus h(C_1)$ and recovers $M$ as $C_1 \oplus h(N)$ [213]. This was eventually proven to be secure. There are a number of public-key cryptography standards; PKCS #1 describes OAEP [995]. These block a whole lot of attacks that were discovered in the 20th century and about which people have mostly forgotten, such as the fact that an opponent can detect if you encrypt the same message with two different RSA keys. In fact, one of the things we learned in the 1990s was that randomisation helps make crypto protocols more robust against all sorts of attacks, and not just the mathematical ones. Side-channel attacks and even physical probing of devices take a lot more work.
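A sketch of the two-round Feistel structure, with SHA-256 standing in for $h$; real OAEP as standardised in PKCS #1 v2 additionally fixes exact lengths and padding, so this shows only the combine-and-recover symmetry:

```python
# The OAEP-style two-round Feistel construction described above.
import hashlib
import os

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

M = b"attack at dawn".ljust(32)    # message, padded to the digest length
nonce = os.urandom(32)             # the random nonce N

C1 = xor(M, h(nonce))              # C1 = M xor h(N)
C2 = xor(nonce, h(C1))             # C2 = N xor h(C1)
# (C1, C2) is what then gets RSA-encrypted; the recipient inverts:
n_rec = xor(C2, h(C1))             # N = C2 xor h(C1)
m_rec = xor(C1, h(n_rec))          # M = C1 xor h(N)
assert (n_rec, m_rec) == (nonce, M)
```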
With signatures, things are slightly simpler. In general, it's often enough to just hash the message before applying the private key: $\mathrm{Sig}_d = [h(M)]^d \pmod{N}$; PKCS #7 describes simple mechanisms for signing a message digest [1010]. However, in some applications one might wish to include further data in the signature block, such as a timestamp, or some randomness to make side-channel attacks harder.
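A hash-then-sign sketch with the same toy key; reducing the digest modulo $N$ is a shortcut of mine to make it fit the tiny modulus, where real schemes pad the digest up to the modulus size:

```python
# Hash-then-sign: sign h(M) rather than M itself (toy key from above).
import hashlib

N, e, d = 3233, 17, 2753
M = b"attack at dawn"
hM = int.from_bytes(hashlib.sha256(M).digest(), "big") % N  # digest as number < N
sig = pow(hM, d, N)            # Sig = [h(M)]^d (mod N)
assert pow(sig, e, N) == hM    # verifier recomputes h(M) and compares
```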
Many of the things that have gone wrong with real implementations have to do with side channels and error handling. One spectacular example was when Daniel Bleichenbacher found a way to break the RSA implementation in SSL v 3.0 by sending suitably chosen ciphertexts to the victim and observing any resulting error messages. If he could learn from the target whether a given ciphertext $c$, when decrypted as $c^d \pmod{n}$, corresponds to a PKCS #1 message, then he could use this to decrypt or sign messages [265]. There have been many more side-channel attacks on common public-key implementations, typically via measuring the precise time taken to decrypt. RSA is also mathematically fragile; you can break it using homomorphisms, or if you have the same ciphertext encrypted under too many different small keys, or if the message is too short, or if two messages are related by a known polynomial, or in several other edge cases. Errors in computation can also give a result that's correct modulo one factor of the modulus and wrong modulo the other, enabling the modulus to be factored; errors can be inserted tactically, by interfering with the crypto device, or strategically, for example by the chipmaker arranging for one particular value of a 64-bit multiply to be computed incorrectly. Yet other attacks have involved stack overflows, whether by sending the attack code in as keys, or as padding in poorly-implemented standards.
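The fault-attack observation deserves a sketch. If an implementation that computes signatures separately mod $p$ and mod $q$ returns a result that is correct mod $p$ but wrong mod $q$, then $\mathrm{sig}^e - h(M)$ is divisible by $p$ but not by $q$, so a gcd with $N$ recovers a factor; this is often called the Lenstra attack. The fault below is simulated artificially, with the toy key from above:

```python
# Simulating a fault attack on CRT-based RSA signing: one faulty
# signature lets anyone factor the modulus with a single gcd.
from math import gcd

p, q = 61, 53
N, e, d = p * q, 17, 2753
hM = 42                                  # the (already hashed) message

good = pow(hM, d, N)                     # correct signature
sp = good % p                            # CRT half mod p: left intact
sq = (good + 1) % q                      # CRT half mod q: corrupted
# Recombine the two halves with the Chinese Remainder Theorem:
faulty = (sp * q * pow(q, -1, p) + sq * p * pow(p, -1, q)) % N

# faulty^e - h(M) is 0 mod p but not mod q, so the gcd reveals p:
assert gcd(pow(faulty, e, N) - hM, N) == p
```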