1.3 System Security
For a system to be trustworthy, it must be both dependable and secure. Traditionally, dependable computing and secure computing have been studied by two disjoint communities [2]. Only relatively recently have the two communities started to collaborate and exchange ideas, as evidenced by the creation of the new IEEE Transactions on Dependable and Secure Computing in 2004. Traditionally, security means the protection of assets [7]. When the system is the asset to be protected, it includes several major components, as shown in Figure 1.4:
◾ Operation. A system is dynamic in that it continuously processes messages and changes its state. The code as well as the execution environment must be protected from malicious attacks, such as buffer-overflow attacks.
◾ System state. The system state refers to the state held in memory, and it must not be corrupted by failures or attacks.
◾ Persistent state. System state is lost if the process crashes or is terminated. Many applications therefore use files or database systems to store critical system state in stable storage.
◾ Message. In a distributed system, different processes communicate with each other via messages. During transit, especially over the public Internet, a message might be corrupted. An adversary might also inject fake messages into the system. A corrupted message or an injected message must be rejected.
Figure 1.4 Main types of assets in a distributed system.
When we say a system is secure, we expect it to exhibit three attributes regarding how its assets are protected [2]: (1) confidentiality, (2) integrity, and (3) availability. Confidentiality refers to the assurance that the system never reveals sensitive information (system state or persistent state) to unauthorized users. Integrity means that the assets are intact: any unauthorized modification to an asset, be it the code, virtual memory, state, or a message, can be detected. Furthermore, messages must be authenticated before being accepted, which prevents adversaries from injecting fake messages. The interpretation of availability in the security context is quite different from that in the dependable computing context. Availability here means that an asset is accessible to authorized users. For example, if someone encrypted some data but lost the key needed for decryption, the system is not secure because the data is no longer available to anyone. When combined with dependable computing in the system context, availability takes on the meaning defined by the dependable computing community: the system must be up and running, and running correctly, so that an authorized user can access any asset at any time.
An important tool for implementing system security is cryptography [11]. Put simply, cryptography is the art of designing ciphers, which scramble a plaintext so that its meaning is no longer obvious (the encryption process) and recover the plaintext when needed (the decryption process). The encrypted text is called the ciphertext. Encryption is the most powerful way of ensuring confidentiality, and it is also the foundation for protecting the integrity of the system. There are two types of encryption algorithms: symmetric encryption, where the same key is used for encryption and decryption (similar to household locks, where the same key locks and unlocks the door), and asymmetric encryption, where one key is used to encrypt and a different key is used to decrypt. For symmetric encryption, key distribution is a challenge in a networked system because the same key is needed at both ends. Asymmetric encryption offers the possibility of making the encryption key available to anyone who wishes to send an encrypted message to the owner, as long as the corresponding decryption key is properly protected. Indeed, asymmetric encryption provides the foundation for key distribution. The encryption key is also called the public key because it can be made publicly available without endangering system security, and the decryption key is called the private key because it must remain private: the loss of a private key cripples the security of any system built on top of asymmetric encryption. To further strengthen key distribution, a public-key infrastructure can be established so that the ownership of a public key is vouched for by the infrastructure.
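The defining property of symmetric encryption, that one shared key both encrypts and decrypts, can be illustrated with a toy XOR stream cipher. This is a sketch only: the function name is made up for illustration, and a repeating-key XOR is trivially breakable and must never be used as a real cipher.

```python
import os

def xor_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with a repeating key.
    Because XOR is its own inverse, the SAME key decrypts.
    Illustration only -- trivially breakable, never use in practice."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = os.urandom(16)            # the shared secret; distributing it is the hard part
msg = b"transfer $100 to Alice"
ct = xor_encrypt(msg, key)      # encrypt with the key
pt = xor_encrypt(ct, key)       # decrypt with the identical key
assert pt == msg
```

The single line `pt = xor_encrypt(ct, key)` is the whole point: whoever holds the key can both read and write traffic, which is exactly why key distribution dominates the security of symmetric schemes.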
Symmetric encryption is based on two basic operations: substitution and transposition. Substitution replaces each symbol in the plaintext with some other symbol to disguise the original message, while transposition alters the positions of the symbols in the plaintext. The former preserves the order of the symbols, while the latter produces a permutation of the original plaintext and hence breaks any established patterns among the symbols. The two operations are complementary to each other and make the encryption stronger when used together. This also dictates that symmetric encryption works on a block of plaintext at a time; such algorithms are referred to as block ciphers. When encrypting a large amount of plaintext with a block cipher, the plaintext must be divided into multiple blocks. A naive approach is to encrypt each block separately. Although this allows the blocks to be encrypted in parallel and hence quickly, it creates a problem: an adversary can reorder some of the ciphertext blocks so that the meaning is completely altered, and the receiver has no means to detect this! To mitigate this problem, various cipher modes were introduced, such as the cipher block chaining mode and the cipher feedback mode. The essence of these modes is to chain consecutive blocks together during encryption. As a result, any alteration of the relative ordering of the ciphertext blocks breaks the decryption.
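The chaining idea can be sketched with a toy block cipher (substitution via XOR, transposition via byte reversal) wired into a cipher-block-chaining loop. All names and the cipher itself are invented for illustration and offer no real security; the point is only to show that reordering ciphertext blocks garbles the decrypted output instead of silently reordering the plaintext.

```python
import os

BLOCK = 8  # toy block size in bytes

def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    # Substitution (XOR with the key) followed by a transposition (byte reversal).
    return bytes(b ^ k for b, k in zip(block, key))[::-1]

def toy_block_decrypt(block: bytes, key: bytes) -> bytes:
    # Undo the transposition, then undo the substitution.
    return bytes(b ^ k for b, k in zip(block[::-1], key))

def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> list:
    # Cipher block chaining: XOR each plaintext block with the previous
    # ciphertext block (the IV for the first block) before encrypting it.
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        ct = toy_block_encrypt(
            bytes(a ^ b for a, b in zip(plaintext[i:i + BLOCK], prev)), key)
        out.append(ct)
        prev = ct
    return out

def cbc_decrypt(cipher_blocks: list, key: bytes, iv: bytes) -> bytes:
    prev, out = iv, []
    for ct in cipher_blocks:
        out.append(bytes(a ^ b for a, b in zip(toy_block_decrypt(ct, key), prev)))
        prev = ct
    return b"".join(out)

key, iv = os.urandom(BLOCK), os.urandom(BLOCK)
msg = b"PAY BOB.PAY EVE."                    # two 8-byte blocks
blocks = cbc_encrypt(msg, key, iv)
assert cbc_decrypt(blocks, key, iv) == msg

# Swapping the two ciphertext blocks garbles the decryption rather than
# silently producing b"PAY EVE.PAY BOB." as naive per-block encryption would.
assert cbc_decrypt([blocks[1], blocks[0]], key, iv) != b"PAY EVE.PAY BOB."
```

With naive block-by-block encryption, the swapped ciphertext would decrypt to a perfectly valid but reordered message; the chaining makes the tampering evident as garbage output.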
However, encryption alone is not sufficient to build a secure system. We still need mechanisms for authentication, authorization, and non-repudiation, among many other requirements. Highly important cryptographic constructs include cryptographic hash functions (also referred to as one-way or secure hash functions), such as those in the Secure Hash Standard (the SHA family of algorithms), message authentication codes, and digital signatures.
A cryptographic hash function hashes any given message P into a fixed-length bit string, and it must satisfy a number of requirements:
◾ The hash function must be efficient, that is, given a message P, the hash value of P, Hash(P), must be quickly computed.
◾ Given Hash(P), it is virtually impossible to find P. In this context, P is often referred to as the preimage of the hash; in other words, this requirement says it is virtually impossible to find a preimage of a hash. It is easy to see that if P is much longer than Hash(P), this requirement can be satisfied because information must have been lost during hashing. However, the requirement must still hold even when P is shorter than Hash(P).
◾ Given a message P and the corresponding hash Hash(P), it is virtually impossible to find a different message P′ that produces exactly the same hash, that is, Hash(P) = Hash(P′). If the unfortunate event Hash(P) = Hash(P′) happens, we say there is a collision. This requirement states that it should be computationally prohibitive to find a collision.
A cryptographic hash function must consider every single bit of the message when producing the hash, so that even if a single bit is changed, the output is totally different. There have been several generations of cryptographic hash functions. The most commonly used today are the secure hash algorithms (SHA), published as a Federal Information Processing Standard by the US National Institute of Standards and Technology. The SHA family has four categories: SHA-0, SHA-1, SHA-2, and SHA-3. SHA-0 and SHA-1 both produce a 160-bit string and are now considered obsolete. SHA-2, which produces a 256-bit or 512-bit string, is in common use today.
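The every-bit-matters (avalanche) property is easy to observe with SHA-256 from Python's standard hashlib; the two input strings below are arbitrary examples:

```python
import hashlib

h1 = hashlib.sha256(b"pay Bob $100").hexdigest()
h2 = hashlib.sha256(b"pay Bob $101").hexdigest()  # a single character differs

assert len(h1) == 64   # SHA-256 always emits 256 bits (64 hex digits)
assert h1 != h2        # tiny input change -> completely different digest
```

Comparing the two hex strings by eye shows they differ in roughly half their bits, which is exactly what makes the hash useful for detecting even one-bit tampering.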
The digital signature is another very important cryptographic construct for building secure systems. A digital signature mimics a physical signature on a legal document, and it must possess the following properties:
◾ The receiver of a digitally signed document can verify the signer’s identity; this facilitates authentication of the signer. Unlike in the physical world, where an official can verify a signer’s identity by checking a government-issued identification document such as a driver’s license or passport, a digital signature must be designed so that a remote receiver can authenticate the signer based on the digital signature alone.
◾ The signer of the digital signature cannot repudiate the signed document once it has been signed.
◾ No one other than the original signer of the signed document could possibly have fabricated the signature.
The first property is for authenticating the signer of a signed document. The second and third properties are essentially the same, because if another person could fabricate the digital signature, then the original signer could in fact repudiate the signed document; conversely, if the original signer cannot repudiate the signed document, it must be true that no one else could have fabricated the signature. Digital signatures are typically produced by applying public-key cryptography to the hash of a document, called the message digest. The message digest is used because public-key cryptography requires long keys and is computationally expensive compared with symmetric cryptography, so it is far cheaper to sign a short, fixed-length digest than the entire document. In this scheme, the collision-resistance requirement of secure hash functions is essential to protecting the integrity of digital signatures.
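The sign-the-digest scheme can be sketched with textbook RSA and Python's hashlib. The tiny primes below are the classic classroom parameters and are utterly insecure; real deployments use keys of 2048 bits or more, proper padding, and a vetted library.

```python
import hashlib

# Textbook RSA with toy primes -- illustration only, never use in practice.
p, q = 61, 53
n = p * q                 # modulus: 3233
e = 17                    # public exponent
d = 2753                  # private exponent: (e * d) % lcm(p-1, q-1) == 1

def sign(document: bytes) -> int:
    # Hash first, then apply the PRIVATE key to the (truncated) digest.
    digest = int.from_bytes(hashlib.sha256(document).digest(), "big") % n
    return pow(digest, d, n)

def verify(document: bytes, signature: int) -> bool:
    # Anyone holding the PUBLIC key (e, n) can check the signature.
    digest = int.from_bytes(hashlib.sha256(document).digest(), "big") % n
    return pow(signature, e, n) == digest

sig = sign(b"I agree to the terms.")
assert verify(b"I agree to the terms.", sig)
assert not verify(b"I agree to the terms.", (sig + 1) % n)  # forgery rejected
```

Hashing before signing is what ties the signature's integrity to the collision resistance of the hash: if an attacker could find a second document with the same digest, one signature would validly cover both.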
A message authentication code (MAC) is based on a secure hash function and a symmetric key. More specifically, the sender concatenates the message to be sent with a secret key and hashes the result to produce the MAC. MACs are used pervasively in message exchanges both to authenticate the sender and to protect the integrity of the message. The basis for authentication is that only the sender and the receiver know the secret key used to generate the MAC. Because of the characteristics of the secure hash function, if any bit in the message is altered during transmission, the transmitted MAC will differ from the one recomputed at the receiver. Hence, the MAC also serves as a form of checksum with much stronger protection than traditional checksum methods such as CRC-16.
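A minimal sketch using Python's standard hmac module follows. Note that HMAC is the standardized keyed-hash construction used in practice rather than the naive hash-of-key-and-message concatenation described above (plain concatenation is vulnerable to length-extension attacks on some hash functions); the key and message values are arbitrary examples.

```python
import hashlib
import hmac

key = b"shared-secret-key"   # known only to sender and receiver (assumption)

def make_mac(message: bytes) -> str:
    # HMAC-SHA256: keyed hash of the message under the shared secret.
    return hmac.new(key, message, hashlib.sha256).hexdigest()

msg = b"debit account 42 by $10"
tag = make_mac(msg)          # sent along with the message

# Receiver recomputes the MAC and compares in constant time.
assert hmac.compare_digest(tag, make_mac(msg))
assert not hmac.compare_digest(tag, make_mac(b"debit account 42 by $99"))
```

The `compare_digest` call matters: a naive `==` comparison can leak timing information that helps an attacker forge tags byte by byte.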
In conventional systems, communication between a client and a server is conducted over a session, and security mechanisms were designed around this need. At the beginning of the session, the client and the server mutually authenticate each other. Once the authentication step is done, a session key is created and used to encrypt all messages exchanged within the session. For a prolonged session, the session key might be refreshed. For sessions conducted over the Web, the secure socket layer (SSL) protocol (or its successor, transport layer security) is typically used. Server authentication is done via a digital signature and a public-key certificate protected by a public-key infrastructure. Client authentication is typically done via username and password. Some enterprise systems, such as directory services, adopt much more sophisticated authentication schemes based on the challenge-response approach.
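The challenge-response idea can be sketched in a few lines: the server issues a fresh random nonce, and the client proves knowledge of a previously shared secret by returning a keyed hash of that nonce, so the secret itself never crosses the wire. The function names and key value are invented for illustration; real protocols (e.g., those used by directory services) add timestamps, mutual challenges, and negotiated algorithms.

```python
import hashlib
import hmac
import os

shared_key = b"enrolled-client-secret"   # established out of band (assumption)

def server_issue_challenge() -> bytes:
    # A fresh random nonce per login attempt defeats replay attacks.
    return os.urandom(16)

def client_respond(challenge: bytes) -> str:
    # Prove knowledge of the key without ever transmitting it.
    return hmac.new(shared_key, challenge, hashlib.sha256).hexdigest()

def server_verify(challenge: bytes, response: str) -> bool:
    expected = hmac.new(shared_key, challenge, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)

nonce = server_issue_challenge()
assert server_verify(nonce, client_respond(nonce))
```

An eavesdropper who records one exchange learns nothing reusable: the next login uses a different nonce, so the captured response is worthless.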