From Traditional Fault Tolerance to Blockchain - Wenbing Zhao

EXAMPLE 2.1

To understand the problem better, consider the following example. Assume that P₀ and P₁ represent two bank accounts, A and B, respectively. The purpose of m₀ is to deposit $100 to account B after P₀ has debited account A. P₀ takes a checkpoint C₀ before the debit operation, and P₁ takes a checkpoint C₁ after it has received and processed the deposit request (i.e., m₀), as illustrated in Figure 2.2(a). If P₀ crashes after sending the deposit request (m₀), and P₁ crashes after taking the checkpoint C₁, then upon recovery P₁’s state would reflect a deposit of $100 (from account A) while P₀’s state would not reflect the corresponding debit operation. Consequently, $100 would appear to have come from nowhere, which obviously is not what happened. In essence, the global state constructed using the wrong set of checkpoints does not correspond to any state that could have occurred since the initial state of the distributed system. Such a global state is referred to as an inconsistent global state.

Next, let’s look at a scenario (shown in Figure 2.2(b)) in which the set of checkpoints can be used to properly recover the system to an earlier state prior to the failure. The checkpoint (C₀) taken by P₀ reflects the sending event of m₀. The checkpoint C₁ is taken by P₁ after it has received m₀; therefore, the dependency on P₀ is captured by C₁. Similarly, the dependency of P₂ on P₁ is also preserved by the checkpoint C₂ taken by P₂. Such a global state is an example of a consistent global state. Of course, the execution after the checkpoints, such as the sending and receiving of m₂ and m₃, will be lost upon recovery.
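The difference between the first two scenarios can be checked mechanically. The following Python sketch is only an illustration and is not taken from the book: it assumes that the events of each process are numbered by a local index, that a checkpoint is represented by the index of the last local event it covers, and the names `Message` and `is_consistent` are hypothetical. A set of checkpoints is flagged as inconsistent when some message is recorded as received but not as sent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    name: str
    sender: str
    receiver: str
    send_index: int  # local event index of the send at the sender
    recv_index: int  # local event index of the receive at the receiver


def is_consistent(checkpoints: dict, messages: list) -> bool:
    """Return True if no checkpoint records a receive whose send is not recorded.

    checkpoints maps a process name to the index of the last local event
    covered by that process's checkpoint (an assumed encoding).
    """
    for m in messages:
        received = m.recv_index <= checkpoints[m.receiver]
        sent = m.send_index <= checkpoints[m.sender]
        if received and not sent:
            return False  # the deposit is recorded, the debit is not
    return True


# The send of m0 is event 1 at P0; the receive of m0 is event 1 at P1.
m0 = Message("m0", "P0", "P1", send_index=1, recv_index=1)

# Scenario (a): C0 is taken before P0 sends m0, C1 after P1 receives m0.
scenario_a = is_consistent({"P0": 0, "P1": 1}, [m0])

# Scenario (b): C0 is taken after the send, so the dependency is captured.
scenario_b = is_consistent({"P0": 1, "P1": 1}, [m0])

print(scenario_a, scenario_b)  # False True
```

In this encoding, scenario (a) fails the check because C₁ covers the receipt of m₀ while C₀ does not cover its sending.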

The scenario described in Figure 2.2(c) is the most subtle one. In this scenario, P₀ takes a checkpoint after it has sent message m₀, while P₁ takes a checkpoint before it receives m₀ but after it has sent m₁, and P₂ takes a checkpoint before it receives m₁. This means that the checkpoint C₀ reflects the state change resulting from sending m₀, whereas C₁ does not incorporate the state change caused by receiving m₀. Consequently, this set of checkpoints alone cannot be used to recover the system after a failure because m₀ and m₁ would be lost. However, the global state reconstructed from such a set of checkpoints would still qualify as a consistent global state because it is a state that could have occurred, i.e., one in which messages m₀ and m₁ are still in transit to their destinations. To accommodate this scenario, an additional type of state, referred to as the channel state, is introduced as part of the distributed system state [5].

To define the channel state properly, it is necessary to provide a more rigorous (and abstract) definition of a distributed system. A distributed system consists of two types of components [5]:

◾ A set of N processes. Each process, in turn, consists of a set of states and a set of events. One of the states is the initial state when the process is started. Only an event can trigger a change in the state of a process.

◾ A set of channels. Each channel is a uni-directional reliable communication channel between two processes. The state of a channel is the set of messages that are still in transit along the channel (i.e., they have not yet been received by the target process). A TCP connection between two processes can be considered as two channels, one in each direction.

A pair of neighboring processes is always connected by a pair of channels, one in each direction. An event (such as the sending or receiving of a message) at a process may change the state of the process and the state of the channel it is associated with, if any. For example, the injection of a message into a channel may change the state of the channel from empty to one that contains the message itself.
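This abstract model can be sketched in a few lines of Python. The sketch is illustrative only: the class and method names (`Channel`, `Process`, `send`, `receive`) are assumptions introduced here, the channel is assumed to deliver messages in FIFO order, and the process state is reduced to a single account balance with made-up starting values to match Example 2.1.

```python
from collections import deque


class Channel:
    """Uni-directional reliable channel; its state is the in-transit messages."""

    def __init__(self) -> None:
        self.in_transit: deque = deque()  # FIFO order is an assumption


class Process:
    """A process whose state changes only in response to events."""

    def __init__(self, name: str, balance: int) -> None:
        self.name = name
        self.balance = balance  # the whole process state in this toy example

    def send(self, channel: Channel, amount: int) -> None:
        # Send event: changes the process state and the channel state.
        self.balance -= amount
        channel.in_transit.append(amount)

    def receive(self, channel: Channel) -> None:
        # Receive event: removes the message from the channel and applies it.
        self.balance += channel.in_transit.popleft()


p0, p1 = Process("P0", 500), Process("P1", 200)  # hypothetical balances
c01, c10 = Channel(), Channel()  # one channel in each direction

p0.send(c01, 100)
state_after_send = (p0.balance, list(c01.in_transit), p1.balance)

p1.receive(c01)
state_after_receive = (p0.balance, list(c01.in_transit), p1.balance)

print(state_after_send)     # (400, [100], 200)
print(state_after_receive)  # (400, [], 300)
```

Between the two events the $100 exists only in the channel state, which is why a global state made of process states alone can miss it.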

Using this revised definition, the channel states in the third scenario would consist of the two in-transit messages m₀ and m₁. If the channel states are properly recorded in addition to the checkpoints in this scenario, recovery becomes possible (i.e., m₀ will be delivered to P₁ and m₁ will be delivered to P₂ during recovery).
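As a rough illustration, the channel state for a given set of checkpoints can be derived as the set of messages whose sending is covered by the sender's checkpoint but whose receipt is not covered by the receiver's checkpoint. The Python sketch below is hypothetical: it assumes that events are numbered by a local index in each process and that a checkpoint is represented by the index of the last event it covers; the helper name `channel_state` and the event indices are chosen only to mirror the third scenario.

```python
def channel_state(checkpoints, messages):
    """Return the in-transit messages per channel for a set of checkpoints.

    Each message is a tuple (name, sender, send_index, receiver, recv_index).
    A message is in transit if its send is covered by the sender's checkpoint
    but its receive is not covered by the receiver's checkpoint.
    """
    in_transit = {}
    for name, sender, send_index, receiver, recv_index in messages:
        sent = send_index <= checkpoints[sender]
        received = recv_index <= checkpoints[receiver]
        if sent and not received:
            in_transit.setdefault((sender, receiver), []).append(name)
    return in_transit


# Third scenario: P0 checkpoints after sending m0; P1 checkpoints after
# sending m1 but before receiving m0; P2 checkpoints before receiving m1.
messages = [
    ("m0", "P0", 1, "P1", 2),
    ("m1", "P1", 1, "P2", 1),
]
checkpoints = {"P0": 1, "P1": 1, "P2": 0}

state = channel_state(checkpoints, messages)
print(state)  # {('P0', 'P1'): ['m0'], ('P1', 'P2'): ['m1']}
```

During recovery, each recorded message would be re-injected into its channel after the processes are restored from their checkpoints, so that m₀ reaches P₁ and m₁ reaches P₂.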
