Читать книгу From Traditional Fault Tolerance to Blockchain - Wenbing Zhao - Страница 66

EXAMPLE 2.7

Оглавление

Consider a distributed system consisting of three processes P0, P1, and P2, shown in Figure 2.19. P0 sends P1 a regular message <REGULAR,k,?,mi>. After the message is fully logged at P0, P1 sends P2 a message <REGULAR,s,?,mt>. Then, both P0 and P1 crashed. Upon recovery, although P0 can resend the regular message <REGULAR,k,?,mi> to P1, however, the receiving order information rsn is lost due the failures. Hence, it is not guaranteed that P1 could initiate the correct state interval that resulted in the sending of regular message <REGULAR,s,?,mt>. P2 would become an orphan process and be forced to rollback its state.

We prove below that the recovery mechanism introduced in section 2.3.2.3 guarantees a consistent global state of the distributed system after the recovery of a failed process. The only way the global state of a distributed system becomes inconsistent is when one process records the receipt of a (regular) message that was not sent by any other process (i.e., the message is an orphan message). We prove that any regular message that is received at a process must have been logged at the sending process. For a pair of nonfailing processes, the correctness of this statement is straightforward because the sending process always logs any message it sends. The interesting case is when a nonfailing process received a regular message that was sent by a process that fails subsequently.


Figure 2.19 Two concurrent failures could result in the loss of determinant information for regular messages.

Let’s assume a process Pi fails and another process Pj receives a regular message sent by Pi prior to the failure, we need to prove that the message must have been logged at Pi either prior to its failure or will have been logged before the end of the recovery.

If Pi checkpointed its state after sending the regular message prior to the failure, the message must have been logged in stable storage and is guaranteed to be recoverable. Otherwise, the message itself would have been lost due the failure because it was logged in volatile memory. However, we prove that the message will be regenerated during the recovery.

According to the protocol, a process cannot send any new regular message before it has received the ACK message for every regular message received. The fact that the message was sent means Pi must have received the ACK message for the regular message that triggered the state interval in which the message was sent. This in turn means that the sending process of the regular message, say Pk must have received the corresponding ORDER message sent by Pi. Hence, upon recovery, Pk will be contacted by Pi and the regular message with a valid rsn value will be retransmitted to Pi. This would ensure the recovering process Pi to reinitiate the state interval in the correct order. The regular message received by Pj will be correctly regenerated and logged at Pi during recovery. This completes our proof.

From Traditional Fault Tolerance to Blockchain

Подняться наверх