From Traditional Fault Tolerance to Blockchain - Wenbing Zhao - Page 45

EXAMPLE 2.2

Figure 2.3 An example of the domino effect in recovery with uncoordinated checkpointing.

In the example illustrated in Figure 2.3, process P1 crashed after it had sent message m8 to P0, but before it had a chance to take a checkpoint. The last checkpoint taken by P1 is C1,1. P2 crashed concurrently. Now, let’s examine the impact of the failures of P1 and P2:

◾ The most recent checkpoint at P0, C0,1, cannot be used because it is inconsistent with C1,1. Therefore, P0 would have to roll back to C0,0.

◾ The most recent checkpoint at P1, C1,1, cannot be used because it is inconsistent with C2,1, i.e., C1,1 reflects the receipt of m6 but C2,1 does not reflect the sending of m6. This means that P1 would have to roll back to C1,0.

◾ Unfortunately, C2,1 is not consistent with C1,0 because it records the receipt of m4 while C1,0 does not reflect the sending of m4. This means that P2 would have to roll back to C2,0.

◾ This in turn would make it impossible to use either of the two checkpoints at P3, C3,1 or C3,0. As a result, P3 would have to roll back to its initial state.

◾ The rollback of P3 to its initial state would invalidate C2,0 at P2 because C2,0 reflects the state change resulting from the receipt of m1, whose sending is not reflected in the initial state of P3. Therefore, P2 would have to roll back to its initial state too.

◾ The rollback of P1 to C1,0 would invalidate the use of C0,0 at P0 because of m5. This means that P0 would have to roll back to its initial state too.

◾ Finally, the rollback of P0 to its initial state would invalidate the use of C1,0 at P1, thereby forcing P1 to roll back to its initial state. Consequently, the distributed system can only recover to its initial state.
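The chain of rollbacks traced above can be sketched as a small fixed-point computation. The following Python sketch is illustrative only: the function name `recovery_line`, the tuple encoding of messages, and the two-process run are assumptions made for this example and do not reproduce the run in Figure 2.3. It also assumes that every process restarts from a saved checkpoint, and it numbers checkpoints so that index 0 stands for the initial state (unlike the figure, where C0,0 is already a real checkpoint).

```python
from typing import List, Tuple

# Hypothetical encoding: (sender, sent_in, receiver, received_in).
# An event "in interval k" happened after the process's checkpoint k and
# before its checkpoint k + 1. Checkpoint 0 stands for the initial state.
Message = Tuple[int, int, int, int]


def recovery_line(latest: List[int], messages: List[Message]) -> List[int]:
    """Roll processes back until no checkpoint reflects an unsent message."""
    line = list(latest)
    changed = True
    while changed:
        changed = False
        for sender, sent_in, receiver, received_in in messages:
            send_undone = sent_in >= line[sender]        # sender forgets the send
            receive_kept = received_in < line[receiver]  # receiver remembers it
            if send_undone and receive_kept:
                # Orphan message: the receiver must fall back to the
                # checkpoint that precedes the receive event.
                line[receiver] = received_in
                changed = True
    return line


# Hypothetical ping-pong run between two processes, each with two checkpoints.
messages = [
    (1, 0, 0, 0),  # P1 -> P0, both before their first checkpoints
    (0, 1, 1, 0),  # P0 sends after its checkpoint 1, P1 receives before its own
    (1, 1, 0, 1),  # P1 sends after its checkpoint 1, P0 receives before checkpoint 2
    (0, 2, 1, 1),  # P0 sends after its checkpoint 2, P1 receives before checkpoint 2
]
print(recovery_line([2, 2], messages))
```

For this hypothetical run the result is `[0, 0]`: each rollback turns one more message into an orphan, so both processes end up at their initial state even though each had taken two checkpoints. The loop terminates because every update strictly lowers a checkpoint index that is bounded below by 0.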

Second, to enable the selection of a set of consistent checkpoints during recovery, the dependencies among the checkpoints have to be determined and recorded together with each checkpoint. This incurs additional overhead and increases the complexity of the implementation [2]. As a result, uncoordinated checkpointing is neither as simple nor as efficient as one might expect [3].
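One way such dependency recording could look is sketched below. This is a minimal illustration under stated assumptions, not the scheme used in [2] or [3]: the class names `Process` and `Checkpoint`, the piggybacked `(sender, interval)` tag, and the helper `is_usable` are all hypothetical. Each message carries the sender's current checkpoint interval, and each checkpoint stores the set of tags of all messages received so far.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple

Tag = Tuple[int, int]  # hypothetical tag: (sender id, sender's interval at send time)


@dataclass(frozen=True)
class Checkpoint:
    index: int
    state: Tuple[str, ...]
    depends_on: FrozenSet[Tag]  # sends this checkpoint relies on


@dataclass
class Process:
    pid: int
    interval: int = 0  # number of checkpoints taken so far
    log: List[str] = field(default_factory=list)
    deps: Set[Tag] = field(default_factory=set)
    checkpoints: List[Checkpoint] = field(default_factory=list)

    def send(self, payload: str) -> dict:
        # Piggyback the sender's identity and current interval on the message.
        return {"payload": payload, "sender": self.pid, "interval": self.interval}

    def receive(self, message: dict) -> None:
        self.log.append(message["payload"])
        self.deps.add((message["sender"], message["interval"]))

    def take_checkpoint(self) -> Checkpoint:
        self.interval += 1
        cp = Checkpoint(self.interval, tuple(self.log), frozenset(self.deps))
        self.checkpoints.append(cp)
        return cp


def is_usable(cp: Checkpoint, line: Dict[int, int]) -> bool:
    """True if every send that cp depends on survives in the restart line.

    line maps a process id to the checkpoint index that process restarts from.
    """
    return all(sent_in < line[sender] for sender, sent_in in cp.depends_on)


p0, p1 = Process(0), Process(1)
p1.receive(p0.send("x"))   # P0 sends before taking any checkpoint
cp = p1.take_checkpoint()
print(cp.depends_on, is_usable(cp, {0: 0}), is_usable(cp, {0: 1}))
```

The extra bookkeeping is visible even in this small sketch: every message grows by a tag, every checkpoint stores a dependency set, and recovery must consult those sets before any checkpoint can be chosen.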
