Читать книгу From Traditional Fault Tolerance to Blockchain - Wenbing Zhao - Страница 53
2.2.4 Discussion
ОглавлениеThe two global checkpointing protocols introduced in this section share a number of similarities.
◾ Both rely on virtually the same system model, and use a special control message to propagate and coordinate the global checkpointing.
◾ They both recognize the need to capture the channel state to ensure the recoverability of the system.
◾ The mechanism to capture the channel state is virtually the same for both protocols, as shown in Figure 2.9.– In both protocols, a process starts logging messages (for the channel state) for each channel upon the initiation of the global checkpoint (at the initiator) or upon the receipt of the first control message (i.e., the Marker message in the Chandy and Lamport protocol and the CHECKPOINT message in the Tamir and Sequin protocol).– In both protocols, the process stops logging messages and conclude the channel state for each channel when it receives the control message in that channel.
◾ The communication overhead of the two protocols is identical (i.e., the same number of control messages is used to produce a global checkpoint).
Figure 2.9 A comparison of the channel state definition between (a) the Chandy and Lamport distributed snapshot protocol and (b) the Tamir and Sequin global checkpointing protocol.
The two protocols also differ in their strategies in producing a global checkpoint.
◾ The Tamir and Sequin protocol is more conservative in that a process suspends its normal execution as soon as it learns that a global checkpointing round has started. In light of the Chandy and Lamport protocol, the suspension of normal execution could have been avoided during a global checkpointing round.
◾ The reason for the blocking design in the Tamir and Sequin protocol is that a process captures the channel states prior to taking a local checkpoint. While capturing the channel state, a process cannot execute the regular messages received because doing so would alter the process state, thereby potentially rendering the global checkpoint inconsistent. On the other hand, in the Chandy and Lamport protocol, a process captures the channel state after it has taken a local checkpoint, thereby enabling the execution of regular messages without the risk of making the global checkpoint inconsistent.
◾ The Tamir and Sequin protocol is more complete and robust because it ensures the atomicity of the global checkpointing round. Should a failure occurs, the current round would be aborted. The Chandy and Lamport protocol does not define any mechanism to ensure such atomicity. Presumably, the mechanisms defined in the Tamir and Sequin protocol can be incorporated to improve the Chandy and Lamport protocol.