Snapshot Algorithm and Global State Recording in Distributed Systems

Snapshot Algorithm and Global State Recording in Distributed Systems

Problem Description
In distributed systems, due to concurrent execution of nodes and the lack of a global clock, how to efficiently and consistently record the global state of the entire system (e.g., for detecting deadlocks, resource usage) is a classic problem. Snapshot algorithms (such as the Chandy-Lamport algorithm) capture a consistent global state without halting system operation through local recording and message passing. It is essential to understand its triggering conditions, recording process, and potential edge cases.

1. Problem Background and Challenges

Significance of Global State: The global state of a distributed system includes the local states of all nodes (e.g., memory data, program counter) and channel states (messages in transit).
Challenges: Halting all nodes to record the state directly compromises system availability. If nodes independently record their states, inconsistency may arise due to in-transit messages (e.g., a message received after a snapshot is recorded, but its sending was not recorded by the sender).

2. Core Idea of the Chandy-Lamport Algorithm

Goal: To record a consistent global snapshot without blocking the system, satisfying: if the snapshot records that a receiver has received message M, then the sender's snapshot must record M as sent.
Basic Assumptions:
- Reliable channels (no message loss) with in-order delivery.
- Bidirectional and interconnected channels between nodes (direct communication).

3. Detailed Algorithm Steps
Step 1: Snapshot Initialization

A node (called the initiator) actively starts the snapshot process:
1. Records its own local state.
2. Sends a marker message (Marker) to all other nodes, while initializing an empty queue for each incoming channel to record messages arriving during the snapshot.

Step 2: Node Rules for Processing Marker Messages

When a node receives a Marker for the first time:
1. Records its own local state.
2. Stops recording messages for the channel associated with this Marker and instead stores subsequent messages from that channel in a temporary queue (to avoid mixing post-snapshot messages).
3. Sends a Marker to all other nodes (if not already sent).
If a node receives a Marker after already recording its local state:
- Only stops recording messages for the corresponding channel (does not re-record local state).

Step 3: Calculation of Channel State

For any channel C (from node A to node B):
- Channel state in the snapshot = The set of messages received via C by B before B recorded its snapshot but not recorded as sent by A in its snapshot.
- This is indirectly captured through B's temporary queue: messages already in B's queue when it receives the Marker represent the "in-transit messages" for channel C.

4. Example Demonstration
Assume a system with nodes P1, P2, P3, and P1 is the initiator:

P1 records its local state and sends Markers to P2 and P3.
P2 receives the Marker (first time): records its local state, sends Markers to P1 and P3, and starts buffering messages from P1 and P3.
If P1 sends message M to P2 after sending the Marker, M may:
- Arrive before P2 receives the Marker: M is processed normally by P2 and not included in the snapshot.
- Arrive after P2 receives the Marker: M is stored in P2's buffer queue, becoming part of the channel state.
After all nodes have processed the Markers and the buffers are stable, merging all local states and channel queues yields a consistent global snapshot.

5. Key Characteristics and Edge Cases

Consistency Guarantee: The algorithm ensures that in the snapshot, each message is either recorded as sent by the sender or recorded as not received by the receiver (no contradictions).
Concurrent Initiation: Allows multiple nodes to initiate snapshots simultaneously (distinguished by assigning unique identifiers to each snapshot).
Performance: The system continues normal operation during the snapshot, with only minor communication and storage overhead.

6. Application Scenarios

Distributed debugging (e.g., detecting global deadlocks).
Fault tolerance and recovery (periodic snapshots for system rollback).
Stream processing systems (e.g., Apache Flink implements state consistency using variant algorithms).

Through the above steps, the Chandy-Lamport algorithm non-intrusively solves the challenge of recording the global state in distributed systems, becoming one of the foundational algorithms for distributed snapshots.