Raft Consensus Algorithm in Distributed Systems

Problem Description:
Raft is a consensus algorithm for managing replicated logs, designed as a more understandable alternative to Paxos. In interviews, candidates are often asked to explain Raft's core mechanisms, how it ensures consistency, and its advantages and disadvantages compared to Paxos.

Solution Process:

  1. Raft's Design Goals

    • Raft simplifies understanding by decomposing the problem (leader election, log replication, safety) and emphasizing state machine consistency.
    • Core idea: replicated state machines that start in the same state and apply the same commands in the same order end up in the same state. A single elected Leader receives all client requests and drives log replication, which guarantees that every replica applies commands in the same order.
  2. Node Roles and Term

    • Each node is in one of three roles at any time: Leader (handles all client requests), Follower (passively responds to RPCs), and Candidate (a temporary state during elections).
    • Term: a monotonically increasing number that acts as a logical clock; there is at most one Leader per term, and comparing terms lets nodes detect stale information (e.g., requests from a deposed Leader). A minimal Go sketch of the role and term state follows this item.
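
To make the roles and terms concrete, here is a minimal Go sketch (type and field names are illustrative, not taken from any particular implementation) of the state a node keeps and the rule that seeing a higher term always demotes it to Follower:

```go
package raft

// Role is the state a node occupies at any point in time.
type Role int

const (
	Follower  Role = iota // passively responds to RPCs from the Leader and Candidates
	Candidate             // temporary state while campaigning for votes
	Leader                // handles all client requests and drives log replication
)

// Node holds the per-term state every peer maintains (log omitted here).
type Node struct {
	role        Role
	currentTerm uint64 // latest term this node has seen; monotonically increasing
	votedFor    int    // candidate ID voted for in currentTerm, -1 if none
}

// observeTerm steps down to Follower whenever a newer term is seen, which is
// how a stale Leader or Candidate discovers it is out of date.
func (n *Node) observeTerm(term uint64) {
	if term > n.currentTerm {
		n.currentTerm = term
		n.votedFor = -1
		n.role = Follower
	}
}
```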
  3. Leader Election

    • Trigger Condition: a Follower transitions to Candidate if it receives no heartbeat (AppendEntries) from the Leader within a randomized election timeout (e.g., 150-300 ms).
    • Election Process:
      1. Candidate increments its term and sends RequestVote RPCs to other nodes.
      2. Each node grants at most one vote per term, on a first-come-first-served basis, and only if the candidate's log is at least as up-to-date as its own.
      3. Upon receiving a majority (N/2+1) of votes, the Candidate becomes the Leader and immediately sends heartbeats to establish authority.
    • Election Safety: at most one Leader can be elected per term (each node grants at most one vote per term), which prevents split-brain; randomized election timeouts reduce split votes. A sketch of the vote-granting rule and the randomized timeout follows this item.
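
Continuing the sketch above (same hypothetical package; still illustrative, not a production implementation), the vote-granting rule and the randomized timeout look roughly like this:

```go
package raft

import (
	"math/rand"
	"time"
)

// RequestVoteArgs carries the fields a Candidate sends when campaigning.
type RequestVoteArgs struct {
	Term         uint64 // candidate's term
	CandidateID  int
	LastLogIndex uint64 // index of the candidate's last log entry
	LastLogTerm  uint64 // term of the candidate's last log entry
}

// handleRequestVote grants at most one vote per term, and only when the
// candidate's log is at least as up-to-date as the local log (whose last
// entry's index and term are passed in for simplicity).
func (n *Node) handleRequestVote(args RequestVoteArgs, lastIndex, lastTerm uint64) bool {
	n.observeTerm(args.Term) // step down first if the candidate's term is newer
	if args.Term < n.currentTerm {
		return false // stale candidate from an earlier term
	}
	votedForOther := n.votedFor != -1 && n.votedFor != args.CandidateID
	logUpToDate := args.LastLogTerm > lastTerm ||
		(args.LastLogTerm == lastTerm && args.LastLogIndex >= lastIndex)
	if votedForOther || !logUpToDate {
		return false
	}
	n.votedFor = args.CandidateID
	return true
}

// electionTimeout returns a randomized timeout (here 150-300 ms) so that
// Followers rarely time out and start competing elections at the same instant.
func electionTimeout() time.Duration {
	return time.Duration(150+rand.Intn(150)) * time.Millisecond
}
```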
  4. Log Replication

    • Process:
      1. Client sends a command to the Leader, which appends the log entry locally (uncommitted).
      2. Leader replicates the log to a majority of nodes via AppendEntries RPCs.
      3. Once a majority acknowledges the entry, the Leader marks it committed, applies it to its state machine, and informs Followers of the new commit index (in subsequent AppendEntries/heartbeats) so they apply it as well.
    • Log Matching Property:
      • If two logs contain an entry with the same index and term, the entries store the same command, and all preceding entries in the two logs are identical.
      • If a Follower's log is inconsistent with the Leader's (missing or conflicting entries), the Leader forces the Follower's log to match its own: the consistency check in AppendEntries fails, the Leader backs up and retries from an earlier index, and the conflicting suffix is overwritten. A minimal sketch of this check follows this item.
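
A minimal Follower-side sketch of the AppendEntries consistency check, assuming 0-based log indexing (PrevLogIndex of -1 means "no preceding entry") and omitting term checks, persistence, and commit-index handling for brevity:

```go
package raft

// LogEntry is one replicated command tagged with the term in which the
// Leader first received it.
type LogEntry struct {
	Term    uint64
	Command []byte
}

// AppendEntriesArgs mirrors the consistency-check fields of the RPC:
// PrevLogIndex/PrevLogTerm identify the entry immediately before Entries.
type AppendEntriesArgs struct {
	Term         uint64
	PrevLogIndex int
	PrevLogTerm  uint64
	Entries      []LogEntry
	LeaderCommit int
}

// handleAppendEntries rejects the request if the Follower's log does not
// contain an entry at PrevLogIndex with PrevLogTerm; otherwise it replaces
// everything after PrevLogIndex with the Leader's entries (a simplification:
// the paper only truncates when an actual conflict is found).
func handleAppendEntries(log []LogEntry, args AppendEntriesArgs) ([]LogEntry, bool) {
	if args.PrevLogIndex >= 0 {
		if args.PrevLogIndex >= len(log) || log[args.PrevLogIndex].Term != args.PrevLogTerm {
			// Consistency check failed: the Leader will back up and retry
			// with an earlier PrevLogIndex until the logs match.
			return log, false
		}
	}
	log = append(log[:args.PrevLogIndex+1], args.Entries...)
	return log, true
}
```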
  5. Safety and Split-Brain Handling

    • Election Restriction: only a node whose log contains all committed entries can become Leader (enforced by the log comparison in RequestVote RPCs: voters reject candidates whose logs are less up-to-date).
    • Commit Rule: a Leader advances the commit index only for entries from its own term once they are replicated on a majority; earlier entries then become committed indirectly. This prevents a later Leader from overwriting entries that were already committed (see Figure 8 in the Raft paper). A Leader-side sketch of this rule follows this item.
    • Split-Brain Scenario: during a network partition, the minority side cannot elect a new Leader (it cannot gather majority votes), and a Leader stranded on the minority side can no longer commit new entries; the majority side elects (or retains) a Leader and keeps serving. After the partition heals, the stale minority-side Leader observes the higher term and steps down, and its uncommitted entries are overwritten.
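
The commit rule can be sketched from the Leader's side, reusing the LogEntry type from the previous sketch. Assume matchIndex holds, for every node including the Leader itself, the highest 0-based log index known to be replicated there:

```go
package raft

import "sort"

// advanceCommitIndex returns the new commit index: the highest index stored
// on a majority of nodes, but only if that entry belongs to the Leader's
// current term (earlier entries then become committed indirectly).
func advanceCommitIndex(log []LogEntry, matchIndex []int, commitIndex int, currentTerm uint64) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Ints(sorted)
	// With matchIndex sorted ascending, this element is the highest index
	// present on at least a majority of the nodes.
	majorityIndex := sorted[(len(sorted)-1)/2]
	if majorityIndex > commitIndex && majorityIndex < len(log) &&
		log[majorityIndex].Term == currentTerm {
		return majorityIndex
	}
	return commitIndex
}
```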
  6. Advantages and Disadvantages Compared to Paxos

    • Advantages:
      • Easier to understand and implement (adopted directly by systems such as etcd and Consul).
      • Strong leadership simplifies log replication flow.
    • Disadvantages:
      • Throughput and latency are bounded by the Leader (all writes flow through a single node);
      • Reduced availability during network partitions (requires a majority to be alive).

Summary: Raft ensures consistency while reducing engineering complexity through role separation, term mechanisms, and log matching rules. Understanding its election and replication processes hinges on grasping the "majority principle" and "log continuity."