Consistency in Distributed Systems and the Paxos Algorithm

Consistency in Distributed Systems and the Paxos Algorithm

Problem Description:
In a distributed system, multiple nodes need to reach consensus on a certain value (e.g., configuration, state). However, nodes may fail or the network may experience delays. How does the Paxos algorithm solve the consensus problem? Please explain its basic procedure and key design concepts.

1. Challenges of the Consensus Problem

In distributed systems, nodes need to communicate to reach consensus, but the following issues may occur:
- Node Failures (crashes but no malicious responses).
- Network Delays/Losses (message reordering or timeouts).
Goal: Even with failures, as long as a majority of nodes are alive, consensus can be reached on a proposed value, ensuring safety (consistent results) and liveness (eventual agreement).

2. Core Roles and Phases of Paxos
Paxos divides nodes into three roles:

Proposer: Receives client requests and proposes a value (with a proposal number and the value).
Acceptor: Votes on proposals and stores the agreed-upon value.
Learner: Learns the accepted value (does not participate in voting).

The algorithm consists of two phases:
Phase One (Prepare Phase):

The Proposer generates a globally unique proposal number n (usually containing a timestamp and node ID) and sends a Prepare(n) request to all Acceptors.
Upon receiving Prepare(n), an Acceptor:
- If n is greater than all proposal numbers it has responded to in previous Prepare requests, it promises not to accept any proposals with numbers less than n and replies with the highest-numbered proposal it has accepted (if any).
- Otherwise, it rejects the request.

Phase Two (Accept Phase):

If the Proposer receives promises for n from a majority of Acceptors, it selects a value v:
- If none of the Acceptor replies include an accepted proposal, v can be the Proposer's own value.
- Otherwise, v must be the value associated with the highest proposal number in the replies (ensuring consistency).
It sends an Accept(n, v) request to the Acceptors.
Upon receiving Accept(n, v), an Acceptor:
- If it has not promised to any request with a number greater than n, it accepts the proposal and replies.
- Otherwise, it rejects.
If the Proposer receives acceptance replies from a majority of Acceptors, the value v is chosen and propagated through the Learners.

3. Key Design Concepts

Majority Principle: Any two majorities must overlap, ensuring a unique value is chosen.
Increasing Proposal Numbers: Prevents old proposals from overwriting new decisions.
Two-Phase Locking: The Prepare phase obtains promises from Acceptors; the Accept phase ensures the value is safely written.
Livelock Handling: Multiple Proposers competing for proposal numbers may cause livelocks, which can be optimized by electing a primary Proposer (e.g., Multi-Paxos).

4. Example
Assume three Acceptors (A1, A2, A3):

Proposer P1 sends Prepare(n=1). A1 and A2 promise and reply with no historical proposals. P1 initiates Accept(n=1, v=X). A1 and A2 accept, and value X is chosen.
If P2 simultaneously sends Prepare(n=2), A3 promises, but A1 and A2 reply that they have already accepted (1, X). P2 must choose v=X for its Accept request to prevent the value from being modified.

5. Practical Applications and Variants

Paxos serves as a theoretical foundation. Common industrial variants include:
- Multi-Paxos: Elects a Leader to reduce the Prepare phase and improve efficiency.
- Raft: Simplifies Paxos logic, enhancing understandability through Leader election and log replication.

Through these steps, Paxos ensures consistency in distributed systems while tolerating failures.