Distributed Transactions in Databases and the Two-Phase Commit Protocol

Problem Description
A distributed transaction is a transaction in which the participants, the servers supporting the transaction, the resource servers, and the transaction manager are located on different nodes of a distributed system. The Two-Phase Commit (2PC) protocol is the core mechanism for ensuring the atomicity of distributed transactions. Through message exchange between a coordinator and the participants, it guarantees that all nodes either commit the transaction or roll it back. Understanding how 2PC works, its advantages and disadvantages, and where it is applied in practice is essential for working with distributed databases.

Knowledge Explanation

Challenges of Distributed Transactions
- In a single-node database, transactions ensure ACID properties through logging and locking mechanisms. In a distributed environment, however, data is spread across different nodes, and network latency or node failures can leave a transaction partially applied: some nodes commit while others fail. This breaks transaction atomicity.
- Example: A bank transfer involves deducting funds from node A and adding funds to node B. If A succeeds but B fails, data inconsistency occurs.
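
The partial-failure case in the bank transfer example can be reproduced with a small sketch. Everything here is an assumption made for illustration: the two "nodes" are plain dictionaries, and `naive_transfer`, `NodeDown`, and the `b_is_down` flag are hypothetical names, not part of any real database API.

```python
class NodeDown(Exception):
    """Raised when a node cannot apply an update (hypothetical failure)."""


def naive_transfer(node_a, node_b, amount, b_is_down=False):
    """Apply the two updates independently, with no coordination."""
    node_a["balance"] -= amount               # step 1: debit on node A succeeds
    if b_is_down:
        raise NodeDown("node B unreachable")  # step 2 never happens
    node_b["balance"] += amount               # step 2: credit on node B


node_a = {"balance": 100}
node_b = {"balance": 50}
try:
    naive_transfer(node_a, node_b, 30, b_is_down=True)
except NodeDown:
    pass  # the caller sees an error, but node A has already changed

total = node_a["balance"] + node_b["balance"]
print(node_a, node_b, total)  # {'balance': 70} {'balance': 50} 120
```

The combined balance drops from 150 to 120 because the debit was applied without the matching credit, which is the inconsistency an atomic commit protocol is meant to prevent.
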
Basic Roles in the Two-Phase Commit Protocol
- Coordinator: The initiator of the transaction, responsible for making the final decision to commit or roll back the transaction.
- Participants: The actual execution nodes of the distributed transaction, responsible for performing local transaction operations and reporting their status back to the coordinator.
Phase One: Prepare Phase
- The coordinator sends a prepare request containing the transaction details to all participants.
- Participants execute the local transaction (writing logs, acquiring locks, etc.) but do not commit, ensuring the ability to either commit or rollback later.
- If local execution is successful, the participant replies Yes; if it fails (e.g., due to constraint violations), it replies No.
- Key Point: After voting Yes, a participant enters a "blocked state": it keeps its locks and can neither commit nor roll back on its own, and must wait for the coordinator's decision.
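
The participant's side of the prepare phase can be sketched as follows. This is a toy in-memory model under stated assumptions: the `Participant` class, its state names, the overdraft rule used as the local constraint, and the list standing in for a durable log are all hypothetical.

```python
class Participant:
    """Toy in-memory participant; names and states are illustrative only."""

    def __init__(self, balance):
        self.balance = balance
        self.state = "INIT"
        self.locked = False
        self.pending = None   # change staged but not yet committed
        self.log = []         # stands in for a durable write-ahead log

    def prepare(self, txn_id, delta):
        """Phase one: stage the change without committing, then vote."""
        if self.balance + delta < 0:          # local constraint: no overdraft
            self.log.append((txn_id, "ABORT"))
            self.state = "ABORTED"
            return "NO"
        self.locked = True                    # hold the lock until phase two
        self.pending = delta
        self.log.append((txn_id, "PREPARED", delta))
        self.state = "PREPARED"
        return "YES"


a = Participant(balance=100)
vote = a.prepare("t1", -30)
print(vote, a.state, a.balance, a.locked)  # YES PREPARED 100 True
```

Note that the balance is still 100 after a Yes vote: the change is logged and staged, so the participant can go either way once the coordinator's decision arrives.
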
Phase Two: Commit Phase
- If the coordinator receives Yes from all participants:
  - It sends a commit command. Participants formally commit their local transactions, release locks, and reply with an ack.
  - The coordinator marks the transaction as complete after receiving all ack messages.
- If any participant replies No or times out:
  - The coordinator sends a rollback command. Participants roll back the transaction and release resources.
- Note: In the second phase, a participant that voted Yes must carry out the coordinator's decision. If a temporary failure interrupts the commit or rollback, the participant retries until it succeeds.
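
The coordinator's decision rule across both phases can be sketched in a few lines. This is a simplified, single-process illustration: the `Participant` stand-in and its `will_vote` knob are hypothetical, a vote of `None` models a timeout, and a real coordinator would also write its decision to durable storage before broadcasting it.

```python
class Participant:
    """Minimal stand-in; `will_vote` is a hypothetical knob for the demo."""

    def __init__(self, name, will_vote="YES"):
        self.name = name
        self.will_vote = will_vote   # "YES", "NO", or None to simulate a timeout
        self.state = "INIT"

    def prepare(self):
        if self.will_vote == "YES":
            self.state = "PREPARED"
        return self.will_vote

    def commit(self):
        self.state = "COMMITTED"
        return "ACK"

    def rollback(self):
        self.state = "ABORTED"
        return "ACK"


def two_phase_commit(participants):
    """Run both phases and return the coordinator's decision."""
    # Phase one: collect votes; anything other than "YES" forces a rollback.
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(v == "YES" for v in votes) else "ROLLBACK"
    # Phase two: broadcast the decision and collect an ack from everyone.
    acks = [p.commit() if decision == "COMMIT" else p.rollback()
            for p in participants]
    assert all(ack == "ACK" for ack in acks)
    return decision


ok = [Participant("A"), Participant("B")]
mixed = [Participant("A"), Participant("B", will_vote="NO")]
print(two_phase_commit(ok))     # COMMIT
print(two_phase_commit(mixed))  # ROLLBACK
```

A single No vote or a missing vote is enough to roll back every participant, including those that had already voted Yes.
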
Failure Handling and Drawbacks of 2PC
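- Coordinator Single Point of Failure: If the coordinator crashes after participants have voted Yes but before it sends its decision, those participants stay blocked, holding their locks, until the coordinator recovers or another node takes over. Mitigations include a backup coordinator and timeout mechanisms, although a participant that has voted Yes cannot safely commit or roll back on a timeout alone, because it does not know what the coordinator decided.
- Risk of Data Inconsistency: If the coordinator crashes after sending commit to only some participants, some nodes commit while others are still waiting. For example, if participant A commits but the network to B is interrupted, B stays prepared while A's change is already visible. B can no longer be rolled back, because the decision to commit has already been made; the nodes stay inconsistent until B receives the commit, which requires the coordinator to have recorded its decision durably.
- Performance Issues: The protocol is synchronous and blocking. Each participant holds its locks from the prepare phase until the coordinator's decision arrives, and the coordinator must wait for the slowest participant, which reduces concurrency and throughput.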
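
One common way to limit the damage of a coordinator crash is to have the coordinator log its decision and replay that log on restart. The sketch below assumes a hypothetical log format of `(txn_id, record)` tuples with the record names `BEGIN`, `COMMIT`, `ROLLBACK`, and `END`, and it assumes that a transaction with no logged decision may be rolled back (often called presumed abort). `recover` and `participants_of` are illustrative names.

```python
def recover(log, participants_of):
    """Return the messages a restarted coordinator should resend.

    `log` is a list of (txn_id, record) tuples in write order.
    `participants_of` maps each txn_id to its participant names.
    """
    last = {}
    for txn_id, record in log:
        last[txn_id] = record            # keep only the latest record per txn

    outbox = []
    for txn_id, record in last.items():
        if record == "END":
            continue                     # every ack arrived; nothing to resend
        # With no logged decision, no participant can have been told to
        # commit, so rolling back is safe under the presumed-abort assumption.
        decision = record if record in ("COMMIT", "ROLLBACK") else "ROLLBACK"
        for participant in participants_of[txn_id]:
            outbox.append((participant, txn_id, decision))
    return outbox


log = [("t1", "BEGIN"), ("t1", "COMMIT"),
       ("t2", "BEGIN"),
       ("t3", "BEGIN"), ("t3", "COMMIT"), ("t3", "END")]
participants_of = {"t1": ["A", "B"], "t2": ["A", "B"], "t3": ["A", "B"]}
messages = recover(log, participants_of)
for message in messages:
    print(message)
```

In this run, `t1` is recommitted on both participants, `t2` is rolled back because no decision was ever logged, and `t3` is skipped because it had already finished. Participants still block while the coordinator is down; the log only guarantees that they all receive the same decision once it returns.
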
Practical Applications and Optimizations
- MySQL's XA transactions and the Java Transaction API (JTA) specification both support distributed transactions based on 2PC.
- Improved protocols such as Three-Phase Commit (3PC) reduce blocking by adding a pre-commit phase, but they add an extra message round and more complexity, and can still produce inconsistent outcomes under network partitions.
- Modern systems often adopt flexible transaction models (e.g., the Saga pattern) to avoid synchronous blocking, using compensation mechanisms to ensure eventual consistency.
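
The compensation idea behind the Saga pattern can be illustrated with a minimal sketch. It is not a production saga framework: `run_saga`, the account dictionary, and the step functions are hypothetical, and each step is assumed to commit locally as soon as it runs.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo already-committed steps in reverse order.
            for undo in reversed(done):
                undo()
            return "COMPENSATED"
        done.append(compensate)
    return "COMPLETED"


accounts = {"A": 100, "B": 50}

def debit_a():
    accounts["A"] -= 30

def refund_a():
    accounts["A"] += 30

def credit_b():
    raise RuntimeError("node B unavailable")  # simulated failure

def undo_credit_b():
    accounts["B"] -= 30

result = run_saga([(debit_a, refund_a), (credit_b, undo_credit_b)])
print(result, accounts)  # COMPENSATED {'A': 100, 'B': 50}
```

Unlike 2PC, no locks are held across steps: the debit on A is briefly visible before the refund restores it, which is why the Saga pattern offers eventual rather than immediate consistency.
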

Summary
The Two-Phase Commit protocol provides atomic commitment in distributed environments through a two-phase "prepare, then commit" interaction. The cost is blocking and extra message rounds, so it trades performance and availability for consistency. With its workflow and limitations understood, one can go on to study alternative models such as TCC (Try-Confirm-Cancel) and Saga, and choose a transaction model that fits the specific business scenario.