Implementation Schemes of Distributed Locks
Problem Description: In a distributed system, when multiple processes or services need mutually exclusive access to a shared resource, we require a cross-process locking mechanism, known as a distributed lock. Please elaborate on the core elements to consider when implementing a robust distributed lock, and analyze several common implementation schemes along with their advantages and disadvantages.
Solution Process:
The goal of a distributed lock is to ensure that, among multiple application instances deployed in a distributed manner, only one thread from one instance can execute a specific piece of code or access a particular resource at any given time.
Step 1: Clarify the Basic Requirements for a Robust Distributed Lock
Before delving into specific schemes, we must first establish the evaluation criteria. A distributed lock suitable for a production environment should possess the following characteristics:
- Mutual Exclusivity: This is the most fundamental property. At any moment, the lock can only be held by one client (a thread within a process).
- Deadlock-Free: Even if the client holding the lock crashes, experiences a network partition, or suffers any other unexpected failure, the lock must eventually be released to avoid a permanent system deadlock. This is typically achieved by setting an expiration time on the lock.
- Fault Tolerance: The distributed lock service itself must be highly available; it should not become entirely unavailable due to the failure of a few nodes.
- Reentrancy (Optional but Important): The same client (or thread), while already holding the lock, can successfully acquire it again without being blocked by itself. This simplifies recursive or callback logic that may exist in client code.
Step 2: Database-Based Implementation Scheme
This is the most intuitive approach, for example, utilizing the unique constraint of a relational database.
- Implementation Method (a minimal Python sketch follows the pros and cons below):
  - Create a lock table with a field (e.g., `lock_name`) representing the lock identifier, and establish a unique constraint on this field.
  - To acquire the lock, execute an insert statement: `INSERT INTO lock_table (lock_name) VALUES ('my_lock');`. Due to the unique constraint, if `'my_lock'` already exists, the insert fails, indicating failure to acquire the lock.
  - To release the lock, execute a delete statement: `DELETE FROM lock_table WHERE lock_name = 'my_lock';`.
- Advantages: Simple implementation, leverages existing databases, easy to understand.
- Disadvantages:
  - Performance Bottleneck: Database I/O performance is generally poor; frequent lock operations can become a system bottleneck.
  - Single Point of Failure: If the database fails, the entire lock service becomes unavailable. High availability can be improved through master-slave failover, but because replication is asynchronous, mutual exclusivity may be violated during failover (Client A acquires the lock on the master, the master fails before the row is replicated to the slave, the slave is promoted to master, and Client B can then acquire the same lock on the new master).
  - No Expiration Time: If a client crashes after acquiring the lock and cannot release it normally, a deadlock occurs. While a scheduled task can be added to clean up timed-out locks, this increases complexity.
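As a minimal sketch of this approach, assuming a MySQL database reachable through the PyMySQL driver and a `lock_table` with a unique constraint on `lock_name` (connection details and names are illustrative):

```python
import pymysql

# Illustrative connection settings; adapt to your environment.
conn = pymysql.connect(host="127.0.0.1", user="app", password="secret", database="app")

def acquire_lock(lock_name: str) -> bool:
    """Try to take the lock by inserting its name; the unique constraint
    guarantees that only one concurrent insert can succeed."""
    try:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO lock_table (lock_name) VALUES (%s)", (lock_name,))
        conn.commit()
        return True
    except pymysql.err.IntegrityError:
        conn.rollback()
        return False  # another client already holds the lock

def release_lock(lock_name: str) -> None:
    # Deleting the row releases the lock for the next client.
    with conn.cursor() as cur:
        cur.execute("DELETE FROM lock_table WHERE lock_name = %s", (lock_name,))
    conn.commit()
```

Note that this sketch inherits the scheme's weaknesses: nothing expires the row if the holder crashes, so a cleanup job would still be needed.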
Step 3: Redis-Based Implementation Scheme
Redis is a popular choice for implementing distributed locks due to its high performance and rich data structures.
- Basic Implementation (SETNX command):
  - Acquire lock: Use the command `SETNX lock_key unique_value`. `SETNX` stands for "SET if Not eXists". If `lock_key` does not exist, the set succeeds (returns 1), indicating lock acquisition; if it exists, the set fails (returns 0).
  - Release lock: Simply delete the key using `DEL lock_key`.
- Solving the "Deadlock-Free" Problem: To prevent a lock from becoming permanently held if a client crashes, we must set an expiration time on the lock when creating it. Since Redis 2.6.12, the recommended approach is a single atomic command: `SET lock_key unique_value NX PX 30000`. This command means: set the value of key `lock_key` to `unique_value`, with an expiration time of 30000 milliseconds (`PX`), only if the key does not exist (`NX`). The atomicity of this command is crucial; it avoids the scenario where a crash between setting the value and setting the expiration time results in a lock without a timeout.
- Solving the "Accidental Deletion" Problem: If Client A's operation takes longer than the lock's expiration time, Redis automatically releases the lock and Client B can acquire it. When Client A finishes, it might then delete the lock now held by Client B. To prevent this, the client must verify that it still holds the lock before deleting it, and the check-and-delete must be atomic, typically implemented with a Lua script: `if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end`. Here, `ARGV[1]` is a unique random value (e.g., a UUID) generated by each client, ensuring that a client can only delete the lock it set itself. A sketch combining the atomic acquire and this Lua release follows the pros and cons below.
- Advantages: Extremely high performance, relatively simple implementation.
- Disadvantages:
  - Non-Strong Consistency: In a Redis master-slave asynchronous replication architecture, if a client successfully acquires a lock on the master node, but the lock information hasn't been replicated to a slave before the master fails, and a slave is promoted to master, a new client might acquire the same lock again, violating mutual exclusivity.
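Putting the two fixes together, here is a minimal sketch using the redis-py client (assumed available); the key name, TTL, and connection address are illustrative:

```python
import uuid
import redis

r = redis.Redis(host="127.0.0.1", port=6379)  # illustrative address

# Atomic compare-and-delete: only remove the key if it still holds our token.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire_lock(lock_key: str, ttl_ms: int = 30000):
    token = str(uuid.uuid4())  # unique value identifying this holder
    # SET key token NX PX ttl -- value and expiry are set in one atomic step.
    if r.set(lock_key, token, nx=True, px=ttl_ms):
        return token
    return None  # someone else holds the lock

def release_lock(lock_key: str, token: str) -> bool:
    # Runs the Lua script atomically inside Redis.
    return r.eval(RELEASE_SCRIPT, 1, lock_key, token) == 1

token = acquire_lock("my_lock")
if token:
    try:
        pass  # ... critical section ...
    finally:
        release_lock("my_lock", token)
```

The random token is what makes the Lua release safe: a client whose lock has already expired and been taken over cannot delete the new holder's key.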
Step 4: ZooKeeper-Based Implementation Scheme
ZooKeeper is a coordination system that provides consistency services for distributed applications. Its data model and Watch mechanism are well-suited for implementing distributed locks.
- Implementation Method (Ephemeral Sequential Nodes; a minimal sketch follows the pros and cons below):
  - Define the lock: Create a persistent node (Persistent Node) in ZooKeeper as the root directory for locks, e.g., `/locks/my_lock`.
  - Acquire the lock:
    - Each client wanting to acquire the lock creates an ephemeral sequential node (Ephemeral Sequential Node) under `/locks/my_lock`, e.g., `/locks/my_lock/lock_00000001`.
    - The client retrieves all child nodes under `/locks/my_lock` and sorts them by sequence number.
    - If its own node has the smallest sequence number, the client successfully acquires the lock.
    - If not, the client watches (Watch) the deletion event of the node immediately preceding its own in sequence.
  - Wait for the lock: When the preceding node it is watching is deleted (indicating the previous client released the lock), ZooKeeper notifies the current client. Upon notification, the client repeats the check (retrieve all child nodes and see whether it is now the smallest node).
  - Release the lock: The client simply deletes the ephemeral sequential node it created. Because the node is ephemeral, if the client crashes and its session expires, ZooKeeper automatically deletes it, thereby releasing the lock.
- Advantages:
  - High Reliability: ZooKeeper guarantees strong consistency through the ZAB protocol, strictly ensuring the lock's mutual exclusivity.
  - Automatic Release: The nature of ephemeral nodes naturally solves the "deadlock-free" problem.
  - Enables Fair Locks: Sequential nodes and the Watch mechanism naturally form a waiting queue, implementing a first-come, first-served fair lock.
- Disadvantages:
  - Performance Overhead: Each lock acquisition and release requires creating and deleting nodes and maintaining an active session (heartbeat) with ZooKeeper, so performance is generally lower than Redis.
  - Complexity: Understanding and implementing this scheme is more involved than the Redis approach.
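As a minimal sketch of the ephemeral-sequential-node algorithm described above, assuming the kazoo client library; the connection address and node names are illustrative:

```python
import threading
from kazoo.client import KazooClient

LOCK_ROOT = "/locks/my_lock"  # persistent root node for this lock

zk = KazooClient(hosts="127.0.0.1:2181")  # illustrative address
zk.start()
zk.ensure_path(LOCK_ROOT)  # create the persistent root if it does not exist

def acquire_lock() -> str:
    # Create an ephemeral sequential node; ZooKeeper appends the sequence number.
    my_path = zk.create(LOCK_ROOT + "/lock_", ephemeral=True, sequence=True)
    my_name = my_path.rsplit("/", 1)[-1]
    while True:
        children = sorted(zk.get_children(LOCK_ROOT))
        if my_name == children[0]:
            return my_path  # smallest sequence number: lock acquired
        # Watch only the node immediately preceding ours, avoiding a
        # thundering herd of clients all watching the lock holder.
        predecessor = children[children.index(my_name) - 1]
        gone = threading.Event()
        if zk.exists(LOCK_ROOT + "/" + predecessor, watch=lambda event: gone.set()):
            gone.wait()  # block until the predecessor node is deleted
        # Loop and re-check: the predecessor may already have vanished.

def release_lock(my_path: str) -> None:
    # Deleting the ephemeral node releases the lock; if the client crashes,
    # session expiry deletes the node automatically.
    zk.delete(my_path)
```

In practice, kazoo also ships a ready-made `kazoo.recipe.lock.Lock` recipe that implements this same pattern, which is usually preferable to hand-rolling the watch logic.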
Summary and Comparison
| Scheme | Implementation Difficulty | Performance | Reliability/Consistency | Characteristics |
|---|---|---|---|---|
| Database | Simple | Poor | Weak (risk during master-slave failover) | Not recommended for high-concurrency scenarios |
| Redis | Medium | Very High | Weak (risk due to asynchronous replication) | Maximized performance; reliability can be enhanced via the RedLock algorithm (though still debated) |
| ZooKeeper | Complex | Medium | Strong (ZAB, a Paxos-variant consensus protocol) | Highest reliability, naturally solves deadlock, enables fair locks |
The choice of scheme depends on your specific business scenario: if you are pursuing maximum performance and can tolerate a minimal probability of lock failure, Redis is an excellent choice; if absolute lock reliability and strong consistency are required, choose ZooKeeper or etcd.