Distributed Lock Implementation Principles and Common Solutions
Problem Description
In a distributed system, multiple nodes may simultaneously compete for the same shared resource (such as modification rights to a row of database data, execution rights of a task, etc.). To avoid data inconsistency or resource conflicts, a mechanism is needed to ensure that only one node can access the resource at any given time. This mechanism is the distributed lock.
1. Core Requirements for Distributed Locks
Before delving into implementation solutions, it is essential to clarify the conditions a qualified distributed lock should meet:
- Mutual Exclusivity: Only one client can hold the lock at any moment.
- Deadlock Avoidance: The lock should eventually be released even if the client holding it crashes or a network partition occurs.
- Fault Tolerance: The lock service should operate normally as long as the majority of nodes in the distributed system are alive.
- High Performance: The latency of locking and unlocking should not become a system bottleneck.
2. Database-based Implementation (Pessimistic Locking)
Example Solution: Utilizing unique indexes or exclusive locks in a database.
- Steps:
- Create a lock table containing fields such as resource name (unique index), holder identifier, expiration time, etc.
- When locking, insert a record (e.g.,
INSERT INTO lock_table(resource, owner, expire_time) VALUES(...)), using the unique index to ensure mutual exclusivity. - If the insertion is successful, the lock is acquired; otherwise, retry or abort.
- When unlocking, delete the corresponding record.
Disadvantages:
- The database single point may become a performance bottleneck.
- If a client crashes without deleting the record, an expiration time mechanism is needed to avoid deadlocks, but setting an accurate expiration time is challenging.
3. Redis-based Implementation (In-memory)
Example Solution: Using Redis's SET key value NX PX timeout command.
- NX: Sets the key only if it does not exist, ensuring mutual exclusivity.
- PX: Sets the key's expiration time (in milliseconds) to avoid deadlocks.
- Example:
SET lock:resource1 client_id NX PX 30000 # Automatically expires after 30 seconds - Holder verification during unlocking:
-- Using a Lua script to ensure atomicity if redis.call("GET", KEYS[1]) == ARGV[1] then return redis.call("DEL", KEYS[1]) else return 0 end
Potential Issues:
- Clock drift may cause premature lock release.
- Locks may be lost during master-slave failover (due to Redis's asynchronous replication).
4. ZooKeeper-based Implementation (Strong Consistency)
Core Concept: Leveraging ZooKeeper's ephemeral sequential nodes and Watch mechanism.
- Steps:
- Create an ephemeral sequential node under ZooKeeper's lock directory (e.g.,
/lock/resource1_00000001). - Check if the node created is the one with the smallest sequence number: if yes, the lock is acquired.
- If not, watch for the deletion event of the preceding node (to avoid the herd effect).
- When unlocking, simply delete the node, and ZooKeeper will automatically notify the subsequent nodes.
- Create an ephemeral sequential node under ZooKeeper's lock directory (e.g.,
Advantages:
- Automatically cleans up nodes of crashed clients through the session mechanism, preventing deadlocks.
- Strong consistency ensures lock reliability.
Disadvantages:
- Frequent node creation and watching may incur performance overhead.
5. Comparison and Selection Recommendations
- Database Locks: Suitable for scenarios with low concurrency and existing database dependencies, but performance is poor.
- Redis Locks: High performance, but mutual exclusivity may be compromised in extreme scenarios (trade-off between consistency and performance required).
- ZooKeeper Locks: Suitable for scenarios requiring strong consistency (e.g., financial transactions), but complexity is higher.
Advanced Considerations:
- How to implement reentrant locks?
- Can the RedLock algorithm truly solve the reliability issues of Redis locks?