Implementation Principles and Selection of Distributed Locks

Implementation Principles and Selection of Distributed Locks

Problem Description: In high-concurrency distributed systems, how can exclusive access to shared resources be guaranteed? Please elaborate on the core requirements of distributed locks, the principles, advantages, disadvantages, and applicable scenarios of common implementation solutions (e.g., based on databases, Redis, ZooKeeper, etc.).

Solution Process:

Understanding the Core Problem and Requirements
- Problem: In single-machine systems, we can use thread locks (e.g., synchronized, ReentrantLock) to ensure exclusive access to shared resources. However, in distributed systems where applications are deployed across multiple machines, these local locks cannot work across processes. Therefore, a lock mechanism is needed that can coordinate the behavior of multiple nodes in a distributed environment.
- Core Requirements:
  - Mutual Exclusivity: At any given moment, only one client (or thread) can hold the lock. This is the most fundamental requirement.
  - Safety: A lock can only be released by the client that holds it, and must not be released by other clients, preventing accidental deletion.
  - Availability: In most cases, clients should be able to acquire and release locks. The lock service itself should not become unavailable due to the failure of a single node. This often implies that the lock service itself needs to be highly available.
  - Deadlock Prevention: There must be a mechanism to prevent a situation where a client crashes while holding a lock and cannot release it, causing the lock to be forever unavailable (i.e., a deadlock). This is typically achieved by setting a reasonable expiration time (lease) for the lock.
  - Reentrancy (Optional but Important): The same client, already holding the lock, can successfully acquire it again.
Common Implementation Solutions and Their Principles
- Solution One: Database-Based Implementation
  - Principle: Utilize the unique constraints or optimistic locking (e.g., version numbers) of a database to achieve mutual exclusion.
    - Method 1 (Unique Index): Create a lock table with a field (e.g., lock_name) representing the lock's name and establish a unique index on it. To acquire the lock, execute an INSERT statement attempting to insert a record with the specified lock_name. Successful insertion means acquiring the lock. Release the lock by deleting this record. The database's unique constraint ensures mutual exclusivity.
    - Method 2 (Optimistic Locking): Add a version number (version) field to the data table. When acquiring the lock, first query the current version number, then update with a condition WHERE version = [queried version number]. If the number of affected rows is 1, the lock is successfully acquired.
  - Advantages: Simple implementation, directly leverages existing databases, low learning curve.
  - Disadvantages:
    - Performance Bottleneck: Database I/O operations have poor performance and can become a system bottleneck under high concurrency.
    - Dependency on DB Availability: The database becomes a single point of failure and requires high-availability configuration.
    - No Automatic Expiration: If a client crashes, the lock record cannot be automatically deleted, easily leading to deadlocks (requires additional scheduled tasks for cleanup).
    - Difficulty in Non-Blocking Operations: Implementing logic like "try to acquire the lock and return immediately on failure" is relatively complex.
- Solution Two: Redis-Based Implementation
  - Principle: Leverage Redis's single-threaded model and the atomicity of the SET command.
    - Basic Command: SET lock_key unique_value NX PX 30000
      - NX: Set the key only if it does not exist, ensuring mutual exclusivity.
      - PX 30000: Set the key's expiration time to 30 seconds, preventing deadlocks.
      - unique_value (e.g., UUID): Must be a unique value to ensure safety. When releasing the lock, it's necessary to verify that the unique_value matches before deletion, preventing accidental deletion of another client's lock (use a Lua script to ensure atomicity).
    - Acquisition Process: Client A executes the above SET command. Success means acquiring the lock.
    - Release Process: Client A executes a Lua script that first compares whether the current lock's value equals its own set unique_value. If they match, it then deletes the key.
  - Advantages: Extremely high performance because Redis operates in memory. Relatively simple implementation.
  - Disadvantages:
    - Non-Strong Consistency: In a Redis master-slave architecture, if the master node writes successfully but crashes before synchronizing the data to a slave, and a slave is promoted to master, another client might also acquire the same lock, violating mutual exclusivity.
    - Lock Renewal Issue: If business execution time exceeds the lock's expiration time, the lock will be released automatically, potentially causing data inconsistency. Although a "watchdog" mechanism can be used for automatic renewal, it increases complexity.
- Solution Three: ZooKeeper-Based Implementation
  - Principle: Utilize ZooKeeper's ephemeral sequential nodes and Watch mechanism.
    - Acquisition Process:
      1. The client creates an ephemeral sequential node (e.g., /locks/my_lock/node_00000001) under a specified lock node (e.g., /locks/my_lock).
      2. The client retrieves all child nodes under /locks/my_lock and checks if the node it created has the smallest sequence number. If yes, it successfully acquires the lock.
      3. If not, it sets a Watch on the node with the sequence number immediately preceding its own.
      4. When the preceding node is deleted (lock released), ZooKeeper notifies the current client via the Watch.
      5. Upon notification, the client re-executes step 2 to check if it has now become the node with the smallest sequence number.
    - Release Process: The client actively deletes the ephemeral node it created.
    - Guaranteed Features:
      - Mutual Exclusivity: The node with the smallest sequence number acquires the lock.
      - Safety: Nodes are ephemeral. If a client disconnects (crashes), its node is automatically deleted, releasing the lock and preventing deadlocks.
      - Reentrancy: The client can write its own identifier when creating the node. When attempting to acquire the lock again, it checks if the current lock holder is itself.
  - Advantages:
    - High Reliability: ZooKeeper ensures strong consistency via the ZAB protocol, making the lock model very reliable.
    - No Expiration Time Needed: Automatic cleanup via ephemeral nodes eliminates concerns about lock timeout.
    - Enables Fair Locks: Nodes are created sequentially, naturally implementing a first-come, first-served fair lock.
  - Disadvantages:
    - Performance Overhead: Each creation, deletion of nodes, and setting of Watches requires network communication with the ZooKeeper cluster, resulting in performance lower than Redis.
    - Complexity: Requires introducing a ZooKeeper client, with relatively higher understanding and maintenance costs.
Solution Selection Summary
- Pursuing Ultimate Performance, Can Tolerate Minimal Probability of Lock Failure: Choose Redis. For example, in flash sale scenarios, inventory deduction can ultimately be safeguarded by database unique constraints or business logic.
- Requiring High Reliability, Business Scenarios Have Extremely Strict Data Consistency Requirements: Choose ZooKeeper. For example, critical steps in core transaction processes.
- Simple System, Low Concurrency, and Unwillingness to Introduce New Middleware: Consider the Database solution, but it is generally not recommended for high-concurrency production environments.

Through this step-by-step explanation, you should be able to understand why distributed locks are necessary, grasp the core concepts, implementation details, and trade-offs of several mainstream solutions, and thus make reasonable technical selections when facing specific business scenarios.