Database Locking Mechanisms and Concurrency Control

Description: The database locking mechanism is a core technology for ensuring data consistency and transaction isolation. When multiple transactions access the database simultaneously, locks prevent the inconsistencies that concurrent operations would otherwise cause, such as lost updates, dirty reads, and non-repeatable reads. Understanding lock types, lock granularity, and how deadlocks arise and are resolved is essential knowledge in the database field.

Knowledge Explanation:

Step 1: Why are Locks Needed? — Concurrency Problems

Imagine you and a friend are editing the same line of an online shared document at the same time. Without a control mechanism, content one person just typed can be instantly overwritten by the other's edit. Databases face similar problems, which show up in three main forms:

  1. Dirty Read: Transaction A reads a modification made by Transaction B that has not yet been committed. If Transaction B later rolls back, then Transaction A has read invalid "dirty" data.
  2. Non-Repeatable Read: Transaction A reads the same piece of data twice within the transaction. Between the two reads, Transaction B modifies and commits that data, so Transaction A's two reads return different results.
  3. Phantom Read: Transaction A queries a set of data based on a condition. Meanwhile, Transaction B inserts or deletes some records that meet that condition and commits. When Transaction A queries again with the same condition, it finds some extra "phantom" rows or some rows missing.

Locks exist to solve these problems.
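
To make these hazards concrete, here is a minimal Python sketch of the closely related "lost update" problem mentioned in the description (the function name, amounts, and iteration count are illustrative): two threads perform an unprotected read-modify-write on a shared balance, so one thread's write can silently overwrite the other's.

```python
import sys
import threading

sys.setswitchinterval(1e-6)  # switch threads aggressively to expose the race

balance = 0

def deposit_unsafe(amount: int, times: int) -> None:
    """Read-modify-write with no lock: concurrent updates can be lost."""
    global balance
    for _ in range(times):
        current = balance      # 1. read
        current += amount      # 2. modify
        balance = current      # 3. write -- may clobber a concurrent write

threads = [threading.Thread(target=deposit_unsafe, args=(1, 100_000))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("expected 200000, got", balance)  # usually less: some updates were lost
```

Wrapping steps 1-3 in an exclusive lock would make the read-modify-write atomic with respect to the other thread, which is exactly the role a database exclusive lock plays for an UPDATE.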

Step 2: Basic Types of Locks

Locks can be classified along several dimensions. The most fundamental is by the kind of access they grant:

  1. Shared Lock (S Lock, Read Lock)

    • Behavior: After a transaction places a shared lock on data, other transactions can continue to place shared locks to read that data, but cannot place exclusive locks to modify it.
    • Analogy: It's like many people can open the same PDF file to read simultaneously, but as long as someone is reading, no one can modify the file's content.
    • Purpose: To ensure the data is not modified by other transactions while it is being read. And because a shared lock cannot be granted while a writer holds an exclusive lock, readers also cannot see uncommitted changes, which addresses the "dirty read" problem.
  2. Exclusive Lock (X Lock, Write Lock)

    • Behavior: After a transaction places an exclusive lock on data, other transactions can neither place shared locks to read it, nor place exclusive locks to modify it.
    • Analogy: It's like having exclusive editing rights to the document: until you finish editing and save, no one else can view or modify it.
    • Purpose: To ensure that while data is being modified, no other transaction can read or modify it, keeping the modification fully isolated from concurrent access.

Compatibility Rule: easily remembered as "read-read compatible; read-write and write-write mutually exclusive."
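
As a sketch of how this rule can be enforced, here is a toy shared/exclusive lock in Python, assuming a simplified reader-favoring design (the class and method names are illustrative, not a real database API):

```python
import threading

class SharedExclusiveLock:
    """Toy S/X lock: read-read compatible, read-write/write-write exclusive."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0      # number of shared (S) locks currently held
        self._writer = False   # True while an exclusive (X) lock is held

    def acquire_shared(self) -> None:
        with self._cond:
            while self._writer:            # readers only wait for a writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()    # a waiting writer may now proceed

    def acquire_exclusive(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:   # must be alone
                self._cond.wait()
            self._writer = True

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()        # wake both readers and writers
```

Note that this simple version lets a steady stream of readers starve a waiting writer; real lock managers queue requests so writers are eventually served.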

Step 3: Lock Granularity

Locks can be applied to data units of different sizes; this is the lock's granularity. The finer the granularity, the better the concurrency, but the higher the overhead of managing the locks.

  • Row-Level Lock: Locks a single row in a table. The finest granularity, highest concurrency, and is the default or commonly used level in mainstream relational databases (e.g., MySQL's InnoDB, PostgreSQL).
  • Page-Level Lock: Locks a page of data (a basic unit of database storage, usually containing multiple rows). Granularity is between row locks and table locks.
  • Table-Level Lock: Locks the entire table. The coarsest granularity, simple implementation, low overhead, but the worst concurrency performance. For example, MySQL's MyISAM engine uses table-level locks.

Selection Strategy: Database systems typically choose the lock granularity automatically, and some escalate many fine-grained locks into a single coarser one when the bookkeeping grows too expensive, balancing overhead against concurrency.
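
One way to picture granularity is as the naming scheme a lock manager uses for resources. In the hypothetical sketch below (the LockManager class, try_lock, and the resource-id strings are all illustrative), the same bookkeeping handles any granularity; finer-grained ids simply mean many more entries to track:

```python
from collections import defaultdict

class LockManager:
    """Toy lock table keyed by resource id; the id encodes the granularity."""

    def __init__(self) -> None:
        self._shared = defaultdict(set)  # resource -> txns holding S locks
        self._exclusive = {}             # resource -> txn holding the X lock

    def try_lock(self, txn: str, resource: str, mode: str) -> bool:
        """Grant the lock if compatible; return False so the caller waits."""
        x_holder = self._exclusive.get(resource)
        if mode == "S":
            if x_holder is not None and x_holder != txn:
                return False                     # blocked by another writer
            self._shared[resource].add(txn)
            return True
        # mode == "X": the requester must be the only accessor
        other_readers = self._shared[resource] - {txn}
        if other_readers or (x_holder is not None and x_holder != txn):
            return False
        self._exclusive[resource] = txn
        return True

mgr = LockManager()
print(mgr.try_lock("T1", "table:orders", "S"))   # table-level shared lock
print(mgr.try_lock("T2", "row:orders:42", "X"))  # row-level exclusive lock
```

Real engines also take "intention" locks at coarser levels so that a table lock and a row lock on the same table cannot conflict unnoticed; that coordination is omitted here.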

Step 4: Locking Protocol and Isolation Levels

Having locks alone is not enough; there must also be rules for when a transaction acquires and releases them. This set of rules is the "locking protocol." The SQL standard defines transaction isolation levels, each corresponding to a locking protocol of different strictness that prevents a different set of concurrency problems.

| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Brief Description of Locking Strategy |
|---|---|---|---|---|
| Read Uncommitted | ❌ Possible | ❌ Possible | ❌ Possible | Holds exclusive locks on written data until transaction end; reads take no locks. |
| Read Committed | ✅ Prevented | ❌ Possible | ❌ Possible | Holds exclusive locks on written data until transaction end; places shared locks on read data, releasing them immediately after reading. |
| Repeatable Read | ✅ Prevented | ✅ Prevented | ❌ Possible | Holds exclusive locks on written data until transaction end; places shared locks on read data, holding them until the transaction ends. |
| Serializable | ✅ Prevented | ✅ Prevented | ✅ Prevented | Strictest. May additionally use "gap locks" during range queries to prevent phantom reads. |

Key Point: The lock hold time is crucial. Under the "Read Committed" level, shared locks are released immediately after reading, so other transactions can modify the data between your two reads, causing "non-repeatable reads." Under the "Repeatable Read" level, shared locks are held until the transaction ends, ensuring consistent results for multiple reads within the transaction.
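
The effect of hold time can be shown with a toy simulation, using a plain mutex as a stand-in for the row's shared lock (the Row class and function names are illustrative):

```python
import threading

class Row:
    def __init__(self, value: int) -> None:
        self.value = value
        self.lock = threading.Lock()  # stand-in for the row's S/X lock

def read_committed(row: Row) -> tuple[int, int]:
    """Release the lock between reads: a writer can slip in, so the two
    results may differ (a non-repeatable read)."""
    with row.lock:
        first = row.value
    # <-- another transaction may lock the row and change row.value here
    with row.lock:
        second = row.value
    return first, second

def repeatable_read(row: Row) -> tuple[int, int]:
    """Hold the lock across both reads: writers are blocked for the whole
    span, so the two results always agree."""
    with row.lock:
        first = row.value
        second = row.value
    return first, second
```

The only difference between the two functions is where the lock is released, which is precisely the difference between the two isolation levels' locking protocols.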

Step 5: Deadlock and Resolution

Deadlock occurs when multiple transactions cyclically wait for locks held by each other.

  • Scenario Simulation:

    • Transaction A holds Lock 1 and requests Lock 2.
    • Transaction B holds Lock 2 and requests Lock 1.
    • At this point, Transaction A waits for B, Transaction B waits for A, neither can proceed, forming a deadlock.
  • Database Solutions:

    1. Prevention: Request all potentially needed locks at once when the transaction starts, or enforce a uniform locking order across all transactions. Both approaches work, but they restrict concurrency and can reduce system throughput.
    2. Detection and Resolution (the mainstream approach): The database allows deadlocks to occur but checks for them, continuously or periodically. Once a deadlock is detected, it selects a "victim" transaction (usually by a strategy such as minimum rollback cost), rolls it back, and releases all of its locks so the other transactions can proceed. The rolled-back transaction receives an error, and the application is expected to retry it. A minimal detector is sketched below.
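
Here is a minimal sketch of the detection step, assuming a wait-for graph where an edge T1 → T2 means "T1 is waiting for a lock that T2 holds" (the graph contents, cost values, and function name are illustrative):

```python
def find_cycle(wait_for: dict[str, set[str]]) -> list[str] | None:
    """Return one cycle of transaction ids, or None if the graph is acyclic."""
    visiting, done = set(), set()
    path: list[str] = []

    def dfs(txn: str) -> list[str] | None:
        visiting.add(txn)
        path.append(txn)
        for blocker in wait_for.get(txn, ()):
            if blocker in visiting:                    # back edge -> cycle
                return path[path.index(blocker):]
            if blocker not in done:
                cycle = dfs(blocker)
                if cycle:
                    return cycle
        visiting.discard(txn)
        done.add(txn)
        path.pop()
        return None

    for txn in wait_for:
        if txn not in done:
            cycle = dfs(txn)
            if cycle:
                return cycle
    return None

# The scenario from above: A waits for B, and B waits for A.
graph = {"A": {"B"}, "B": {"A"}}
cycle = find_cycle(graph)
if cycle:
    # Pick a victim, e.g. the transaction cheapest to roll back
    # (stub costs standing in for work done so far).
    work_done = {"A": 10, "B": 3}
    victim = min(cycle, key=lambda t: work_done[t])
    print(f"deadlock among {cycle}; rolling back {victim}")
```

After the victim is rolled back, its edges disappear from the graph and the remaining transactions can acquire their locks and continue.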

Summary:
The database locking mechanism is a carefully engineered system. It uses two basic tools, shared locks and exclusive locks, applies them at different granularities such as rows, pages, and tables, and follows locking protocols of different strictness (corresponding to the isolation levels) to balance data consistency against concurrency. At the same time, it relies on deadlock detection and rollback to handle the resource contention that cannot be avoided, keeping the database running stably in high-concurrency scenarios.