Database MVCC (Multi-Version Concurrency Control) Principle Analysis

Database MVCC (Multi-Version Concurrency Control) Principle Analysis

I. Overview of MVCC
MVCC (Multi-Version Concurrency Control) is a database concurrency control technique that enables concurrent execution of read and write operations by maintaining multiple versions of data. Its core goal is to resolve read-write conflicts: allowing read operations not to block write operations, and write operations not to block read operations, thereby improving the database's concurrent performance.

II. Core Mechanisms of MVCC

  1. Version Chain and Hidden Fields
    Each row of data in the database implicitly adds three key fields:

    • DB_TRX_ID (6 bytes): The transaction ID that most recently modified this row of data.
    • DB_ROLL_PTR (7 bytes): A rollback pointer, pointing to the location of the old version data in the undo log.
    • DB_ROW_ID (6 bytes): An implicit auto-increment ID (generated for the clustered index if the table has no primary key).
  2. ReadView Mechanism
    Each transaction generates a ReadView (read snapshot) upon execution, containing:

    • m_ids: The set of currently active (uncommitted) transaction IDs.
    • min_trx_id: The minimum transaction ID within m_ids.
    • max_trx_id: The next transaction ID to be assigned.
    • creator_trx_id: The transaction ID that created this ReadView.

III. MVCC Visibility Determination Process
When querying data, the system determines version visibility through the following steps:

  1. Starting from the latest data, traverse each version along the version chain.
  2. Check the relationship between the transaction ID (trx_id) corresponding to the version and the current ReadView:
    • If trx_id == creator_trx_id → Data modified by the current transaction, visible.
    • If trx_id < min_trx_id → Transaction already committed, visible.
    • If trx_id > max_trx_id → Modified by a future transaction, not visible.
    • If trx_id ∈ m_ids → Transaction uncommitted, not visible; otherwise, visible.
  3. Return the first version that meets the visibility conditions.

IV. Example Demonstration
Assume transaction T5 (transaction ID=5) executes a query:

  • ReadView: m_ids={3,4}, min_trx_id=3, max_trx_id=6
  • Data row version chain: T3(uncommitted) ← T2(committed) ← T1(committed)
  • Determination process:
    • Version T3: trx_id=3 ∈ m_ids → Not visible
    • Version T2: trx_id=2 < min_trx_id=3 → Visible → Return data from T2 version

V. MVCC and Isolation Levels

  • Read Committed (RC): Generates a new ReadView for each query, avoiding dirty reads but potentially causing non-repeatable reads.
  • Repeatable Read (RR): Generates a ReadView upon the first query and reuses it, ensuring consistent data across multiple reads.

VI. Advantages and Limitations of MVCC
Advantages:

  • Read and write operations do not block each other.
  • Avoids lock contention, improving concurrent performance.
  • Naturally resolves dirty read problems.

Limitations:

  • Requires maintaining version chains and undo logs, increasing storage overhead.
  • Long-running uncommitted transactions may lead to excessively long version chains.