Database MVCC (Multi-Version Concurrency Control) Principle Analysis
I. Overview of MVCC
MVCC (Multi-Version Concurrency Control) is a database concurrency control technique that enables concurrent execution of read and write operations by maintaining multiple versions of data. Its core goal is to resolve read-write conflicts: allowing read operations not to block write operations, and write operations not to block read operations, thereby improving the database's concurrent performance.
II. Core Mechanisms of MVCC
-
Version Chain and Hidden Fields
Each row of data in the database implicitly adds three key fields:DB_TRX_ID(6 bytes): The transaction ID that most recently modified this row of data.DB_ROLL_PTR(7 bytes): A rollback pointer, pointing to the location of the old version data in the undo log.DB_ROW_ID(6 bytes): An implicit auto-increment ID (generated for the clustered index if the table has no primary key).
-
ReadView Mechanism
Each transaction generates a ReadView (read snapshot) upon execution, containing:m_ids: The set of currently active (uncommitted) transaction IDs.min_trx_id: The minimum transaction ID within m_ids.max_trx_id: The next transaction ID to be assigned.creator_trx_id: The transaction ID that created this ReadView.
III. MVCC Visibility Determination Process
When querying data, the system determines version visibility through the following steps:
- Starting from the latest data, traverse each version along the version chain.
- Check the relationship between the transaction ID (trx_id) corresponding to the version and the current ReadView:
- If trx_id == creator_trx_id → Data modified by the current transaction, visible.
- If trx_id < min_trx_id → Transaction already committed, visible.
- If trx_id > max_trx_id → Modified by a future transaction, not visible.
- If trx_id ∈ m_ids → Transaction uncommitted, not visible; otherwise, visible.
- Return the first version that meets the visibility conditions.
IV. Example Demonstration
Assume transaction T5 (transaction ID=5) executes a query:
- ReadView: m_ids={3,4}, min_trx_id=3, max_trx_id=6
- Data row version chain: T3(uncommitted) ← T2(committed) ← T1(committed)
- Determination process:
- Version T3: trx_id=3 ∈ m_ids → Not visible
- Version T2: trx_id=2 < min_trx_id=3 → Visible → Return data from T2 version
V. MVCC and Isolation Levels
- Read Committed (RC): Generates a new ReadView for each query, avoiding dirty reads but potentially causing non-repeatable reads.
- Repeatable Read (RR): Generates a ReadView upon the first query and reuses it, ensuring consistent data across multiple reads.
VI. Advantages and Limitations of MVCC
Advantages:
- Read and write operations do not block each other.
- Avoids lock contention, improving concurrent performance.
- Naturally resolves dirty read problems.
Limitations:
- Requires maintaining version chains and undo logs, increasing storage overhead.
- Long-running uncommitted transactions may lead to excessively long version chains.