The MVCC Mechanism and Implementation Principles of Databases

The MVCC Mechanism and Implementation Principles of Databases

Description
MVCC (Multi-Version Concurrency Control) is a database concurrency control technology. By preserving multiple versions of data during modifications, it enables concurrent execution of read and write operations without the need for locking and blocking. It is widely used in database systems such as MySQL (InnoDB) and PostgreSQL, aiming to enhance performance in high-concurrency scenarios.

I. Core Concepts of MVCC

  1. Multi-Version Data Storage: When modifying data, instead of directly overwriting the original data, a new version is generated, and the old version is retained in the system.
  2. Transaction Visibility Rules: Each transaction can only see data versions that were committed before it started, avoiding reads of uncommitted data or intermediate states during concurrent modifications.
  3. Version Chain Management: Each data row is associated with multiple versions through pointers, forming a version chain. Transactions access specific versions on the chain according to rules.

II. Key Components of MVCC

  1. Transaction ID

    • Each transaction is assigned a unique ID upon startup, which increments in chronological order.
    • Used to mark the creator of a data version and as a basis for visibility judgment.
  2. Hidden Fields of Data Rows

    • DB_TRX_ID: The ID of the transaction that most recently modified this row.
    • DB_ROLL_PTR: A pointer to the old version of the data (rollback pointer).
    • DB_ROW_ID (optional): Unique row identifier (automatically generated when there is no primary key).
  3. Read View

    • A snapshot of the current system state generated when a transaction performs a snapshot read (e.g., SELECT), including:
      • m_ids: The set of currently uncommitted transaction IDs.
      • min_trx_id: The minimum transaction ID in m_ids.
      • max_trx_id: The next transaction ID to be assigned.
      • creator_trx_id: The ID of the transaction that created this Read View.

III. Visibility Judgment Process of MVCC
When a transaction queries data, it traverses the version chain and checks visibility for each version:

  1. If the DB_TRX_ID of the row version equals creator_trx_id, it means the data was modified by the transaction itself, visible.
  2. If DB_TRX_ID is less than min_trx_id, it means the version was committed before the current transaction started, visible.
  3. If DB_TRX_ID is greater than or equal to max_trx_id, it means the version was modified by a future transaction, not visible.
  4. If DB_TRX_ID is in m_ids, it means the transaction that modified this version is uncommitted, not visible; otherwise, visible.

IV. Specific Implementation of MVCC in InnoDB

  1. Version Chain Storage

    • Old versions of data are stored in the Rollback Segment, with DB_ROLL_PTR pointing to historical versions.
    • The Rollback Segment works in conjunction with the Redo Log to ensure consistency during crash recovery.
  2. Types of Read Operations

    • Snapshot Read: Uses a Read View to read historical versions (e.g., a regular SELECT).
    • Current Read: Reads the latest committed data and requires locking (e.g., SELECT ... FOR UPDATE).
  3. Purge Mechanism

    • Periodically cleans up old versions of data that are no longer needed by any transaction, freeing up storage space.
    • Determines whether to clean up by checking if the old version is covered by the Read Views of all active transactions.

V. Advantages and Disadvantages of MVCC

  • Advantages:
    • Read and write operations do not block each other, improving concurrency performance.
    • Avoids dirty reads and non-repeatable reads (depending on the isolation level).
  • Disadvantages:
    • Requires additional storage space to maintain version chains.
    • Frequent updates may lead to excessively long version chains, increasing query overhead.

VI. Example Illustration
Assume transaction T1 (ID=100) modifies row R, generating a new version R1 (DB_TRX_ID=100). At this point, transaction T2 (ID=101) starts and queries:

  • If T1 is uncommitted, T2's Read View contains m_ids=[100], making R1 invisible. T2 will read the old version of R.
  • If T1 is committed, T2's Read View does not contain 100, making R1 visible.

Through the above mechanisms, MVCC ensures data consistency while significantly reducing lock contention, making it a cornerstone of modern database design for high concurrency.