Principles and Implementation of Database MVCC (Multi-Version Concurrency Control)
Description
MVCC (Multi-Version Concurrency Control) is a database concurrency control technique that maintains multiple versions of data to enable non-blocking concurrent read and write operations. It is widely used in database systems such as PostgreSQL and MySQL (InnoDB), aiming to improve read performance and avoid conflicts between write operations. The core idea of MVCC is: when data is modified, its old version is preserved, allowing read operations to access a snapshot before the modification without waiting for the write operation to complete.
Process Explanation
-
Basic Logic of MVCC
- Each data row includes two hidden fields: a creation version number (or timestamp) and a deletion version number (or timestamp).
- When inserting new data, the system assigns a unique, incrementing transaction ID (e.g.,
txn_id) as the creation version number, with the deletion version number initially empty. - When deleting data, it is not physically deleted immediately; instead, the current transaction ID is written to the deletion version number, marking that version as invalid.
- When updating data, it is treated as "logical deletion of the old version + insertion of a new version": first, write the current transaction ID to the deletion version number of the old row, then insert a new record with the current transaction ID as its creation version number.
-
Visibility Rules for Read Operations
Read operations determine which versions are visible based on the current transaction's ID (assumed ascurrent_txn_id) and the snapshot isolation level:- Visibility Conditions:
- The row's creation version number ≤
current_txn_id(ensuring the version existed before the transaction started). - The row's deletion version number is empty or the deletion version number >
current_txn_id(ensuring the version has not been deleted, or was deleted after the transaction started).
- The row's creation version number ≤
- If multiple versions satisfy the conditions, select the version with the largest (most recent) creation version number.
Example:
Assume transaction T1 (ID=100) inserts a row R, T2 (ID=101) updates R, and T3 (ID=102) attempts to read it:- T3 will see the new version updated by T2 because its creation version number 101 ≤ 102, and the deletion version number is empty.
- If T3 started before T2 committed (snapshot isolation), it will see the version inserted by T1 (creation version number 100 ≤ 102 and not deleted).
- Visibility Conditions:
-
Key Mechanisms for MVCC Implementation
- Transaction ID and Snapshot:
Each transaction is assigned a unique ID, and read operations are based on a system snapshot taken when the transaction starts (recording the list of currently active transactions). PostgreSQL implements this via thexmin(creation transaction ID) andxmax(deletion transaction ID) fields. - Version Chain and Rollback Segment:
Multiple versions of the same row are linked via pointers (e.g., InnoDB's undo logs). Read operations traverse the version chain, applying visibility rules to select the appropriate version. - Garbage Collection (Vacuum):
When old versions are no longer referenced by any transaction, they need to be cleaned up to free space. PostgreSQL'sVACUUMprocess removes old versions that are no longer needed by committed transactions.
- Transaction ID and Snapshot:
-
Advantages and Disadvantages of MVCC
- Advantages:
- Reads do not block writes, and writes do not block reads, making it suitable for read-heavy, write-light scenarios.
- Avoids lock contention, improving concurrency performance.
- Disadvantages:
- High storage overhead (multiple versions need to be retained).
- Requires periodic garbage collection; otherwise, space bloat may occur.
- May read stale data (businesses must tolerate "non-fresh reads" under snapshot isolation).
- Advantages:
-
Optimizations in Practice
- Index Management:
InnoDB's secondary indexes include the primary key ID. If the primary key version is updated, locating the visible version requires using "index + rollback segment." - Isolation Level Adaptation:
MVCC behaves differently under RC (Read Committed) and RR (Repeatable Read) levels: RC reads the latest committed version each time, while RR always reads the snapshot from when the transaction started.
- Index Management:
Through the above steps, MVCC balances concurrency performance and data consistency via version chains and visibility rules, becoming a cornerstone of modern database high-concurrency design.