Database Logging System and Recovery Mechanism
Description
The database logging system and recovery mechanism are core components that ensure data consistency and durability. When a database fails (e.g., power outage, system crash), the logging system records the operation history, enabling the database to be restored to a consistent state. Key issues include: What content does the log record? How to ensure logs are written prior to data? How to use logs for recovery after a failure?
1. Basic Content and Types of Logs
Logs are sequentially appended files that record all change operations in the database. Each log entry contains:
- Transaction ID: Uniquely identifies a transaction.
- Operation Type: Such as "insert", "delete", "update".
- Data Item Identifier: The location of the modified data (e.g., page ID + offset).
- Old Value (Before Image): The data value before modification (used for rollback).
- New Value (After Image): The data value after modification (used for redo).
Logs are divided into two types:
- REDO Log: Records the new values after a transaction commits, used to reapply committed modifications.
- UNDO Log: Records the old values before a transaction modifies data, used to roll back uncommitted transactions.
2. Log Writing Rule: Write-Ahead Logging (WAL)
To guarantee durability, databases adopt the Write-Ahead Logging (WAL) rule:
Before any data page is modified, its corresponding log record must first be persisted to disk.
Why is WAL needed?
- If data pages are modified first but the log is not written before a crash, the modifications cannot be recovered.
- WAL ensures the log always contains a complete history of operations, allowing reconstruction even if data pages are lost.
Implementation Method:
- When a transaction modifies data, it first generates a log record in the memory log buffer.
- Force the log buffer to be flushed to disk (
fsync), ensuring log persistence. - Then write the data page modifications to the memory data buffer.
- The data buffer is asynchronously flushed to disk by a background process.
3. Core Process of the Recovery Mechanism
Failure recovery is divided into three phases:
Phase One: Analysis Phase
- Scan the log to determine which transactions were uncommitted at the time of the crash (need rollback) and which were committed but their data might not have been flushed (need redo).
- Establish two sets:
- Redo Set: All committed transactions or uncommitted transactions with complete logs.
- Undo Set: All uncommitted transactions.
Phase Two: Redo Phase
- Starting from the most recent checkpoint, replay all operations in the redo set:
- For each log entry, if the current version of the data page is older than the log (compared via LSN), reapply the new value.
- Purpose: Ensure all modifications of committed transactions are persisted, even if data pages were not flushed.
Phase Three: Undo Phase
- Scan the log backwards and roll back transactions in the undo set:
- Overwrite data pages with the old values from the logs and write a "compensation log" record for the rollback operation.
- Purpose: Eliminate partial modifications of uncommitted transactions to guarantee atomicity.
4. Checkpoint Optimization
Scanning the entire log for every recovery is highly inefficient. The checkpoint mechanism is executed periodically:
- Suspend new transactions and wait for all current transactions to complete.
- Force flush all dirty data pages (modified but not flushed) to disk.
- Write a checkpoint record in the log, recording the list of currently active transactions.
During Recovery: Only need to scan the log from the most recent checkpoint, reducing the amount of log processing.
5. Types of Logs in Practical Applications
- Physical Log: Records specific byte changes in data pages (e.g., "write value X at offset 100 of page P"), offering high recovery efficiency.
- Logical Log: Records SQL operations (e.g., "UPDATE T SET col=1 WHERE id=5"), saving space but potentially complicating recovery.
Modern databases (e.g., MySQL InnoDB) often combine both: physical logs for redo and logical logs for undo.
Summary
Through the WAL rule and recovery process, the logging system can, after a failure:
- Redo committed transactions (ensuring durability).
- Undo uncommitted transactions (ensuring atomicity).
- The checkpoint mechanism balances recovery performance with runtime overhead.