Architecture Design and Data Organization Methods of Database Storage Engines

Architecture Design and Data Organization Methods of Database Storage Engines

Problem Description
The storage engine is the core component of a database management system, responsible for data storage, retrieval, and management. Different storage engines adopt different architectural designs (e.g., B+Tree, LSM-Tree) to organize data on disk, directly impacting the database's read/write performance, transaction support capabilities, and data consistency. This topic will provide an in-depth analysis of the architectural principles, data organization methods, and applicable scenarios of two mainstream storage engines.

I. Basic Functions and Core Challenges of Storage Engines

  1. Core Responsibilities:
    • Data Persistence: Reliably writing data to disk, ensuring no loss after a crash.
    • Efficient Retrieval: Quickly locating data positions through indexes.
    • Concurrency Control: Maintaining data consistency when multiple threads read and write simultaneously.
  2. Design Challenges:
    • Read/Write Performance Balance: Optimizing write operations may sacrifice read performance (e.g., LSM-Tree), and vice versa.
    • Storage Space Efficiency: Index structures may occupy extra space (e.g., intermediate nodes in B+Tree).
    • Data Recovery Capability: Ensuring data consistency after failures through logging mechanisms.

II. Architectural Principles of B+Tree Storage Engines

  1. Data Organization Method:
    • All data is stored in leaf nodes; non-leaf nodes only store key values for routing.
    • Leaf nodes are linked via pointers, supporting efficient range queries.
    • Node size is typically aligned with disk pages (e.g., 4KB) to reduce I/O operations.
  2. Write Process:
    • When inserting data, locate the corresponding leaf node. If the node is full, it splits:
      Original node [10, 20, 30] (capacity 3)  
      Insert 25 → Split into [10, 20] and [25, 30], with the middle value 20 promoted to the parent node.  
      
    • Splitting may propagate recursively upward, increasing the tree height.
  3. Read Optimization:
    • When querying data with key value K, perform a binary search from the root node down to the leaf node.
    • Tree height is typically 3-4 levels (for tens of millions of records), requiring only 3-4 disk I/O operations.

III. Architectural Principles of LSM-Tree Storage Engines

  1. Layered Data Design:
    • MemTable: Data is first written to an in-memory skip list structure, supporting high-speed writes.
    • Immutable MemTable: When the MemTable is full, it is frozen into a read-only state, ready to be flushed to disk.
    • SSTable (Sorted String Table): A multi-layered on-disk structure where data in each layer is sorted and stored in a compressed format.
  2. Write Process:
    • Write operations first append to a log (WAL) to ensure durability, then write to the MemTable.
    • When the MemTable reaches a threshold, it is switched to Immutable and asynchronously flushed to disk to generate an L0-level SSTable.
  3. Read and Compaction Process:
    • Queries require merging checks across the MemTable, Immutable MemTable, and various SSTable layers (potentially multiple I/O operations).
    • Background threads periodically merge overlapping SSTables (Compaction) to reduce read amplification.

IV. Comparison and Applicable Scenarios of the Two Engines

  1. Performance Comparison:
    Metric B+Tree LSM-Tree
    Write Throughput Random writes may trigger page splits Primarily sequential writes, high throughput
    Read Performance Stable, typically O(log n) May require multiple I/O operations (read amplification)
    Space Amplification Page splits may create free space High space utilization after compression
  2. Typical Application Scenarios:
    • B+Tree: MySQL InnoDB, relational databases, suitable for read-heavy, write-light scenarios requiring transactions.
    • LSM-Tree: LevelDB, RocksDB, NoSQL databases, suitable for write-intensive and batch import scenarios.

V. Optimization Trends in Modern Storage Engines

  1. B+Tree Optimizations:
    • Buffer Pool caches hot pages to reduce disk I/O.
    • Adaptive hash indexes accelerate equality queries.
  2. LSM-Tree Optimizations:
    • Bloom Filter quickly determines if an SSTable contains a key, reducing无效 I/O.
    • Layered Compaction strategies (Leveled/Tiered) balance write amplification and read performance.

Conclusion
The design of storage engines essentially involves balancing the trade-offs among read, write, and space costs. B+Tree ensures read stability through an ordered tree structure, while LSM-Tree optimizes write throughput via sequential writes and background merging. In practice, the choice of engine should be based on business characteristics (read/write ratio, data volume, consistency requirements) or by designing hybrid architectures that combine the strengths of both.