Basic Principles and Operations of B-Tree
Problem Description
B-Tree is a self-balancing multi-way search tree widely used in databases and file systems. Unlike binary search trees, each node in a B-Tree can contain multiple keys and multiple child node pointers, which allows B-Trees to maintain a relatively low height, thereby reducing the number of disk I/O operations. You need to understand the core properties of B-Trees and the processes of search, insertion, and deletion operations.
Core Properties
- Each node has at most m child nodes, called an m-order B-Tree.
- The root node has at least 2 child nodes (unless it is a leaf node).
- Each non-root, non-leaf node has at least ⌈m/2⌉ child nodes.
- All leaf nodes appear at the same level.
- Each non-leaf node contains k keys and k+1 child pointers, where ⌈m/2⌉-1 ≤ k ≤ m-1.
Search Operation
The search process is similar to that of a binary search tree, but requires sequential comparison of multiple keys within each node:
- Start from the root node and sequentially search for the target key within the current node.
- If the target key is found, the search is successful.
- If not found, determine the subtree interval where the target key may exist.
- Recursively continue the search in the corresponding subtree.
- If the leaf node is reached and the key is still not found, the search fails.
Insertion Operation
The insertion operation needs to maintain the balancing properties of the B-Tree:
- First, perform a search operation to find the appropriate position in a leaf node.
- Insert the new key into the leaf node (maintaining the order of keys).
- If the number of keys in the node does not exceed m-1 after insertion, the insertion is complete.
- If the node is full (number of keys = m), a split is required:
- Promote the middle key to the parent node.
- Split the original node into two nodes, containing the keys to the left and right of the middle key, respectively.
- If the parent node also becomes full as a result, continue splitting upward, potentially affecting the root node.
Example: Insertion Process for a 3-Order B-Tree
Assume keys are inserted in order: 10, 20, 30, 40, 50
- Insert 10, 20: Root node [10, 20]
- Insert 30: Root node is full, split into:
Root node [20]
Left child node [10], Right child node [30] - Insert 40: Insert into right child node [30, 40]
- Insert 50: Right child node is full, split into:
Root node [20, 40]
Left child node [10], Middle child node [30], Right child node [50]
Deletion Operation
The deletion operation is relatively complex and requires handling multiple scenarios:
Scenario 1: Deleting a key from a leaf node
- Delete directly. If the number of keys in the node after deletion is still ≥ ⌈m/2⌉-1, the operation is complete.
- Otherwise, it is necessary to borrow a key from a sibling node or merge nodes.
Scenario 2: Deleting a key from a non-leaf node
- Replace the key to be deleted with its predecessor or successor key.
- Transform the problem into deleting a key from a leaf node.
Node Merging and Redistribution
When a node has insufficient keys:
- If a sibling node has surplus keys, redistribute keys through the parent node.
- If the sibling node also has no surplus keys, merge with the sibling node.
- Merging may reduce the number of keys in the parent node, potentially requiring further upward adjustments.
Advantages of B-Tree
- Short and wide tree structure reduces disk access times.
- Each node stores multiple keys, making full use of disk block size.
- Automatically maintains balance, ensuring stable operational efficiency.
- Suitable for external storage scenarios involving large-scale data.
By understanding the structural characteristics and operational rules of B-Trees, you can master the working principles of this crucial data structure in database indexing.