Basic Principles and Operations of B-Tree

Basic Principles and Operations of B-Tree

Problem Description
B-Tree is a self-balancing multi-way search tree widely used in databases and file systems. Unlike binary search trees, each node in a B-Tree can contain multiple keys and multiple child node pointers, which allows B-Trees to maintain a relatively low height, thereby reducing the number of disk I/O operations. You need to understand the core properties of B-Trees and the processes of search, insertion, and deletion operations.

Core Properties

Each node has at most m child nodes, called an m-order B-Tree.
The root node has at least 2 child nodes (unless it is a leaf node).
Each non-root, non-leaf node has at least ⌈m/2⌉ child nodes.
All leaf nodes appear at the same level.
Each non-leaf node contains k keys and k+1 child pointers, where ⌈m/2⌉-1 ≤ k ≤ m-1.

Search Operation
The search process is similar to that of a binary search tree, but requires sequential comparison of multiple keys within each node:

Start from the root node and sequentially search for the target key within the current node.
If the target key is found, the search is successful.
If not found, determine the subtree interval where the target key may exist.
Recursively continue the search in the corresponding subtree.
If the leaf node is reached and the key is still not found, the search fails.

Insertion Operation
The insertion operation needs to maintain the balancing properties of the B-Tree:

First, perform a search operation to find the appropriate position in a leaf node.
Insert the new key into the leaf node (maintaining the order of keys).
If the number of keys in the node does not exceed m-1 after insertion, the insertion is complete.
If the node is full (number of keys = m), a split is required:
- Promote the middle key to the parent node.
- Split the original node into two nodes, containing the keys to the left and right of the middle key, respectively.
If the parent node also becomes full as a result, continue splitting upward, potentially affecting the root node.

Example: Insertion Process for a 3-Order B-Tree
Assume keys are inserted in order: 10, 20, 30, 40, 50

Insert 10, 20: Root node [10, 20]
Insert 30: Root node is full, split into:
Root node [20]
Left child node [10], Right child node [30]
Insert 40: Insert into right child node [30, 40]
Insert 50: Right child node is full, split into:
Root node [20, 40]
Left child node [10], Middle child node [30], Right child node [50]

Deletion Operation
The deletion operation is relatively complex and requires handling multiple scenarios:
Scenario 1: Deleting a key from a leaf node

Delete directly. If the number of keys in the node after deletion is still ≥ ⌈m/2⌉-1, the operation is complete.
Otherwise, it is necessary to borrow a key from a sibling node or merge nodes.

Scenario 2: Deleting a key from a non-leaf node

Replace the key to be deleted with its predecessor or successor key.
Transform the problem into deleting a key from a leaf node.

Node Merging and Redistribution
When a node has insufficient keys:

If a sibling node has surplus keys, redistribute keys through the parent node.
If the sibling node also has no surplus keys, merge with the sibling node.
Merging may reduce the number of keys in the parent node, potentially requiring further upward adjustments.

Advantages of B-Tree

Short and wide tree structure reduces disk access times.
Each node stores multiple keys, making full use of disk block size.
Automatically maintains balance, ensuring stable operational efficiency.
Suitable for external storage scenarios involving large-scale data.

By understanding the structural characteristics and operational rules of B-Trees, you can master the working principles of this crucial data structure in database indexing.