Detailed Explanation of Insertion and Deletion Operations in B-trees and B+ Trees

Detailed Explanation of Insertion and Deletion Operations in B-trees and B+ Trees

I. Problem Description
B-trees and B+ trees are balanced multi-way search trees, widely used in database and file systems to manage disk data efficiently. Their core feature is that each node can contain multiple keys and child nodes, reducing the number of disk I/Os by keeping the tree height low. Insertion and deletion operations must maintain the tree's balance (such as constraints on node key count ranges). This article will progressively analyze the execution logic and balancing strategies of these two operations.


II. B-tree Insertion Operation
1. Basic Rules

  • B-tree definition: A B-tree of degree \(t\) satisfies the following properties:
    • The root node contains at least one key (unless it is an empty tree).
    • A non-root node contains at least \(t-1\) keys and at most \(2t-1\) keys.
    • When a node's key count exceeds the upper limit, it must be split.

2. Insertion Steps
Step 1: Locate Insertion Position
Starting from the root, recursively descend to find the leaf node where the key \(k\) should be inserted. During the search, if a full node (key count = \(2t-1\)) is encountered, perform preemptive splitting to avoid backtracking for splits.

Step 2: Leaf Node Insertion
Insert \(k\) into the leaf node, maintaining key order (similar to insertion into a sorted array).

Step 3: Handle Node Overflow
If the node's key count after insertion is > \(2t-1\), perform a split:

  • Divide the node into three parts:
    • Left half: the first \(t-1\) keys.
    • Middle key: the \(t\)-th key.
    • Right half: the last \(t-1\) keys.
  • Promote the middle key to the parent node, and make the left and right parts two new child nodes.
  • If the parent node overflows as a result, recursively split upwards, which may increase the tree height (when the root node splits).

Example (using a B-tree with \(t=2\), key range [1,3]):
Initial tree: root node [10, 20].
Insert 5:

  1. The root node is full; first split the root node (if needed, but here insert directly into a leaf? The scenario needs clarification). Actual insertion requires searching from the root downwards. Assuming leaf node [5, 8] is full, after splitting, promote the middle key to the parent node.

III. B-tree Deletion Operation
1. Core Challenge
Deleting a key may cause a node's key count to fall below the minimum (\(t-1\)), requiring repair through borrowing or merging.

2. Deletion Steps
Case 1: Key to Delete is in a Leaf Node

  • Delete the key directly.
  • If the key count after deletion is < \(t-1\):
    • If a left or right sibling node has surplus keys (key count > \(t-1\)), borrow a key from the sibling (via the parent node).
    • If sibling nodes' key counts are both = \(t-1\), merge with one sibling node, and delete the corresponding separator key from the parent node. Merging may cause the parent's key count to become insufficient, requiring recursive upward repair.

Case 2: Key to Delete is in an Internal Node

  • Find the key's predecessor (the largest key in the left subtree) or successor (the smallest key in the right subtree), copy its value to the position of the key to be deleted, then delete the predecessor/successor (transforming the problem into a leaf node deletion).

Example (\(t=2\)):
Tree: root node [10], left child [5, 8], right child [15, 20].
Delete 10 (internal node):

  1. Replace 10 with predecessor 8, turning the problem into deleting 8 from the leaf node.
  2. After leaf node [5] deletes 8, it remains [5] (key count =1 ≥ \(t-1\)), no repair needed.

IV. Differences in B+ Tree Insertion and Deletion
1. Structural Features

  • In B+ trees, all data is stored in leaf nodes; internal nodes only store keys and pointers. Leaf nodes are linked into an ordered linked list via pointers.

2. Insertion Operation

  • Similar to B-trees, but during splitting:
    • Leaf node split: Copy the middle key to the parent node (B-trees move it), and the leaf node retains all keys.
    • Internal node split: The middle key is promoted to the parent node (same as B-trees).

3. Deletion Operation

  • If the key is in a leaf node: Delete directly. If the key count becomes insufficient, the borrowing or merging logic is the same as in B-trees, but the parent node's key must be updated (possibly replaced by the right sibling's smallest key).
  • If the key is in an internal node: After deletion, no replacement is needed (since data is only in leaf nodes), but it must be ensured that the parent node's keys can still correctly index the subtree.

V. Key Summary Points

  • Insertion Core: Preemptive splitting avoids backtracking; the middle key is promoted after a split.
  • Deletion Core: Borrowing takes priority over merging; internal node deletion is transformed into leaf node deletion.
  • B+ Tree Advantage: The leaf linked list supports range queries; internal nodes are more compact.
  • Balance Maintenance: All operations ensure node key counts satisfy \([t-1, 2t-1]\), keeping the tree height balanced.

Through the steps above, B-trees/B+ trees maintain low height during dynamic data insertion and deletion, making them suitable for disk storage scenarios.