Skip List Principles and Implementation

Skip List Principles and Implementation

A skip list is an efficient data structure based on a sorted linked list. It accelerates search operations by adding multi-level indexes, enabling average O(log n) time complexity for search, insertion, and deletion operations, and is simpler to implement than balanced trees.

I. Basic Structure: Limitations of Sorted Linked Lists

Each node in a standard sorted linked list only stores a pointer to the next node.
Searching requires traversal from head to tail, with a time complexity of O(n).
Insertion and deletion operations only need O(1) time to modify pointers, but finding the correct position requires O(n) time.

II. Core Idea of Skip Lists: Building Multi-level Indexes

The first level is the original sorted linked list containing all elements.
The second-level index is formed by "skipping" and extracting some nodes from the first level (e.g., extracting every other node).
The third-level index is similarly extracted by skipping nodes from the second level, and so on.
The highest-level index typically has only 2-4 nodes, forming a "pyramid" structure.

III. Implementation Steps

Search Operation Process:

Start the search from the highest-level index.
Traverse right at the current level until the value of the next node is greater than or equal to the target value.
If the current node's value equals the target value, the search is successful.
If the next node's value is greater than the target value, descend one level in the index.
Repeat steps 2-4 until the target is found or it is not found after reaching the lowest level.

Insertion Operation Process:

Use the search operation to find the insertion position, recording the predecessor nodes at each level along the search path.
Create a new node and randomly determine its level height (commonly using the coin toss method: 50% probability to increase a level).
Update pointers from low to high levels:
- The new node's next pointer points to the original successor node.
- The next pointers of the predecessor nodes at each level point to the new node.
If the random level exceeds the current maximum level, update the skip list's maximum level.

Deletion Operation Process:

Use the search operation to find the target node while recording the predecessor nodes at each level.
Starting from the highest level, update the next pointers of the predecessor nodes at each level to skip the node being deleted.
If a level becomes empty after deletion, the skip list's maximum level can be reduced.

IV. Key Techniques and Complexity Analysis

Level Randomization Algorithm:

def random_level():
    level = 1
    while random.random() < 0.5 and level < MAX_LEVEL:
        level += 1
    return level

The level of each node is generated independently and randomly.
The expected number of levels is 2, ensuring the index size is controllable.

Time Complexity Analysis:

Average time complexity for search/insertion/deletion: O(log n).
Worst-case scenario (all nodes on the same level): O(n).
Time and space efficiency can be balanced by adjusting the random probability parameter.

Space Complexity Analysis:

Additional index space: O(n) (expecting every 2 nodes to have 1 index, every 4 nodes to have a 2nd-level index, forming a geometric series).
Actual space consumption is approximately twice that of the original linked list.

V. Advantages Compared to Balanced Trees

Simple implementation, with significantly less code than Red-Black Trees or AVL Trees.
Supports range queries with efficiency comparable to balanced trees.
Easier to implement thread-safe versions in concurrent environments.
Practical systems like Redis use skip lists to implement sorted sets.

This design approach, which relies on probabilistic balance rather than strict balance, makes skip lists an excellent alternative to balanced trees in engineering practice.