Union-Find Data Structure
Union-Find is a data structure used for handling the merging and querying of disjoint sets. It supports two primary operations: merging two sets (Union) and querying which set an element belongs to (Find). Its core applications include determining whether two nodes in a graph are connected, minimum spanning tree algorithms (such as Kruskal's algorithm), and more.
1. Basic Concepts and Representation
Imagine several elements, each initially belonging to its own independent set. We can represent a set as a tree, with the root node serving as the "representative" of the set. The core idea of Union-Find is: to determine if two elements belong to the same set, we only need to check if their root nodes are the same.
Initially, with n elements (e.g., 0 to n-1), we can use an array parent of length n to represent the structure. parent[i] stores the parent node of element i. Initially, each element is independent, so parent[i] = i, meaning each element is the root of its own set.
2. Core Operation: Find (Query)
The goal of the Find operation is to locate the root node (representative) of the set containing a given element.
Steps:
- Input an element
x. - Starting from
x, traverse upward following theparentarray, i.e., repeatedly accessx = parent[x]. - Continue until finding an element where
parent[x] == x; thisxis the root node. - Return this root node.
Example:
Assume we have a set {0, 1, 2, 3}, initial parent = [0, 1, 2, 3].
Find(0)-> Path: 0 -> 0, returns 0.Find(3)-> Path: 3 -> 3, returns 3.
3. Core Operation: Union (Merge)
The goal of the Union operation is to merge the sets containing two elements into a single set.
Basic Steps:
- Find the root nodes of element
xand elementy(using the Find operation), denoted asrootXandrootY. - If
rootX == rootY, it meansxandyare already in the same set; no merge is needed. - If
rootX != rootY, set the parent of one root to point to the other root. For example, setparent[rootY] = rootX. This merges the two trees into one.
Example:
Initial parent = [0, 1, 2, 3].
- Execute
Union(1, 2):Find(1)returns 1,Find(2)returns 2.- Set
parent[2] = 1. Nowparent = [0, 1, 1, 3]. Sets become{0},{1, 2},{3}.
- Then execute
Union(2, 3):Find(2): 2 ->parent[2]=1-> 1 (parent[1]=1), returns 1.Find(3)returns 3.- Set
parent[3] = 1. Nowparent = [0, 1, 1, 1]. Sets become{0},{1, 2, 3}.
4. Optimization Strategy: Path Compression
In the basic Find operation, if the tree becomes very tall (e.g., degenerates into a chain), each Find operation may need to traverse the entire path, leading to low efficiency (worst-case O(n)).
Path Compression is an optimization technique. During the Find operation, it simultaneously redirects the parent pointers of all nodes along the path to point directly to the root node. This makes subsequent Find operations for nodes on the same path very fast.
Steps for Optimized Find Operation:
- If
parent[x] != x,xis not the root. - Recursively (or iteratively) find the root node
root. - During the recursive return, set the parent of each node
xalong the path directly toroot(parent[x] = root). - Return
root.
Example:
Assume parent = [0, 1, 2, 3], structure is 0 <- 1 <- 2 <- 3.
- Execute
Find(3)(optimized):Find(3)callsFind(2).Find(2)callsFind(1).Find(1)callsFind(0); 0 is the root, returns 0.- Before
Find(1)returns, setparent[1] = 0. - Before
Find(2)returns, setparent[2] = 0. - Before
Find(3)returns, setparent[3] = 0.
- After operation,
parentbecomes[0, 0, 0, 0]. All nodes now point directly to root 0.
5. Optimization Strategy: Union by Rank
In the basic Union operation, we arbitrarily attach one tree to another. This can cause the tree height to grow rapidly, potentially degenerating into a chain. Union by Rank is another optimization that always attaches the "smaller" tree (tree with lower height/rank) to the root of the "larger" tree (tree with higher height/rank) to control height growth.
Implementation Method:
- We need an additional array
rank, whererank[i]records an approximate value of the height (rank) of the tree rooted ati. - Initially, each element's
rankis 0. - During union, compare the
rankof the two roots:- If
rank[rootX] > rank[rootY], setparent[rootY] = rootX. - If
rank[rootX] < rank[rootY], setparent[rootX] = rootY. - If
rank[rootX] == rank[rootY], attach either tree to the other arbitrarily, and increment therankof the new root by 1 (since the trees have equal height, merging increases height by 1).
- If
Example:
Initial parent = [0, 1, 2, 3], rank = [0, 0, 0, 0].
Union(1, 2):rank[1]==0,rank[2]==0, equal. Suppose we attach 2 to 1,parent[2]=1.rank[1]increments to 1.parent = [0, 1, 1, 3],rank = [0, 1, 0, 0].Union(2, 3):Find(2)finds root 1 (rank=1),Find(3)finds root 3 (rank=0).rank[1] > rank[3], attach 3 to 1,parent[3]=1.rankunchanged.parent = [0, 1, 1, 1],rank = [0, 1, 0, 0].
6. Complete Implementation and Complexity Analysis
Combining Path Compression and Union by Rank, the Union-Find structure becomes highly efficient.
Pseudocode/Simplified Implementation:
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
# Path compression: recursively find root and point directly to it
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
rootX = self.find(x)
rootY = self.find(y)
if rootX == rootY:
return
# Union by rank
if self.rank[rootX] < self.rank[rootY]:
self.parent[rootX] = rootY
elif self.rank[rootX] > self.rank[rootY]:
self.parent[rootY] = rootX
else:
# Ranks equal, attach arbitrarily, increment new root's rank
self.parent[rootY] = rootX
self.rank[rootX] += 1
Complexity Analysis:
- When using both path compression and union by rank, the average (amortized) time complexity per operation for Union-Find is nearly O(α(n)).
- Here, α(n) is the inverse Ackermann function, which grows extremely slowly. For all practical purposes (where n is far less than the number of atoms in the universe), α(n) is typically less than 5.
- Therefore, Union-Find operations can be considered nearly constant time.
Summary:
The Union-Find data structure represents sets using a tree-like structure, with core operations being Find and Union. Through the two major optimizations of path compression and union by rank, its efficiency becomes extremely high, making it a powerful tool for solving problems like dynamic connectivity. Understanding its principles and optimization strategies is key to mastering this concept.