Minimum Spanning Tree: Kruskal's Algorithm
Kruskal's algorithm is a greedy algorithm used to find a Minimum Spanning Tree (MST) in a weighted, connected graph. A minimum spanning tree is an acyclic subgraph that connects all vertices, with the sum of the weights of its edges being the smallest possible.
Algorithm Concept:
The core idea of Kruskal's algorithm is: Sort all edges in ascending order of their weight. Then, consider each edge in sequence. If adding this edge to the growing tree does not create a cycle, include it; otherwise, discard it. Repeat this process until the tree contains V-1 edges (where V is the number of vertices).
Detailed Steps:
- Sort: Sort all edges of the graph in non-decreasing order of their weight.
- Initialize: Create a Union-Find (or Disjoint Set) data structure to efficiently determine if vertices are connected. Initially, each vertex forms its own separate set.
- Select Edges: Process each edge (
u,v,weight) in the sorted order.- Check Connectivity: Use the
Findoperation of the Union-Find structure to check if the endpointsuandvof the edge belong to the same set (i.e., are already connected). - Decision:
- If
Find(u) != Find(v): Adding this edge will not create a cycle. PerformUnion(u, v)to merge the sets containinguandv, and add this edge to the set of edges forming the MST. - If
Find(u) == Find(v):uandvare already connected via edges selected earlier. Adding the current edge would create a cycle, so discard it.
- If
- Check Connectivity: Use the
- Termination Condition: Repeat Step 3 until the number of edges in the MST equals
V-1(whereVis the total number of vertices). At this point, all vertices are connected, and the algorithm terminates.
Concrete Example:
Consider the following weighted, connected graph (vertices: A, B, C, D):
A
1/ | \2
B 3 C
4\ | /5
D
Edge list ([vertex1, vertex2, weight]):
[A, B, 1], [A, C, 2], [A, D, 3], [B, D, 4], [C, D, 5]
Algorithm Execution:
-
Sort: Sort edges by weight.
[A, B, 1][A, C, 2][A, D, 3][B, D, 4][C, D, 5]
-
Initialize Union-Find:
Set1: {A},Set2: {B},Set3: {C},Set4: {D}
-
Process edge
[A, B, 1]:Find(A) -> A,Find(B) -> B. Results are different, meaning they are not connected.- Perform
Union(A, B), merging sets{A}and{B}. Add edgeA-Bto the MST. - Set state:
Set1: {A, B},Set2: {C},Set3: {D} - MST Edges: {A-B}, Total Weight = 1
-
Process edge
[A, C, 2]:Find(A) -> A(orB, as they are in the same set),Find(C) -> C. Different, not connected.- Perform
Union(A, C), merging{A, B}and{C}. Add edgeA-Cto the MST. - Set state:
Set1: {A, B, C},Set2: {D} - MST Edges: {A-B, A-C}, Total Weight = 3
-
Process edge
[A, D, 3]:Find(A) -> A,Find(D) -> D. Different, not connected.- Perform
Union(A, D), merging{A, B, C}and{D}. Add edgeA-Dto the MST. - Set state:
Set1: {A, B, C, D} - MST Edges: {A-B, A-C, A-D}, Total Weight = 6
At this point, the MST has 4-1=3 edges, so the algorithm finishes. The final minimum spanning tree has a total weight of 1+2+3=6.
Note: If we were to process the next edge [B, D, 4], we would find Find(B)=A and Find(D)=A. Both vertices are already in the same set, so adding this edge would create a cycle, and it is discarded.
Key Points and Complexity:
- Role of Union-Find: Its
FindandUnionoperations (often optimized with path compression and union by rank) have a time complexity close to constantO(α(N)), whereαis the inverse Ackermann function, which grows extremely slowly. This makes Kruskal's algorithm very efficient. - Time Complexity: The algorithm's time complexity is dominated by the sorting step. Sorting
Eedges has a complexity ofO(E log E). For dense graphs (whereE ≈ V^2),O(E log E)approximates toO(E log V). After sorting, processing each edge with the Union-Find structure takes nearly constant time (O(α(V)) ≈ O(1)), leading to an overall complexity ofO(E log E). - Space Complexity: Primarily needed for storing edges (
O(E)) and the Union-Find structure (O(V)), totalingO(E + V). - Applicability: Kruskal's algorithm performs exceptionally well for relatively sparse graphs because its complexity is directly related to the number of edges
E. Compared to Prim's algorithm, which is more suited for dense graphs, Kruskal's algorithm is often simpler and more efficient for sparse graphs.
Summary:
Kruskal's algorithm is a greedy, edge-based algorithm. Its elegance lies in efficiently detecting cycles using the Union-Find data structure. Starting from the smallest-weight edge, it incrementally builds a forest that eventually merges into a single tree, guaranteeing that the total weight of the final spanning tree is minimized. Remember its core steps: Sort -> Use Union-Find to check for cycles -> Add edge if no cycle is formed.