Minimum Spanning Tree: Kruskal's Algorithm

Minimum Spanning Tree: Kruskal's Algorithm

Kruskal's algorithm is a greedy algorithm used to find a Minimum Spanning Tree (MST) in a weighted, connected graph. A minimum spanning tree is an acyclic subgraph that connects all vertices, with the sum of the weights of its edges being the smallest possible.

Algorithm Concept:

The core idea of Kruskal's algorithm is: Sort all edges in ascending order of their weight. Then, consider each edge in sequence. If adding this edge to the growing tree does not create a cycle, include it; otherwise, discard it. Repeat this process until the tree contains V-1 edges (where V is the number of vertices).

Detailed Steps:

Sort: Sort all edges of the graph in non-decreasing order of their weight.
Initialize: Create a Union-Find (or Disjoint Set) data structure to efficiently determine if vertices are connected. Initially, each vertex forms its own separate set.
Select Edges: Process each edge (u, v, weight) in the sorted order.
- Check Connectivity: Use the Find operation of the Union-Find structure to check if the endpoints u and v of the edge belong to the same set (i.e., are already connected).
- Decision:
  - If Find(u) != Find(v): Adding this edge will not create a cycle. Perform Union(u, v) to merge the sets containing u and v, and add this edge to the set of edges forming the MST.
  - If Find(u) == Find(v): u and v are already connected via edges selected earlier. Adding the current edge would create a cycle, so discard it.
Termination Condition: Repeat Step 3 until the number of edges in the MST equals V-1 (where V is the total number of vertices). At this point, all vertices are connected, and the algorithm terminates.

Concrete Example:

Consider the following weighted, connected graph (vertices: A, B, C, D):

    A
 1/ | \2
B   3  C
 4\ | /5
    D

Edge list ([vertex1, vertex2, weight]):
[A, B, 1], [A, C, 2], [A, D, 3], [B, D, 4], [C, D, 5]

Algorithm Execution:

Sort: Sort edges by weight.
- [A, B, 1]
- [A, C, 2]
- [A, D, 3]
- [B, D, 4]
- [C, D, 5]
Initialize Union-Find:
- Set1: {A}, Set2: {B}, Set3: {C}, Set4: {D}
Process edge [A, B, 1]:
- Find(A) -> A, Find(B) -> B. Results are different, meaning they are not connected.
- Perform Union(A, B), merging sets {A} and {B}. Add edge A-B to the MST.
- Set state: Set1: {A, B}, Set2: {C}, Set3: {D}
- MST Edges: {A-B}, Total Weight = 1
Process edge [A, C, 2]:
- Find(A) -> A (or B, as they are in the same set), Find(C) -> C. Different, not connected.
- Perform Union(A, C), merging {A, B} and {C}. Add edge A-C to the MST.
- Set state: Set1: {A, B, C}, Set2: {D}
- MST Edges: {A-B, A-C}, Total Weight = 3
Process edge [A, D, 3]:
- Find(A) -> A, Find(D) -> D. Different, not connected.
- Perform Union(A, D), merging {A, B, C} and {D}. Add edge A-D to the MST.
- Set state: Set1: {A, B, C, D}
- MST Edges: {A-B, A-C, A-D}, Total Weight = 6

At this point, the MST has 4-1=3 edges, so the algorithm finishes. The final minimum spanning tree has a total weight of 1+2+3=6.

Note: If we were to process the next edge [B, D, 4], we would find Find(B)=A and Find(D)=A. Both vertices are already in the same set, so adding this edge would create a cycle, and it is discarded.

Key Points and Complexity:

Role of Union-Find: Its Find and Union operations (often optimized with path compression and union by rank) have a time complexity close to constant O(α(N)), where α is the inverse Ackermann function, which grows extremely slowly. This makes Kruskal's algorithm very efficient.
Time Complexity: The algorithm's time complexity is dominated by the sorting step. Sorting E edges has a complexity of O(E log E). For dense graphs (where E ≈ V^2), O(E log E) approximates to O(E log V). After sorting, processing each edge with the Union-Find structure takes nearly constant time (O(α(V)) ≈ O(1)), leading to an overall complexity of O(E log E).
Space Complexity: Primarily needed for storing edges (O(E)) and the Union-Find structure (O(V)), totaling O(E + V).
Applicability: Kruskal's algorithm performs exceptionally well for relatively sparse graphs because its complexity is directly related to the number of edges E. Compared to Prim's algorithm, which is more suited for dense graphs, Kruskal's algorithm is often simpler and more efficient for sparse graphs.

Summary:
Kruskal's algorithm is a greedy, edge-based algorithm. Its elegance lies in efficiently detecting cycles using the Union-Find data structure. Starting from the smallest-weight edge, it incrementally builds a forest that eventually merges into a single tree, guaranteeing that the total weight of the final spanning tree is minimized. Remember its core steps: Sort -> Use Union-Find to check for cycles -> Add edge if no cycle is formed.