Shortest Path Algorithms in Graphs
The shortest path problem in graphs is a classic problem in graph theory, aiming to find the path with the minimum sum of edge weights between two nodes in a graph. Among them, Dijkstra's algorithm is the most famous algorithm for solving the single-source shortest path problem in graphs with non-negative weights.
1. Problem Description and Core Idea
Assume we have a weighted directed graph (or undirected graph), where each edge has a non-negative weight (which can be understood as distance, cost, or time). Given a source node (starting point), the goal of Dijkstra's algorithm is to find the shortest path and its length from the source node to all other nodes in the graph.
Core Idea: It employs a greedy strategy. It maintains a set containing nodes for which the shortest path has already been found. The algorithm gradually expands this set. Each time, it selects the node with the smallest known distance from the source among the nodes whose shortest path hasn't been finalized, adds it to the set, and updates the distance estimates for its neighboring nodes.
2. Data Structures Required for the Algorithm
To implement the algorithm efficiently, we need the following data structures:
dist: An array (or dictionary) used to record the current known shortest distance from the source node to each node. Initially, the source node's distance is set to 0, and all other nodes' distances are set to infinity (∞).visited(orfinalized_set): A set (or Boolean array) used to mark which nodes' shortest distances have been finally determined.priority_queue: A priority queue (min-heap) used to efficiently select the currently closest node to the source that has not been visited. Elements in the queue are typically(distance, node)pairs.
3. Decomposition of Algorithm Steps
Let's break down the workflow of Dijkstra's algorithm step by step:
Step Zero: Initialization
- Create the
distdictionary/array. Setdist[source] = 0, and for all other nodesvin the graph, setdist[v] = ∞. - Create the
visitedset, initially empty. - Create the priority queue
pqand add the source node(0, source)to the queue.
Step One: Select the Current Closest Node
- Pop the node with the smallest current distance from the priority queue
pq, denoted ascurrent_node(the first pop will definitely be the source node). - Check: If
current_nodeis already in thevisitedset, skip it (this means a shorter distance for it has already been processed, and the current pop is an old, invalid record). Otherwise, proceed to the next step. - Mark
current_nodeas visited (add it to thevisitedset). At this point, the value ofdist[current_node]is the final shortest distance from the source node tocurrent_node. This choice is greedy and correct because, in a graph with non-negative weights, it's impossible to find a shorter path tocurrent_nodevia other unvisited nodes.
Step Two: Relaxation Operation
- Iterate through all unvisited neighbor nodes
neighborofcurrent_node. - Calculate a potential new path length to reach
neighborviacurrent_node:new_distance = dist[current_node] + weight(current_node, neighbor). - Compare: If the calculated
new_distanceis less than the currently recordeddist[neighbor], it means we have found a shorter path.- Update
dist[neighbor]tonew_distance. - Add the new distance estimate
(new_distance, neighbor)to the priority queuepq. Note that an old distance record forneighbormight already exist in the queue; we don't need to delete it, just add the new (smaller) record. In Step One's check, the old record will be skipped.
- Update
Step Three: Loop
- Repeat Step One and Step Two until the priority queue
pqis empty, or we have visited all nodes we need to access (sometimes we only care about the shortest path to a specific target node and can terminate early after finding it).
4. A Detailed Example
Consider the following graph. We take node A as the source and find the shortest paths from it to other nodes. The numbers on the edges represent distances.
(B)
1/ \4
(A) (D)
5\ /1
(C)
- Weight from A to B is 1, A to C is 5, B to D is 4, C to D is 1.
Initialization:
dist: A=0, B=∞, C=∞, D=∞
visited: {}
pq: [(0, A)]
First Loop Iteration:
- Pop the node with the smallest distance from
pq(A, 0). A is not invisited, add it tovisited.visited= {A}. - Relax neighbors of A: B and C.
- For B:
new_dist = 0 + 1 = 1. 1 < ∞, updatedist[B] = 1, add (B, 1) topq. - For C:
new_dist = 0 + 5 = 5. 5 < ∞, updatedist[C] = 5, add (C, 5) topq.
- For B:
- Now
dist: A=0, B=1, C=5, D=∞.pq: [(1, B), (5, C)]
Second Loop Iteration:
- Pop the node with the smallest distance from
pq(B, 1). B is not invisited, add it tovisited.visited= {A, B}. - Relax neighbors of B: D.
- For D:
new_dist = 1 + 4 = 5. 5 < ∞, updatedist[D] = 5, add (D, 5) topq.
- For D:
- Now
dist: A=0, B=1, C=5, D=5.pq: [(5, C), (5, D)]
Third Loop Iteration:
- Pop the node with the smallest distance from
pq(C, 5). C is not invisited, add it tovisited.visited= {A, B, C}. - Relax neighbors of C: D.
- For D:
new_dist = 5 + 1 = 6. 6 > currentdist[D]=5, so do not update.
- For D:
distremains unchanged.pq: [(5, D)]
Fourth Loop Iteration:
- Pop the node with the smallest distance from
pq(D, 5). D is not invisited, add it tovisited.visited= {A, B, C, D}. - D has no unvisited neighbors (all nodes are visited), no need to relax.
- Now
pqis empty.
Algorithm ends. Final shortest distances are: A to A:0, A to B:1, A to C:5, A to D:5.
The paths A->B->D and A->C->D both have distance 5, but the algorithm found the first one.
5. Time Complexity Analysis
- Each node is added to and popped from the priority queue once, operations count O(V).
- Each edge is traversed once for relaxation, count O(E).
- Insertion and deletion operations for the priority queue (min-heap) have time complexity O(log V).
- Therefore, the total time complexity of Dijkstra's algorithm using a priority queue is O((V + E) log V). If the graph is connected, E is at least V-1, which can be simplified to O(E log V).
6. Key Points and Notes
- Non-negative Weight Constraint: This is the cornerstone of the correctness of Dijkstra's algorithm. If negative-weight edges exist, the greedy strategy of selecting the "current closest node" may fail because a shorter path might be found later via a negative-weight edge. For graphs containing negative-weight edges, the Bellman-Ford algorithm is needed.
- Path Reconstruction: The above process only calculates the shortest distances. To record the specific path, maintain a
prev(orparent) array. Each time a successful relaxation updatesdist, recordprev[neighbor] = current_node. Finally, trace back from the target node through theprevarray to obtain the path. - Priority Queue Optimization: Using a Fibonacci heap can achieve a better theoretical time complexity of O(E + V log V). However, in practical applications, binary heaps (i.e., the usual priority queues) are widely used due to their smaller constant factors.