Shortest Path Algorithms in Graphs

Shortest Path Algorithms in Graphs

The shortest path problem in graphs is a classic problem in graph theory, aiming to find the path with the minimum sum of edge weights between two nodes in a graph. Among them, Dijkstra's algorithm is the most famous algorithm for solving the single-source shortest path problem in graphs with non-negative weights.

1. Problem Description and Core Idea

Assume we have a weighted directed graph (or undirected graph), where each edge has a non-negative weight (which can be understood as distance, cost, or time). Given a source node (starting point), the goal of Dijkstra's algorithm is to find the shortest path and its length from the source node to all other nodes in the graph.

Core Idea: It employs a greedy strategy. It maintains a set containing nodes for which the shortest path has already been found. The algorithm gradually expands this set. Each time, it selects the node with the smallest known distance from the source among the nodes whose shortest path hasn't been finalized, adds it to the set, and updates the distance estimates for its neighboring nodes.

2. Data Structures Required for the Algorithm

To implement the algorithm efficiently, we need the following data structures:

dist: An array (or dictionary) used to record the current known shortest distance from the source node to each node. Initially, the source node's distance is set to 0, and all other nodes' distances are set to infinity (∞).
visited (or finalized_set): A set (or Boolean array) used to mark which nodes' shortest distances have been finally determined.
priority_queue: A priority queue (min-heap) used to efficiently select the currently closest node to the source that has not been visited. Elements in the queue are typically (distance, node) pairs.

3. Decomposition of Algorithm Steps

Let's break down the workflow of Dijkstra's algorithm step by step:

Step Zero: Initialization

Create the dist dictionary/array. Set dist[source] = 0, and for all other nodes v in the graph, set dist[v] = ∞.
Create the visited set, initially empty.
Create the priority queue pq and add the source node (0, source) to the queue.

Step One: Select the Current Closest Node

Pop the node with the smallest current distance from the priority queue pq, denoted as current_node (the first pop will definitely be the source node).
Check: If current_node is already in the visited set, skip it (this means a shorter distance for it has already been processed, and the current pop is an old, invalid record). Otherwise, proceed to the next step.
Mark current_node as visited (add it to the visited set). At this point, the value of dist[current_node] is the final shortest distance from the source node to current_node. This choice is greedy and correct because, in a graph with non-negative weights, it's impossible to find a shorter path to current_node via other unvisited nodes.

Step Two: Relaxation Operation

Iterate through all unvisited neighbor nodes neighbor of current_node.
Calculate a potential new path length to reach neighbor via current_node: new_distance = dist[current_node] + weight(current_node, neighbor).
Compare: If the calculated new_distance is less than the currently recorded dist[neighbor], it means we have found a shorter path.
- Update dist[neighbor] to new_distance.
- Add the new distance estimate (new_distance, neighbor) to the priority queue pq. Note that an old distance record for neighbor might already exist in the queue; we don't need to delete it, just add the new (smaller) record. In Step One's check, the old record will be skipped.

Step Three: Loop

Repeat Step One and Step Two until the priority queue pq is empty, or we have visited all nodes we need to access (sometimes we only care about the shortest path to a specific target node and can terminate early after finding it).

4. A Detailed Example

Consider the following graph. We take node A as the source and find the shortest paths from it to other nodes. The numbers on the edges represent distances.

    (B)
  1/   \4
 (A)   (D)
  5\   /1
    (C)

Weight from A to B is 1, A to C is 5, B to D is 4, C to D is 1.

Initialization:
dist: A=0, B=∞, C=∞, D=∞
visited: {}
pq: [(0, A)]

First Loop Iteration:

Pop the node with the smallest distance from pq (A, 0). A is not in visited, add it to visited. visited = {A}.
Relax neighbors of A: B and C.
- For B: new_dist = 0 + 1 = 1. 1 < ∞, update dist[B] = 1, add (B, 1) to pq.
- For C: new_dist = 0 + 5 = 5. 5 < ∞, update dist[C] = 5, add (C, 5) to pq.
Now dist: A=0, B=1, C=5, D=∞. pq: [(1, B), (5, C)]

Second Loop Iteration:

Pop the node with the smallest distance from pq (B, 1). B is not in visited, add it to visited. visited = {A, B}.
Relax neighbors of B: D.
- For D: new_dist = 1 + 4 = 5. 5 < ∞, update dist[D] = 5, add (D, 5) to pq.
Now dist: A=0, B=1, C=5, D=5. pq: [(5, C), (5, D)]

Third Loop Iteration:

Pop the node with the smallest distance from pq (C, 5). C is not in visited, add it to visited. visited = {A, B, C}.
Relax neighbors of C: D.
- For D: new_dist = 5 + 1 = 6. 6 > current dist[D]=5, so do not update.
dist remains unchanged. pq: [(5, D)]

Fourth Loop Iteration:

Pop the node with the smallest distance from pq (D, 5). D is not in visited, add it to visited. visited = {A, B, C, D}.
D has no unvisited neighbors (all nodes are visited), no need to relax.
Now pq is empty.

Algorithm ends. Final shortest distances are: A to A:0, A to B:1, A to C:5, A to D:5.
The paths A->B->D and A->C->D both have distance 5, but the algorithm found the first one.

5. Time Complexity Analysis

Each node is added to and popped from the priority queue once, operations count O(V).
Each edge is traversed once for relaxation, count O(E).
Insertion and deletion operations for the priority queue (min-heap) have time complexity O(log V).
Therefore, the total time complexity of Dijkstra's algorithm using a priority queue is O((V + E) log V). If the graph is connected, E is at least V-1, which can be simplified to O(E log V).

6. Key Points and Notes

Non-negative Weight Constraint: This is the cornerstone of the correctness of Dijkstra's algorithm. If negative-weight edges exist, the greedy strategy of selecting the "current closest node" may fail because a shorter path might be found later via a negative-weight edge. For graphs containing negative-weight edges, the Bellman-Ford algorithm is needed.
Path Reconstruction: The above process only calculates the shortest distances. To record the specific path, maintain a prev (or parent) array. Each time a successful relaxation updates dist, record prev[neighbor] = current_node. Finally, trace back from the target node through the prev array to obtain the path.
Priority Queue Optimization: Using a Fibonacci heap can achieve a better theoretical time complexity of O(E + V log V). However, in practical applications, binary heaps (i.e., the usual priority queues) are widely used due to their smaller constant factors.