A* Search Algorithm

A* Search Algorithm

A* search algorithm is an efficient algorithm widely used in path planning and graph traversal. It combines the completeness of Dijkstra's algorithm (guaranteeing the shortest path) and the heuristics of the Greedy Best-First Search algorithm (tending to explore towards the goal), guiding the search direction through an evaluation function, thereby being faster than both in most cases.

1. Core Idea: Evaluation Function f(n)

The core of the A* algorithm lies in its calculation of an evaluation function for each node n in the graph:
f(n) = g(n) + h(n)

  • g(n): This is the actual cost from the start node to the current node n. This value is known and accurate, similar to the distance recorded in Dijkstra's algorithm.
  • h(n): This is the estimated cost from the current node n to the goal node, also known as the heuristic function. This value is an estimate, not a precise value. The quality of h(n)'s design directly determines the efficiency of the A* algorithm.

The algorithm's goal is to find a path that minimizes the f(n) value of the path's endpoint (goal node).

2. Detailed Algorithm Process

The algorithm needs to maintain two sets:

  • Open List: A priority queue (usually a min-heap) containing all discovered but unevaluated nodes. Nodes are sorted by their f(n) value, with the node having the smallest f(n) having the highest priority.
  • Closed List: A set (e.g., a hash table) containing all nodes that have been evaluated. The optimal paths for these nodes have been determined and do not need to be processed again.

Steps are as follows:

  1. Initialization:

    • Add the start node S to the open list. Set g(S) = 0, calculate its heuristic value h(S), and its evaluation value f(S) = g(S) + h(S).
    • Initialize the closed list as empty.
  2. Loop Start: While the open list is not empty, repeat the following steps:
    a. Select Current Node: Remove the node with the smallest f(n) value from the open list, call it the current node N. If using a priority queue for the open list, this is a dequeue operation.
    b. Goal Test: If the current node N is the goal node T, then the search is successful. We can reconstruct the path by backtracking from T through parent nodes.
    c. Expand Node: Move node N from the open list to the closed list, marking it as evaluated. Then, iterate through all neighboring nodes M of N.
    d. Process Neighbor Node M: For each neighbor M:
    * Skip Closed Nodes: If M is in the closed list, ignore it directly because its shortest path is already determined.
    * Calculate Temporary g-value: Calculate the cost from the start to M via N: temp_g = g(N) + cost(N, M), where cost(N, M) is the weight of the edge from N to M.
    * Determine if a New or Better Path is Found:
    * If M is not in the open list, it is a newly discovered node. Add M to the open list, set its g(M) = temp_g, calculate f(M) = g(M) + h(M), and record N as the parent of M.
    * If M is already in the open list, we have found a new path to M. Compare the cost of the new path temp_g with the previously recorded g(M):
    - If temp_g < g(M), the new path is better. Update M's g(M) to temp_g, and update f(M) accordingly. Also, set M's parent node to N. Since f(M) has changed, adjust M's position in the open list (priority queue).
    - If temp_g >= g(M), the new path is worse or equal, so do nothing.

  3. Loop End: If the open list becomes empty, it means all reachable nodes have been explored but the goal node has not been found. The search fails; no path exists from the start to the goal.

3. Key Properties of the Heuristic Function h(n)

The choice of the heuristic function h(n) is crucial. It must satisfy the following conditions to ensure the correctness and efficiency of the A* algorithm:

  • Admissibility: h(n) must never overestimate the actual minimum cost from the current node n to the goal node. That is, h(n) <= h*(n), where h*(n) is the true actual cost. This property guarantees that the path found by A* is optimal (shortest).

    • Example: In a grid map, the Manhattan distance (only up, down, left, right moves) or straight-line distance (Euclidean distance) are admissible, as they are optimistic estimates of the shortest path between two points.
  • Consistency (or Monotonicity): For any node n and its neighbor m, it must satisfy h(n) <= cost(n, m) + h(m). This can be understood as the heuristic function will not "cheat" you into thinking the cost from n to the goal is greater than the cost of going to a neighbor m and then to the goal. Consistency is a stronger form of admissibility. If h(n) is consistent, then when the A* algorithm expands a node, the path to that node is guaranteed to be optimal, its g(n) value never needs to be updated again, and the algorithm is more efficient.

4. A Simple Example

Suppose there is a 4x4 grid, with the start at (0,0) and the goal at (3,3). We can use the Manhattan distance as the heuristic function: h(n) = |x_n - x_goal| + |y_n - y_goal|.

  1. Start (0,0): g=0, h=6, f=6. Add to open list.
  2. Remove (0,0) with smallest f for expansion. Check its neighbors (0,1) and (1,0).
    • (0,1): g=1, h=5, f=6. Add to open list.
    • (1,0): g=1, h=5, f=6. Add to open list.
  3. The open list now contains (0,1) and (1,0), both with f=6. Suppose we remove (0,1) for expansion. Check its neighbors (0,2), (1,1), (0,0).
    • (0,2): g=2, h=4, f=6. Add to open list.
    • (1,1): g=2, h=4, f=6. Add to open list.
    • (0,0) is in the closed list, ignore.
  4. The algorithm continues, always prioritizing the expansion of nodes that appear closer to the goal (smaller f value), until it finally finds the goal point (3,3).

Summary
The A* search algorithm intelligently guides the search direction by combining the known cost g(n) and the estimated cost h(n). While guaranteeing an optimal path, it significantly reduces the number of nodes that need to be explored, making it a very powerful and practical tool for solving shortest path problems.