Edit Distance Problem

Edit Distance Problem

Problem Description
Edit Distance (also known as Levenshtein Distance) refers to the minimum number of single-character edit operations required to transform one string into another. Allowed edit operations include: inserting a character, deleting a character, and replacing a character. For example, the edit distance from "horse" to "ros" is 3 (delete 'h' -> "orse", delete 'o' -> "rse", delete 'e' -> "rs", but the actual optimal solution is: horse -> rorse (replace 'h' with 'r') -> rose (replace 's' with 'o') -> ros (delete 'e')).

Solution Approach
This problem is typically solved using dynamic programming. We define a two-dimensional array dp[i][j], representing the minimum number of edit operations required to convert the first i characters of string word1 into the first j characters of string word2.

Detailed Steps

State Definition
Let dp[i][j] denote the minimum edit distance to convert word1[0..i-1] (the first i characters) to word2[0..j-1] (the first j characters).
Initialize Boundary Conditions
- If the first i characters of word1 are to become the first 0 characters (empty string) of word2, i deletions are needed (i.e., dp[i][0] = i).
- If the empty string of word1 is to become the first j characters of word2, j insertions are needed (i.e., dp[0][j] = j).
State Transition Equation
For each i and j (starting from 1), consider word1[i-1] and word2[j-1]:
- If word1[i-1] == word2[j-1], no operation is needed, dp[i][j] = dp[i-1][j-1].
- If the characters are not equal, consider three operations and take the minimum value plus 1:
  - Insert: Insert word2[j-1] after the first i characters of word1 to make them match, equivalent to word2's j-1 already being matched, so dp[i][j] = dp[i][j-1] + 1.
  - Delete: Delete word1[i-1], equivalent to matching the first i-1 characters of word1 with the first j characters of word2, so dp[i][j] = dp[i-1][j] + 1.
  - Replace: Replace word1[i-1] with word2[j-1], equivalent to matching the first i-1 of word1 with the first j-1 of word2 and then performing one replacement, so dp[i][j] = dp[i-1][j-1] + 1.
    Therefore, the state transition equation is:
```
if word1[i-1] == word2[j-1]:
    dp[i][j] = dp[i-1][j-1]
else:
    dp[i][j] = min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) + 1
```
Table Filling Order
Since dp[i][j] depends on the left, top, and top-left cells, fill the dp table row by row from left to right and top to bottom.
Result Extraction
The final result is at dp[len(word1)][len(word2)].

Example
Take word1 = "horse", word2 = "ros" as an example:

Initialize the dp table (6 rows, 4 columns, as the string lengths are 5 and 3 respectively):
```
dp[0][0] = 0, dp[1][0] = 1, ..., dp[5][0] = 5  
dp[0][1] = 1, dp[0][2] = 2, dp[0][3] = 3
```
Calculate dp[1][1]: Compare 'h' and 'r', not equal, take min(dp[0][1], dp[1][0], dp[0][0])+1 = min(1,1,0)+1=1.
Finally, dp[5][3]=3, meaning the edit distance is 3.

Complexity Analysis

Time Complexity: O(m*n), where m and n are the lengths of the two strings respectively.
Space Complexity: O(m*n), which can be optimized to O(min(m,n)).