Edit Distance Problem

Edit Distance Problem

Problem Description
Edit Distance (also known as Levenshtein Distance) refers to the minimum number of single-character edit operations required to transform one string into another. Allowed edit operations include: inserting a character, deleting a character, and replacing a character. For example, the edit distance from "horse" to "ros" is 3 (delete 'h' -> "orse", delete 'o' -> "rse", delete 'e' -> "rs", but the actual optimal solution is: horse -> rorse (replace 'h' with 'r') -> rose (replace 's' with 'o') -> ros (delete 'e')).

Solution Approach
This problem is typically solved using dynamic programming. We define a two-dimensional array dp[i][j], representing the minimum number of edit operations required to convert the first i characters of string word1 into the first j characters of string word2.

Detailed Steps

  1. State Definition
    Let dp[i][j] denote the minimum edit distance to convert word1[0..i-1] (the first i characters) to word2[0..j-1] (the first j characters).

  2. Initialize Boundary Conditions

    • If the first i characters of word1 are to become the first 0 characters (empty string) of word2, i deletions are needed (i.e., dp[i][0] = i).
    • If the empty string of word1 is to become the first j characters of word2, j insertions are needed (i.e., dp[0][j] = j).
  3. State Transition Equation
    For each i and j (starting from 1), consider word1[i-1] and word2[j-1]:

    • If word1[i-1] == word2[j-1], no operation is needed, dp[i][j] = dp[i-1][j-1].
    • If the characters are not equal, consider three operations and take the minimum value plus 1:
      • Insert: Insert word2[j-1] after the first i characters of word1 to make them match, equivalent to word2's j-1 already being matched, so dp[i][j] = dp[i][j-1] + 1.
      • Delete: Delete word1[i-1], equivalent to matching the first i-1 characters of word1 with the first j characters of word2, so dp[i][j] = dp[i-1][j] + 1.
      • Replace: Replace word1[i-1] with word2[j-1], equivalent to matching the first i-1 of word1 with the first j-1 of word2 and then performing one replacement, so dp[i][j] = dp[i-1][j-1] + 1.
        Therefore, the state transition equation is:
    if word1[i-1] == word2[j-1]:
        dp[i][j] = dp[i-1][j-1]
    else:
        dp[i][j] = min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) + 1
    
  4. Table Filling Order
    Since dp[i][j] depends on the left, top, and top-left cells, fill the dp table row by row from left to right and top to bottom.

  5. Result Extraction
    The final result is at dp[len(word1)][len(word2)].

Example
Take word1 = "horse", word2 = "ros" as an example:

  • Initialize the dp table (6 rows, 4 columns, as the string lengths are 5 and 3 respectively):
    dp[0][0] = 0, dp[1][0] = 1, ..., dp[5][0] = 5  
    dp[0][1] = 1, dp[0][2] = 2, dp[0][3] = 3
    
  • Calculate dp[1][1]: Compare 'h' and 'r', not equal, take min(dp[0][1], dp[1][0], dp[0][0])+1 = min(1,1,0)+1=1.
  • Finally, dp[5][3]=3, meaning the edit distance is 3.

Complexity Analysis

  • Time Complexity: O(m*n), where m and n are the lengths of the two strings respectively.
  • Space Complexity: O(m*n), which can be optimized to O(min(m,n)).