Edit Distance Problem
Problem Description
Edit Distance (also known as Levenshtein Distance) refers to the minimum number of single-character edit operations required to transform one string into another. Allowed edit operations include: inserting a character, deleting a character, and replacing a character. For example, the edit distance from "horse" to "ros" is 3 (delete 'h' -> "orse", delete 'o' -> "rse", delete 'e' -> "rs", but the actual optimal solution is: horse -> rorse (replace 'h' with 'r') -> rose (replace 's' with 'o') -> ros (delete 'e')).
Solution Approach
This problem is typically solved using dynamic programming. We define a two-dimensional array dp[i][j], representing the minimum number of edit operations required to convert the first i characters of string word1 into the first j characters of string word2.
Detailed Steps
-
State Definition
Letdp[i][j]denote the minimum edit distance to convertword1[0..i-1](the first i characters) toword2[0..j-1](the first j characters). -
Initialize Boundary Conditions
- If the first
icharacters ofword1are to become the first 0 characters (empty string) ofword2,ideletions are needed (i.e.,dp[i][0] = i). - If the empty string of
word1is to become the firstjcharacters ofword2,jinsertions are needed (i.e.,dp[0][j] = j).
- If the first
-
State Transition Equation
For eachiandj(starting from 1), considerword1[i-1]andword2[j-1]:- If
word1[i-1] == word2[j-1], no operation is needed,dp[i][j] = dp[i-1][j-1]. - If the characters are not equal, consider three operations and take the minimum value plus 1:
- Insert: Insert
word2[j-1]after the first i characters ofword1to make them match, equivalent toword2's j-1 already being matched, sodp[i][j] = dp[i][j-1] + 1. - Delete: Delete
word1[i-1], equivalent to matching the first i-1 characters ofword1with the first j characters ofword2, sodp[i][j] = dp[i-1][j] + 1. - Replace: Replace
word1[i-1]withword2[j-1], equivalent to matching the first i-1 ofword1with the first j-1 ofword2and then performing one replacement, sodp[i][j] = dp[i-1][j-1] + 1.
Therefore, the state transition equation is:
- Insert: Insert
if word1[i-1] == word2[j-1]: dp[i][j] = dp[i-1][j-1] else: dp[i][j] = min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) + 1 - If
-
Table Filling Order
Sincedp[i][j]depends on the left, top, and top-left cells, fill the dp table row by row from left to right and top to bottom. -
Result Extraction
The final result is atdp[len(word1)][len(word2)].
Example
Take word1 = "horse", word2 = "ros" as an example:
- Initialize the dp table (6 rows, 4 columns, as the string lengths are 5 and 3 respectively):
dp[0][0] = 0, dp[1][0] = 1, ..., dp[5][0] = 5 dp[0][1] = 1, dp[0][2] = 2, dp[0][3] = 3 - Calculate
dp[1][1]: Compare 'h' and 'r', not equal, take min(dp[0][1], dp[1][0], dp[0][0])+1 = min(1,1,0)+1=1. - Finally,
dp[5][3]=3, meaning the edit distance is 3.
Complexity Analysis
- Time Complexity: O(m*n), where m and n are the lengths of the two strings respectively.
- Space Complexity: O(m*n), which can be optimized to O(min(m,n)).