Longest Common Subsequence (LCS) Problem
Longest Common Subsequence (LCS) Problem
Problem Description
The Longest Common Subsequence (LCS) problem involves finding the length of the longest common subsequence between two sequences (e.g., strings). A subsequence does not need to be contiguous but must preserve the original order. For example, the LCS of strings "abcde" and "ace" is "ace", with a length of 3. This problem is commonly applied in scenarios like version control and biological gene sequence alignment.
Solution Approach
- Brute Force is Infeasible: If the lengths of the two sequences are m and n, the number of possible subsequences are 2^m and 2^n respectively. Comparing all combinations would have a time complexity of O(2^(m+n)), which is highly inefficient.
- Core Idea of Dynamic Programming: Decompose the problem into subproblems, using a 2D array
dp[i][j]to record the LCS length of the first i characters of sequence X and the first j characters of sequence Y. By filling the dp table, the final result is gradually derived. - State Transition Equation:
- If X[i-1] == Y[j-1] (current characters match):
dp[i][j] = dp[i-1][j-1] + 1 - If characters do not match:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
- If X[i-1] == Y[j-1] (current characters match):
- Boundary Conditions: When i=0 or j=0, the LCS length between an empty sequence and any sequence is 0.
Step-by-Step Derivation Example
Take X="abcde", Y="ace" as an example:
- Initialize the dp table (m+1 rows x n+1 columns), setting the first row and first column to 0:
Ø a c e
Ø 0 0 0 0
a 0
b 0
c 0
d 0
e 0
- Fill the dp table row by row:
- i=1, j=1: X[0]='a'=Y[0] → dp[1][1]=dp[0][0]+1=1
- i=1, j=2: 'a'≠'c' → dp[1][2]=max(dp[0][2], dp[1][1])=max(0,1)=1
- Continue similarly. The final table is as follows:
Ø a c e
Ø 0 0 0 0
a 0 1 1 1
b 0 1 1 1
c 0 1 2 2
d 0 1 2 2
e 0 1 2 3
- Read the Result: dp[m][n]=dp[5][3]=3 is the LCS length.
Recovering the LCS Sequence
Backtrack from dp[m][n]:
- If X[i-1]=Y[j-1], this character belongs to the LCS, move diagonally up-left (i--, j--).
- If dp[i][j] comes from above (dp[i-1][j]), the current character is not part of the LCS, move up (i--).
- If it comes from the left (dp[i][j-1]), move left (j--).
Example backtracking path (X="abcde", Y="ace"):
- (5,3): 'e'='e' → take 'e', move to (4,2)
- (4,2): 'd'≠'c', compare dp[3][2]=2 and dp[4][1]=1 → move up to (3,2)
- (3,2): 'c'='c' → take 'c', move to (2,1)
- (2,1): 'b'≠'a', move up to (1,1)
- (1,1): 'a'='a' → take 'a'
The LCS in reverse order is "eca", which becomes "ace" after reversal.
Complexity Analysis
- Time Complexity: O(mn), required to fill an m x n dp table.
- Space Complexity: O(mn), can be optimized to O(min(m,n)) (only keep the previous row and current row).
Common Variants
- Output all LCSs (not just one).
- Restrict subsequence continuity (transforms into the Longest Common Substring problem).
- LCS for multiple sequences (dimensional expansion, increased complexity).