### Adding longest common subsequence program Python

master Charles Reid 4 years ago
commit 0aa2ce9723
1 changed file with 133 additions and 0 deletions

#### dp/longest_common_subsequence.py

```python
"""
Dynamic programming:

Compute the longest common subsequence (LCS) of two strings.


The pseudocode for computing the length of the LCS is straightforward:

function LCSLength(X[1..m], Y[1..n])
    C = array(0..m, 0..n)
    for i := 0..m
        C[i,0] = 0
    for j := 0..n
        C[0,j] = 0
    for i := 1..m
        for j := 1..n
            if X[i] = Y[j]
                C[i,j] := C[i-1,j-1] + 1
            else
                C[i,j] := max(C[i,j-1], C[i-1,j])
    return C[m,n]


The pseudocode for computing the LCS itself is implemented recursively:

function backtrack(C[0..m,0..n], X[1..m], Y[1..n], i, j)
    if i = 0 or j = 0
        return ""
    else if X[i] = Y[j]
        return backtrack(C, X, Y, i-1, j-1) + X[i]
    else
        if C[i,j-1] > C[i-1,j]
            return backtrack(C, X, Y, i, j-1)
        else
            return backtrack(C, X, Y, i-1, j)
"""


def lcs_matrix(string1, string2):
    """
    Builds the dynamic programming matrix for the longest common
    subsequence of string1 and string2, and returns the matrix.
    The matrix can then be used to get the length of the longest subsequence,
    or to get the longest subsequence itself.
    """
    m = len(string1)+1
    n = len(string2)+1

    # Allocate a list of lists of size (len(string1)+1) x (len(string2)+1)
    # (extra +1 accounts for the boundary row and column).
    # To allocate a list of a certain size in Python, use list comprehensions.
    # Be careful about order: the outer loop runs over rows (i), the inner over columns (j).
    data = [[0 for j in range(n)] for i in range(m)]

    # Now we can index using data[i][j].
    # The boundary row data[0][j] and boundary column data[i][0] must be 0;
    # the comprehension above already sets every entry to 0, so no extra
    # initialization loops are needed.

    # Now we actually do the dynamic programming:
    # Split the problem into parts, and create a lookup table.
    # data[i][j] is the LCS length of string1[:i] and string2[:j].
    for i in range(1,m):
        for j in range(1,n):
            if string1[i-1] == string2[j-1]:
                data[i][j] = data[i-1][j-1] + 1
            else:
                data[i][j] = max(data[i][j-1], data[i-1][j])

    # This returns the full matrix; the length of the LCS is the last entry.
    return data


def lcs_length(string1,string2):
    """
    Returns the length of the longest common subsequence.
    """
    ii = len(string1)
    jj = len(string2)
    data = lcs_matrix(string1, string2)
    return data[ii][jj]


def lcs_sequence(string1, string2):
    """
    Returns the longest common subsequence.
    """
    data = lcs_matrix(string1, string2)
    ii = len(string1)
    jj = len(string2)
    return backtrack(data, string1, string2, ii, jj)


def backtrack(data, string1, string2, i, j):
    """
    Recursively walks the matrix back from (i, j) and rebuilds one LCS.
    """
    # Compare values with ==, not identity with `is`.
    if i == 0 or j == 0:
        # Nothing in common
        return ""
    elif string1[i-1] == string2[j-1]:
        # If last characters in prefixes are equal, they must be in an LCS
        return backtrack(data, string1, string2, i-1, j-1) + string1[i-1]
    else:
        # Determine which way to step back (pick the neighbor that gives the longest subsequence)
        if data[i][j-1] > data[i-1][j]:
            return backtrack(data, string1, string2, i, j-1)
        else:
            return backtrack(data, string1, string2, i-1, j)


if __name__=="__main__":
    string1 = "turbomachinery infrastructure plan"
    string2 = "rotor machine implant diagnostics"

    #string1 = "aacccccc"
    #string2 = "cccccccc"

    #string1 = "ccccaaaaacccc"
    #string2 = "ccccccccccccc"

    print("Longest common subsequence: {0:s}".format(lcs_sequence(string1,string2)))
    print("Length of longest common subsequence: {0:d}".format(lcs_length(string1,string2)))

    ## These should be the same:
    #print(lcs_length(string1,string2))
    #print(len(lcs_sequence(string1,string2)))
```
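Note on the recursive backtrack: its recursion depth can reach roughly len(string1) + len(string2), so long inputs may exceed Python's default recursion limit. A minimal sketch of an iterative variant follows. It is not part of this commit, and the name `lcs_iterative` is hypothetical. It builds the same table and walks it back with a loop, using the same tie-breaking rule as the pseudocode.

```python
def lcs_iterative(string1, string2):
    """Hypothetical helper (not in the commit): LCS with an iterative walk back."""
    m, n = len(string1), len(string2)

    # table[i][j] = LCS length of string1[:i] and string2[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if string1[i - 1] == string2[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i][j - 1], table[i - 1][j])

    # Walk back from the bottom-right corner, collecting matched characters.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if string1[i - 1] == string2[j - 1]:
            chars.append(string1[i - 1])
            i -= 1
            j -= 1
        elif table[i][j - 1] > table[i - 1][j]:
            j -= 1
        else:
            i -= 1

    # Characters were collected from last to first, so reverse them.
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(lcs_iterative("AGGTAB", "GXTXAYB"))  # prints GTAB
```

Collecting characters in a list and joining once also avoids the repeated string concatenation done by the recursive version.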