
Adding longest common subsequence program Python

Charles Reid 4 years ago


@ -0,0 +1,133 @@
from pprint import pprint

"""
Dynamic programming:
Compute the longest common subsequence (LCS) of two strings.

The pseudocode for computing the length of the LCS is straightforward:

    function LCSLength(X[1..m], Y[1..n])
        C = array(0..m, 0..n)
        for i := 0..m
            C[i,0] = 0
        for j := 0..n
            C[0,j] = 0
        for i := 1..m
            for j := 1..n
                if X[i] = Y[j]
                    C[i,j] := C[i-1,j-1] + 1
                else
                    C[i,j] := max(C[i,j-1], C[i-1,j])
        return C[m,n]

The LCS itself is recovered by backtracking recursively through the table:

    function backtrack(C[0..m,0..n], X[1..m], Y[1..n], i, j)
        if i = 0 or j = 0
            return ""
        else if X[i] = Y[j]
            return backtrack(C, X, Y, i-1, j-1) + X[i]
        else
            if C[i,j-1] > C[i-1,j]
                return backtrack(C, X, Y, i, j-1)
            else
                return backtrack(C, X, Y, i-1, j)
"""


def lcs_matrix(string1, string2):
    """
    Computes the dynamic programming matrix for the longest common
    subsequence of string1 and string2, and returns the matrix.
    The matrix can then be used to get the length of the longest subsequence,
    or to get the longest subsequence itself.
    """
    m = len(string1)+1
    n = len(string2)+1

    # Allocate a list of lists of size (len(string1)+1) x (len(string2)+1)
    # (the extra +1 accounts for the empty-prefix boundary row and column).
    # To allocate a list of a certain size in Python, use list comprehensions.
    # Be careful about order: the inner comprehension builds one row of n entries.
    data = [[0 for j in range(n)] for i in range(m)]

    # Now we can index using data[i][j].
    # The boundary row and column are zero (an empty prefix has no LCS).
    for i in range(m):
        data[i][0] = 0
    for j in range(n):
        data[0][j] = 0

    # Now we actually do the dynamic programming:
    # Split the problem into parts, and create a lookup table.
    for i in range(1,m):
        for j in range(1,n):
            if string1[i-1] == string2[j-1]:
                data[i][j] = data[i-1][j-1] + 1
            else:
                data[i][j] = max(data[i][j-1], data[i-1][j])

    # Return the full table; the length of the LCS is data[m-1][n-1]
    return data
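
# A minimal sketch, not part of the original program: when only the length
# of the LCS is needed, row i of the table depends only on row i-1, so two
# rows are enough. The name lcs_length_two_rows is hypothetical. The full
# matrix is still required to recover the subsequence itself.

```python
def lcs_length_two_rows(a, b):
    """Length of the LCS of a and b, keeping only two rows of the table."""
    # Put the shorter string on the inner loop so the rows stay short.
    if len(a) < len(b):
        a, b = b, a
    prev = [0] * (len(b) + 1)  # row i-1 of the table
    for ch in a:
        curr = [0]  # boundary column, then row i built left to right
        for j, other in enumerate(b, start=1):
            if ch == other:
                curr.append(prev[j-1] + 1)
            else:
                curr.append(max(curr[j-1], prev[j]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length_two_rows("AGGTAB", "GXTXAYB"))
```

# This uses O(min(len(a), len(b))) memory instead of the full table.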


def lcs_length(string1,string2):
    """
    Returns the length of the longest common subsequence.
    """
    ii = len(string1)
    jj = len(string2)
    data = lcs_matrix(string1, string2)
    return data[ii][jj]


def lcs_sequence(string1, string2):
    """
    Returns the longest common subsequence.
    """
    data = lcs_matrix(string1, string2)
    ii = len(string1)
    jj = len(string2)
    return backtrack(data, string1, string2, ii, jj)


def backtrack(data, string1, string2, i, j):
    """
    Recursively walks the matrix from data[i][j] back toward data[0][0],
    building the longest common subsequence of string1[:i] and string2[:j].
    """
    # Compare values with ==, not identity with "is"
    if i == 0 or j == 0:
        # Nothing in common
        return ""
    elif string1[i-1] == string2[j-1]:
        # If last characters in prefixes are equal, they must be in an LCS
        return backtrack(data, string1, string2, i-1, j-1) + string1[i-1]
    else:
        # Determine which way to step back (pick the one that gives the longest subsequence)
        if data[i][j-1] > data[i-1][j]:
            return backtrack(data, string1, string2, i, j-1)
        else:
            return backtrack(data, string1, string2, i-1, j)
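
# A hedged sketch, not part of the original program: the recursive backtrack
# can recurse up to len(string1) + len(string2) levels deep, which may exceed
# Python's default recursion limit for long inputs. The same walk can be
# written as a loop. The name lcs_sequence_iterative is hypothetical, and the
# table is rebuilt inline only so that the example is self-contained.

```python
def lcs_sequence_iterative(a, b):
    """Returns one longest common subsequence of a and b without recursion."""
    m, n = len(a), len(b)
    # Same table as the dynamic programming step: (m+1) x (n+1), zero boundary.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i-1] == b[j-1]:
                table[i][j] = table[i-1][j-1] + 1
            else:
                table[i][j] = max(table[i][j-1], table[i-1][j])

    # Walk back from the bottom-right corner, collecting matched characters.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i-1] == b[j-1]:
            chars.append(a[i-1])
            i -= 1
            j -= 1
        elif table[i][j-1] > table[i-1][j]:
            j -= 1
        else:
            i -= 1
    # Characters were collected from last to first, so reverse them.
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(lcs_sequence_iterative("AGGTAB", "GXTXAYB"))
```

# The tie-breaking order matches the recursive version, so both should
# return the same subsequence when several LCSs of equal length exist.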


if __name__=="__main__":
    string1 = "turbomachinery infrastructure plan"
    string2 = "rotor machine implant diagnostics"

    #string1 = "aacccccc"
    #string2 = "cccccccc"

    #string1 = "ccccaaaaacccc"
    #string2 = "ccccccccccccc"

    print("Longest common subsequence: {0:s}".format(lcs_sequence(string1,string2)))
    print("Length of longest common subsequence: {0:d}".format(lcs_length(string1,string2)))

    ## These should be the same: