Longest common sub sequence problem is to find the longest
common sub sequence(LCS) common to all sub sequences given. Here
we will discuss how to find the LCS of two sequences (we
will use simple string for our convenience). This can be easily extended to more than
two sequences.
The basic principle is simple.
Suppose we have sequences S1 and S2 and,
1. If their last two characters are the same, then we are
100% sure that the last two characters will be always part
of the LCS. So, the LCS will be = (LCS of the sub sequences
excluding the last two characters) + (last characters which
is same for both the sequence).
2. If the last two characters are not matching, then both of
them cannot be in the LCS of the two sequences. But it is
possible that one of them may be in the LCS. Whichever
character results in maximum length for the common sequence
is taken as then only we will get the LCS. Suppose we are
examining ith and jth characters of the two sequences, then
if LCS of S1[1..j] and S2[1..i - 1] is longer than that of
LCS of S1[1..j-1] and S2[1..i], then S1[j] is added to the
sub sequence.
Let LCS(i,j) denotes the value of LCS length at position i
and j in S1 and S2 respectively. Dynamic programming comes
handy here. We initialize a 2d array of size n1 x n2 to
zero all. At each point i and j in S1 and S2 respectively,
LCS(i,j) = (LCS(i-1,j-1) added with S1[i]) if S1[i] == S2[i],
else,
LCS(i,j) = max(LCS(i-1,j), LCS(i,j-1))
the array element lcs[i][j] will hold the value for LCS(i,j);
lcs[n1-1, n2 -1] will ultimately hold maximum LCS length.
below is the implementation of a C routine to compute LCS
for two strings str1 and str2 of lengths length1 and length2
respectively.
common sub sequence(LCS) common to all sub sequences given. Here
we will discuss how to find the LCS of two sequences (we
will use simple string for our convenience). This can be easily extended to more than
two sequences.
The basic principle is simple.
Suppose we have sequences S1 and S2 and,
1. If their last two characters are the same, then we are
100% sure that the last two characters will be always part
of the LCS. So, the LCS will be = (LCS of the sub sequences
excluding the last two characters) + (last characters which
is same for both the sequence).
2. If the last two characters are not matching, then both of
them cannot be in the LCS of the two sequences. But it is
possible that one of them may be in the LCS. Whichever
character results in maximum length for the common sequence
is taken as then only we will get the LCS. Suppose we are
examining ith and jth characters of the two sequences, then
if LCS of S1[1..j] and S2[1..i - 1] is longer than that of
LCS of S1[1..j-1] and S2[1..i], then S1[j] is added to the
sub sequence.
Let LCS(i,j) denotes the value of LCS length at position i
and j in S1 and S2 respectively. Dynamic programming comes
handy here. We initialize a 2d array of size n1 x n2 to
zero all. At each point i and j in S1 and S2 respectively,
LCS(i,j) = (LCS(i-1,j-1) added with S1[i]) if S1[i] == S2[i],
else,
LCS(i,j) = max(LCS(i-1,j), LCS(i,j-1))
the array element lcs[i][j] will hold the value for LCS(i,j);
lcs[n1-1, n2 -1] will ultimately hold maximum LCS length.
below is the implementation of a C routine to compute LCS
for two strings str1 and str2 of lengths length1 and length2
respectively.
int compute_lcs(
char *str1, unsigned int length1,
char *str2, unsigned int length2
)
{
/// No checking for validation of input params, hope if
// you use this code, you will add them
// allocate space for a matrix of size length1 x length2
// and intialize them to zero
int i, j, max_lcs;
int **lcs = (int **) malloc(sizeof(int *) * length1 );
for (i = 0; i < length1; i++) {
lcs[i] = calloc(1, sizeof(int) * length2);
}
for (i = 0; i < length1; i++) {
for (j = 0; j < length2; j++) {
if (str1[i] == str2[j]) {
if (i == 0 || j == 0)
lcs[i][j] = 1;
else
lcs[i][j] = lcs[i-1][j-1] + 1;
} else {
if (i) {
if (j) {
lcs[i][j] = lcs[i-1][j];
if (lcs[i][j-1] > lcs[i][j]) {
lcs[i][j] = lcs[i][j-1];
}
} else {
lcs[i][j] = lcs[i-1][j];
}
}
if (j) {
if (i) {
lcs[i][j] = lcs[i-1][j];
if (lcs[i][j-1] > lcs[i][j]) {
lcs[i][j] = lcs[i][j-1];
}
} else {
lcs[i][j] = lcs[i][j-1];
}
}
}
}
}
max_lcs = lcs[length1 -1][length2 - 1];
for (i = 0; i < length1; i++) {
free(lcs[i]);
}
free(lcs);
return max_lcs;
}