To convert string "ab" to "ax" is also, the edit distance is 1.
The edit operation will include
1. insert of chars
2. delete of chars
3. substitution of chars
The algorithm is well known Wagner-Fisher algorithm.
While converting "a" to "ax" , we do insert of "x" at the end of the source string. So, edit distance is 1.
While converting "ab" to "ax", we substitute "b" with "x" and hence edit distance here is 1.
While converting "aqbc" to "abcx", we 1. delete char "q" and 2. add char "x" to the end. Hence edit distance here is 2.
Now how do we calculate the edit distance?
Suppose we are considering only insertion,deletion and substitution of characters (no transposition).We use dynamic programming to calculate the edit distance.
Let us say, S is the array containing the source string and T is the array where we will put the target string.
Let us say, ns is numbers of characters in S and nt is the number of characters in T.
Let us number the positions of the array from 1 to ns and 1 to nt respectively.
Suppose we are examining T[i] and S[j] chars. We already "know" edit distance for substrings S[1..j] and T[1..i-1], and
for substrings S[1..j-1] and T[1..i-1], and for S[1..j-1] and T[1..i].
1. Suppose, S[j] same as T[i], then edit distance involving S[1..j] and T[1..i] will be same sas with S[1..j-1] and
T[1..i-1].
2. Suppose S[j] is not equal to T[i], then
e1 = edit distance of substrings involving S[1..j-1] and T[1..i]
e2 = edit distance of substrings involving S[1..j] and T[1..i-1]
e3 = edit distance of substrings involving S[1..j-1] and T[1..i-1]
Among e1,e2,e3:
if e1 is the smallest, then edit distance of substrings S[1..j] and T[1..i] = e1 + 1. Here it is equivalent of deletion
of char S[j] in the source string.
if e2 is the smallest, then edit distance of substrings S[1..j] and T[1..i] = e2 + 1. Here it is equivalent of
insertion of char T[i] in the source string to form the target string.
if e3 is the smallest, then edit distance of substrings S[1..j] and T[1..i] = e3 + 1. Here it is equivalent of
substituting S[j] with T[i] to form the target string.
Below, I have given the c code implementation of the above.
int calculate_Levenshteindistance_distance(
char *source_string,
int source_length,
char *target_string,
int target_length
)
{
// No checking, hope for your tests you will
// pass proper
// values
int i, j, e1,e2,e3;
int **matrix_values;
matrix_values = (int **)
calloc(1, sizeof(int *) * (target_length + 1));
for (i = 0; i <= target_length ; i++) {
matrix_values[i] = (int *)
calloc(1, sizeof(int) * (source_length + 1));
}
// Here is the key ... initialize the column 0 of
// the matrix with the maximum possible edit distances
// for target with length
// 0, 1,2,3,......target_length when the source string
// is of length 0.
// Similarly initialize the 0th row of the matrix.
for (i = 0; i <= target_length ; i++)
matrix_values[i][0] = i;
for (j = 0; j <= source_length ; j++)
matrix_values[0][j] = j;
for (i = 1; i <= target_length ; i++) {
for (j=1; j <= source_length; j++) {
if (source_string[j - 1] == target_string[i - 1]) {
matrix_values[i][j] = matrix_values[i-1][j-1];
continue;
}
e1 = matrix_values[i][j-1];
e2 = matrix_values[i-1][j];
e3 = matrix_values[i-1][j-1];
if (e1 > e2 )
e1 = e2;
if (e1 > e3 )
e1 = e3;
matrix_values[i][j] = e1 + 1;
}
}
return matrix_values[target_length][source_length];
}
No comments:
Post a Comment