How alike are two strings?

Showing and ordering how alike strings are is something I need nearly every week: “name rationalisation” jobs in address books, admin helper utilities, and “did you mean?” suggestions on web sites. What I did not realise is that the usual way of measuring it has a proper name. It's called the Levenshtein distance, and it is defined as:

“The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other”, where an edit is the insertion, deletion or substitution of a single character.
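To make the definition concrete, here is the standard recurrence behind it, written with the same indices as the Java code below. Here a and b are the two strings (str1 and str2 in the code), a_i is the i-th character of a, d_{i,j} is the distance between the first i characters of a and the first j characters of b, and [a_i \neq b_j] is 1 when the characters differ and 0 when they match.

```latex
d_{i,0} = i, \qquad d_{0,j} = j
d_{i,j} = \min\bigl( d_{i-1,j} + 1,\; d_{i,j-1} + 1,\; d_{i-1,j-1} + [a_i \neq b_j] \bigr)
\operatorname{lev}(a, b) = d_{m,n}, \quad m = |a|,\ n = |b|
```

The three terms are a deletion, an insertion and a substitution (free when the characters already match). For example, kitten to sitting has distance 3: substitute k with s, substitute e with i, then insert g at the end.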

It is already built into Apache Commons Lang as StringUtils.getLevenshteinDistance.

There is also a near-complete list of implementations for different languages; I include the Java one from that page below (in case the source page goes down).

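public class LevenshteinDistance {

    private static int minimum(int a, int b, int c) {
        return Math.min(Math.min(a, b), c);
    }

    public static int computeLevenshteinDistance(CharSequence str1, CharSequence str2) {
        // distance[i][j] holds the edit distance between the first i characters
        // of str1 and the first j characters of str2.
        int[][] distance = new int[str1.length() + 1][str2.length() + 1];

        // Turning a prefix of length i into the empty string takes i deletions,
        // and building a prefix of length j from nothing takes j insertions.
        for (int i = 0; i <= str1.length(); i++) {
            distance[i][0] = i;
        }
        for (int j = 0; j <= str2.length(); j++) {
            distance[0][j] = j;
        }

        for (int i = 1; i <= str1.length(); i++) {
            for (int j = 1; j <= str2.length(); j++) {
                // A substitution costs nothing when the two characters already match.
                int substitutionCost = (str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1;
                distance[i][j] = minimum(
                        distance[i - 1][j] + 1,                      // deletion
                        distance[i][j - 1] + 1,                      // insertion
                        distance[i - 1][j - 1] + substitutionCost);  // substitution or match
            }
        }

        // The bottom-right cell covers both complete strings.
        return distance[str1.length()][str2.length()];
    }
}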
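For long strings the full table can get big, but each row only depends on the row above it, so the same recurrence can be filled using just two rows. The sketch below is my own variation, not code from the implementations page: the class DidYouMean and its methods distance and suggest are hypothetical names, and the ranking is just one simple way to build a “did you mean?” helper (keep candidates within a maximum distance, then sort them closest first). It assumes Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not taken from the implementations page:
// a two-row Levenshtein distance plus a simple "did you mean?" ranking.
public class DidYouMean {

    // Same recurrence as computeLevenshteinDistance, but keeps only two rows
    // of the table, so memory is O(min(m, n)) instead of O(m * n).
    public static int distance(CharSequence a, CharSequence b) {
        // The distance is symmetric, so make b the shorter string to keep rows small.
        if (a.length() < b.length()) {
            CharSequence tmp = a;
            a = b;
            b = tmp;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Row 0: building the first j characters of b from nothing takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            // Column 0: deleting the first i characters of a leaves the empty string.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(deletion, insertion), substitution);
            }
            // The row just filled becomes the "previous" row for the next i.
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        // After the final swap, prev holds the last row of the table.
        return prev[b.length()];
    }

    // Returns the candidates at most maxDistance edits from input, closest first.
    // List.sort is stable, so equally close candidates keep their original order.
    public static List<String> suggest(String input, List<String> candidates, int maxDistance) {
        List<String> matches = new ArrayList<>();
        for (String candidate : candidates) {
            if (distance(input, candidate) <= maxDistance) {
                matches.add(candidate);
            }
        }
        matches.sort(Comparator.comparingInt((String c) -> distance(input, c)));
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = List.of("shell", "world", "help", "hello");
        System.out.println(suggest("helo", words, 2)); // prints [help, hello, shell]
    }
}
```

Here help and hello are one edit away from helo, shell is two away, and world (four edits) is filtered out. Recomputing the distance inside the comparator is wasteful for large candidate lists; working out each distance once and sorting on the stored values would be the obvious refinement.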

Yes, it's sad, so just go away and leave me to my string comparisons 😉
