Showing and ordering how alike strings are is something I have a use for nearly every week, from “name rationalisation” jobs in address books, through admin helper utilities, to “did you mean?” on web sites. I did not realise, though, that the best way has a proper name: it's called Levenshtein distance and is defined as:
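“The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other”
An edit here means inserting, deleting or substituting a single character, so “kitten” to “sitting” is a distance of 3 (k→s, e→i, then add a g).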
It is already built into Apache Commons StringUtils as “getLevenshteinDistance”,
but here is a near-complete list of implementations for different languages. I include the Java one from that page below (in case the source page goes down):
public class LevenshteinDistance {

    private static int minimum(int a, int b, int c) {
        return Math.min(Math.min(a, b), c);
    }

    public static int computeLevenshteinDistance(CharSequence str1, CharSequence str2) {
        int[][] distance = new int[str1.length() + 1][str2.length() + 1];

        for (int i = 0; i <= str1.length(); i++)
            distance[i][0] = i;
        for (int j = 0; j <= str2.length(); j++)
            distance[0][j] = j;

        for (int i = 1; i <= str1.length(); i++)
            for (int j = 1; j <= str2.length(); j++)
                distance[i][j] = minimum(
                        distance[i - 1][j] + 1,
                        distance[i][j - 1] + 1,
                        distance[i - 1][j - 1]
                                + ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1));

        return distance[str1.length()][str2.length()];
    }
}
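For the “did you mean?” case, here is a rough sketch of how the distance might be used to pick and order suggestions. It is only an illustration, not anything from the page above: the class name DidYouMean, the suggest helper and the maxDistance cut-off are all made up for this example, and the comparison is lower-cased on the assumption that case should not count as an edit. The distance method is a two-row variant of the same algorithm, so it keeps just the previous and current rows rather than the whole table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DidYouMean {

    // Two-row Levenshtein distance: same result as the full-table version,
    // but only the previous and current rows are held in memory.
    static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1, curr[j - 1] + 1),
                        prev[j - 1] + cost);
            }
            // Swap rows so that prev always holds the most recently filled row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: keep candidates within maxDistance edits of the
    // input (ignoring case) and return them closest first.
    static List<String> suggest(String input, List<String> candidates, int maxDistance) {
        String needle = input.toLowerCase();
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (distance(needle, candidate.toLowerCase()) <= maxDistance) {
                result.add(candidate);
            }
        }
        result.sort(Comparator.comparingInt(c -> distance(needle, c.toLowerCase())));
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Levenshtein", "Hamming", "Jaccard");
        System.out.println(suggest("Levenstein", names, 2)); // prints [Levenshtein]
    }
}
```

A cut-off of 2 is just a guess at something sensible for short words; for longer strings it probably wants to scale with the length of the input.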
Yes, it's sad, so just go away and leave me to my string comparisons 😉