There are other popular measures of edit distance, which are calculated using a different set of allowable edit operations. For instance,
    use Text::Levenshtein qw(distance);

    print distance("foo","four");
    # prints "2"

    my @words     = qw/ four foo bar /;
    my @distances = distance("foo",@words);

    print "@distances";
    # prints "2 0 3"
    use Algorithm::LCSS qw( LCSS CSS CSS_Sorted );

    my $lcss_ary_ref = LCSS( \@SEQ1, \@SEQ2 );        # ref to array
    my $lcss_string  = LCSS( $STR1, $STR2 );          # string
    my $css_ary_ref  = CSS( \@SEQ1, \@SEQ2 );         # ref to array of arrays
    my $css_str_ref  = CSS( $STR1, $STR2 );           # ref to array of strings
    my $css_ary_ref  = CSS_Sorted( \@SEQ1, \@SEQ2 );  # ref to array of arrays
    my $css_str_ref  = CSS_Sorted( $STR1, $STR2 );    # ref to array of strings
There are many different modules on CPAN for calculating the edit distance between two strings. Here's just a selection.
Text::LevenshteinXS and Text::Levenshtein::XS are both versions of the Levenshtein algorithm that require a C compiler, but will be a lot faster than this module.
The Damerau-Levenshtein edit distance is like the Levenshtein distance, but in addition to insertion, deletion and substitution, it also counts the transposition of two adjacent characters as a single edit. The module Text::Levenshtein::Damerau defaults to a pure-Perl implementation, but if you've installed Text::Levenshtein::Damerau::XS then it will be a lot quicker.
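To make the effect of the extra transposition edit concrete, here is a minimal pure-Perl sketch. It implements the restricted variant of the Damerau-Levenshtein distance (often called optimal string alignment), which is not necessarily what Text::Levenshtein::Damerau does internally. The function name osa_distance is a hypothetical helper invented for this illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper, not part of any CPAN module: restricted
# Damerau-Levenshtein distance (optimal string alignment).
sub osa_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($m, $n) = (scalar @s, scalar @t);

    # $d[$i][$j] is the distance between the first $i characters of $s
    # and the first $j characters of $t.
    my @d;
    $d[$_][0] = $_ for 0 .. $m;
    $d[0][$_] = $_ for 0 .. $n;

    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i - 1][$j] + 1,            # deletion
                $d[$i][$j - 1] + 1,            # insertion
                $d[$i - 1][$j - 1] + $cost,    # substitution or match
            );

            # Swapping two adjacent characters counts as a single edit.
            if (   $i > 1
                && $j > 1
                && $s[$i - 1] eq $t[$j - 2]
                && $s[$i - 2] eq $t[$j - 1])
            {
                $d[$i][$j] = min($d[$i][$j], $d[$i - 2][$j - 2] + 1);
            }
        }
    }
    return $d[$m][$n];
}

print osa_distance("teh", "the"), "\n";    # prints "1": one transposition
print osa_distance("ca",  "abc"), "\n";    # prints "3" in this restricted variant
```

Plain Levenshtein distance would report 2 for "teh" and "the", since it has to treat the swap as two substitutions. The unrestricted Damerau-Levenshtein distance gives 2 for "ca" and "abc", which is why the variant is named explicitly here.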
Text::WagnerFischer is an implementation of the Wagner-Fischer edit distance, which is similar to the Levenshtein, but applies different weights to each edit type.
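As a rough illustration of what weighting each edit type means, the following sketch runs the usual dynamic-programming table with a separate cost per operation. The function weighted_distance and its insert, delete and substitute options are assumptions made for this example. They are not the interface of Text::WagnerFischer.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: edit distance with one cost per edit type.
# The option names are illustrative only.
sub weighted_distance {
    my ($s, $t, %cost) = @_;
    %cost = (insert => 1, delete => 1, substitute => 1, %cost);

    my @s = split //, $s;
    my @t = split //, $t;
    my ($m, $n) = (scalar @s, scalar @t);

    # First row and column: build one string from nothing, or erase it.
    my @d;
    $d[0][0] = 0;
    $d[$_][0] = $d[$_ - 1][0] + $cost{delete} for 1 .. $m;
    $d[0][$_] = $d[0][$_ - 1] + $cost{insert} for 1 .. $n;

    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            # Matching characters cost nothing; mismatches pay the
            # substitution weight.
            my $subst = $s[$i - 1] eq $t[$j - 1] ? 0 : $cost{substitute};
            $d[$i][$j] = min(
                $d[$i - 1][$j] + $cost{delete},
                $d[$i][$j - 1] + $cost{insert},
                $d[$i - 1][$j - 1] + $subst,
            );
        }
    }
    return $d[$m][$n];
}

print weighted_distance("foo", "four"), "\n";                     # prints "2"
print weighted_distance("foo", "four", substitute => 2), "\n";    # prints "3"
```

With all weights set to 1 this reduces to the plain Levenshtein distance. Raising the substitution weight to 2 makes a substitution no cheaper than a deletion followed by an insertion.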
Text::Brew is an implementation of the Brew edit distance, which is another algorithm based on edit weights.
Text::Fuzzy provides a number of operations for partial or fuzzy matching of text based on edit distance. Text::Fuzzy::PP is a pure-Perl implementation of the same interface.
String::Similarity takes two strings and returns a value between 0 (meaning entirely different) and 1 (meaning identical). It appears to be based on edit distance.
Text::Dice calculates Dice's coefficient for two strings. This formula was originally developed to measure the similarity of two different populations in ecological research.
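For readers unfamiliar with Dice's coefficient, a common way to apply it to strings is to compare their sets of adjacent character pairs (bigrams): twice the number of shared bigrams, divided by the total number of bigrams in both strings. The sketch below assumes that bigram formulation and counts repeated bigrams. The helper names are hypothetical, and Text::Dice may differ in its details.

```perl
use strict;
use warnings;

# Hypothetical helper: count the character bigrams in a string.
sub bigram_counts {
    my ($str) = @_;
    my %count;
    $count{ substr($str, $_, 2) }++ for 0 .. length($str) - 2;
    return \%count;
}

# Hypothetical helper: Dice's coefficient over character bigrams.
sub dice_coefficient {
    my ($s, $t) = @_;
    my ($x, $y) = (bigram_counts($s), bigram_counts($t));

    my $total = 0;
    $total += $_ for (values(%$x), values(%$y));

    # Assumption: when neither string has a bigram, equal strings
    # score 1 and unequal strings score 0.
    return ($s eq $t ? 1 : 0) if $total == 0;

    # A bigram is shared as many times as it occurs in both strings.
    my $shared = 0;
    for my $bg (keys %$x) {
        next unless exists $y->{$bg};
        $shared += $x->{$bg} < $y->{$bg} ? $x->{$bg} : $y->{$bg};
    }
    return 2 * $shared / $total;
}

print dice_coefficient("night", "nacht"), "\n";    # prints "0.25"
```

In the example, "night" and "nacht" each have four bigrams and share only "ht", giving 2 * 1 / 8 = 0.25.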
Comments
This one is a very nice tutorial:
https://web.stanford.edu/class/cs124/lec/med.pdf
Nice PDF explaining the concept.
https://web.stanford.edu/class/cs124/lec/med.pdf