There are other popular measures of edit distance, which are calculated using a different set of allowable edit operations. For instance,
use Text::Levenshtein qw(distance); print distance("foo","four"); # prints "2" my @words = qw/ four foo bar /; my @distances = distance("foo",@words); print "@distances"; # prints "2 0 3"
use Algorithm::LCSS qw( LCSS CSS CSS_Sorted ); my $lcss_ary_ref = LCSS( \@SEQ1, \@SEQ2 ); # ref to array my $lcss_string = LCSS( $STR1, $STR2 ); # string my $css_ary_ref = CSS( \@SEQ1, \@SEQ2 ); # ref to array of arrays my $css_str_ref = CSS( $STR1, $STR2 ); # ref to array of strings my $css_ary_ref = CSS_Sorted( \@SEQ1, \@SEQ2 ); # ref to array of arrays my $css_str_ref = CSS_Sorted( $STR1, $STR2 ); # ref to array of strings
There are many different modules on CPAN for calculating the edit distance between two strings. Here's just a selection.
Text::LevenshteinXS and Text::Levenshtein::XS are both versions of the Levenshtein algorithm that require a C compiler, but will be a lot faster than this module.
The Damerau-Levenshtein edit distance is like the Levenshtein distance, but in addition to insertion, deletion and substitution, it also considers the transposition of two adjacent characters to be a single edit. The module Text::Levenshtein::Damerau defaults to using a pure perl implementation, but if you've installed Text::Levenshtein::Damerau::XS then it will be a lot quicker.
Text::WagnerFischer is an implementation of the Wagner-Fischer edit distance, which is similar to the Levenshtein, but applies different weights to each edit type.
Text::Brew is an implementation of the Brew edit distance, which is another algorithm based on edit weights.
Text::Fuzzy provides a number of operations for partial or fuzzy matching of text based on edit distance. Text::Fuzzy::PP is a pure perl implementation of the same interface.
String::Similarity takes two strings and returns a value between 0 (meaning entirely different) and 1 (meaning identical). Apparently based on edit distance.
Text::Dice calculates Dice's coefficient for two strings. This formula was originally developed to measure the similarity of two different populations in ecological research.