History

Our Sponsors



Download BioinformaticsOnline(BOL) Apps in your chrome browser.




Edit distance application in bioinformatics !: Revision

There are other popular measures of edit distance, which are calculated using a different set of allowable edit operations. For instance,

 

use Text::Levenshtein qw(distance);

 print distance("foo","four");
 # prints "2"

 my @words     = qw/ four foo bar /;
 my @distances = distance("foo",@words);

 print "@distances";
 # prints "2 0 3"


use Algorithm::LCSS qw( LCSS CSS CSS_Sorted );
    my $lcss_ary_ref = LCSS( \@SEQ1, \@SEQ2 );  # ref to array
    my $lcss_string  = LCSS( $STR1, $STR2 );    # string
    my $css_ary_ref = CSS( \@SEQ1, \@SEQ2 );    # ref to array of arrays
    my $css_str_ref = CSS( $STR1, $STR2 );      # ref to array of strings
    my $css_ary_ref = CSS_Sorted( \@SEQ1, \@SEQ2 );  # ref to array of arrays
    my $css_str_ref = CSS_Sorted( $STR1, $STR2 );    # ref to array of strings



There are many different modules on CPAN for calculating the edit distance between two strings. Here's just a selection.

Text::LevenshteinXS and Text::Levenshtein::XS are both versions of the Levenshtein algorithm that require a C compiler, but will be a lot faster than this module.

The Damerau-Levenshtein edit distance is like the Levenshtein distance, but in addition to insertion, deletion and substitution, it also considers the transposition of two adjacent characters to be a single edit. The module Text::Levenshtein::Damerau defaults to using a pure perl implementation, but if you've installed Text::Levenshtein::Damerau::XS then it will be a lot quicker.

Text::WagnerFischer is an implementation of the Wagner-Fischer edit distance, which is similar to the Levenshtein, but applies different weights to each edit type.

Text::Brew is an implementation of the Brew edit distance, which is another algorithm based on edit weights.

Text::Fuzzy provides a number of operations for partial or fuzzy matching of text based on edit distance. Text::Fuzzy::PP is a pure perl implementation of the same interface.

String::Similarity takes two strings and returns a value between 0 (meaning entirely different) and 1 (meaning identical). Apparently based on edit distance.

Text::Dice calculates Dice's coefficient for two strings. This formula was originally developed to measure the similarity of two different populations in ecological research.