<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Edit distance application in bioinformatics !]]></title>
	<link>https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics?</link>
	<atom:link href="https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics?" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics</guid>
	<pubDate>Thu, 07 Dec 2017 08:46:51 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics</link>
	<title><![CDATA[Edit distance application in bioinformatics !]]></title>
	<description><![CDATA[<p>There are other popular measures of&nbsp;<a href="https://en.wikipedia.org/wiki/Edit_distance" title="Edit distance">edit distance</a>, which are calculated using a different set of allowable edit operations. For instance,</p><ul>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance" title="Damerau&ndash;Levenshtein distance">Damerau&ndash;Levenshtein distance</a>&nbsp;allows insertion, deletion, substitution, and the&nbsp;<a href="https://en.wikipedia.org/wiki/Transposition_(mathematics)" title="Transposition (mathematics)">transposition</a>&nbsp;of two adjacent characters;</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Longest_common_subsequence_problem" title="Longest common subsequence problem">longest common subsequence</a>&nbsp;(LCS) distance allows only insertion and deletion, not substitution;</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Hamming_distance" title="Hamming distance">Hamming distance</a>&nbsp;allows only substitution, hence, it only applies to strings of the same length.</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Jaro_distance" title="Jaro distance">Jaro distance</a>&nbsp;allows only&nbsp;<a href="https://en.wikipedia.org/wiki/Transposition_(mathematics)" title="Transposition (mathematics)">transposition</a>.</li>
</ul><p>&nbsp;</p><pre><span>use</span> Text<span>::</span>Levenshtein <span>qw</span><span>(</span>distance<span>);</span>

 <span>print</span> <span>distance</span><span>(</span><span>"foo"</span><span>,</span><span>"four"</span><span>);</span>
 <span># prints "2"</span>

 <span>my</span> <span>@words</span>     <span>=</span> <span>qw</span><span>/ four foo bar /</span><span>;</span>
 <span>my</span> <span>@distances</span> <span>=</span> <span>distance</span><span>(</span><span>"foo"</span><span>,</span><span>@words</span><span>);</span>

 <span>print</span> <span>"@distances"</span><span>;</span>
 <span># prints "2 0 3"</span><br /><br /><br /></pre><pre><span>use</span> Algorithm<span>::</span>LCSS <span>qw</span><span>(</span> LCSS CSS CSS_Sorted <span>);</span>
    <span>my</span> <span>$lcss_ary_ref</span> <span>=</span> <span>LCSS</span><span>(</span> <span>\</span><span>@SEQ1</span><span>,</span> <span>\</span><span>@SEQ2</span> <span>);</span>  <span># ref to array</span>
    <span>my</span> <span>$lcss_string</span>  <span>=</span> <span>LCSS</span><span>(</span> <span>$STR1</span><span>,</span> <span>$STR2</span> <span>);</span>    <span># string</span>
    <span>my</span> <span>$css_ary_ref</span> <span>=</span> <span>CSS</span><span>(</span> <span>\</span><span>@SEQ1</span><span>,</span> <span>\</span><span>@SEQ2</span> <span>);</span>    <span># ref to array of arrays</span>
    <span>my</span> <span>$css_str_ref</span> <span>=</span> <span>CSS</span><span>(</span> <span>$STR1</span><span>,</span> <span>$STR2</span> <span>);</span>      <span># ref to array of strings</span>
    <span>my</span> <span>$css_ary_ref</span> <span>=</span> <span>CSS_Sorted</span><span>(</span> <span>\</span><span>@SEQ1</span><span>,</span> <span>\</span><span>@SEQ2</span> <span>);</span>  <span># ref to array of arrays</span>
    <span>my</span> <span>$css_str_ref</span> <span>=</span> <span>CSS_Sorted</span><span>(</span> <span>$STR1</span><span>,</span> <span>$STR2</span> <span>);</span>    <span># ref to array of strings<br /><br /><br /><br /></span></pre><p>There are many different modules on CPAN for calculating the edit distance between two strings. Here's just a selection.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshteinXS">Text::LevenshteinXS</a>&nbsp;and&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3AXS">Text::Levenshtein::XS</a>&nbsp;are both versions of the Levenshtein algorithm that require a C compiler, but will be a lot faster than this module.</p><p>The Damerau-Levenshtein edit distance is like the Levenshtein distance, but in addition to insertion, deletion and substitution, it also considers the transposition of two adjacent characters to be a single edit. The module&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3ADamerau">Text::Levenshtein::Damerau</a>&nbsp;defaults to using a pure perl implementation, but if you've installed&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3ADamerau%3A%3AXS">Text::Levenshtein::Damerau::XS</a>&nbsp;then it will be a lot quicker.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3AWagnerFischer">Text::WagnerFischer</a>&nbsp;is an implementation of the Wagner-Fischer edit distance, which is similar to the Levenshtein, but applies different weights to each edit type.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ABrew">Text::Brew</a>&nbsp;is an implementation of the Brew edit distance, which is another algorithm based on edit weights.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3AFuzzy">Text::Fuzzy</a>&nbsp;provides a number of operations for partial or fuzzy matching of text based on edit distance.&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3AFuzzy%3A%3APP">Text::Fuzzy::PP</a>&nbsp;is a pure perl implementation of the same interface.</p><p><a href="http://search.cpan.org/perldoc?String%3A%3ASimilarity">String::Similarity</a>&nbsp;takes two strings and returns a value between 0 (meaning entirely different) and 1 (meaning identical). Apparently based on edit distance.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ADice">Text::Dice</a>&nbsp;calculates&nbsp;<a href="https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient">Dice's coefficient</a>&nbsp;for two strings. This formula was originally developed to measure the similarity of two different populations in ecological research.</p><pre><span>&nbsp;</span></pre>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink='true'>https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics#item-annotation-3767</guid>
	<pubDate>Mon, 17 Feb 2020 01:05:34 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics#item-annotation-3767</link>
	<title><![CDATA[Comment by Jit]]></title>
	<description><![CDATA[<p>Nice PDF explaining the concept.</p>
<p>https://web.stanford.edu/class/cs124/lec/med.pdf</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink='true'>https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics#item-annotation-3153</guid>
	<pubDate>Fri, 08 Dec 2017 03:26:01 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics#item-annotation-3153</link>
	<title><![CDATA[Comment by Jit]]></title>
	<description><![CDATA[<p>This one is very nice tutorial</p>
<p>https://web.stanford.edu/class/cs124/lec/med.pdf</p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>