<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/34620?offset=300</link>
	<atom:link href="https://bioinformaticsonline.com/related/34620?offset=300" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44703/the-role-of-lncrna-in-bioinformatics-unlocking-the-secrets-of-the-genome</guid>
	<pubDate>Sat, 07 Dec 2024 02:09:47 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44703/the-role-of-lncrna-in-bioinformatics-unlocking-the-secrets-of-the-genome</link>
	<title><![CDATA[The Role of lncRNA in Bioinformatics: Unlocking the Secrets of the Genome]]></title>
	<description><![CDATA[<p>In the intricate dance of molecular biology, long non-coding RNAs (lncRNAs) have emerged as key players, capturing the interest of researchers worldwide. These RNA molecules, once dismissed as "junk," have proven to be vital in the regulation of gene expression, cellular processes, and the progression of diseases. The intersection of lncRNA studies and bioinformatics is transforming our understanding of these enigmatic molecules, offering profound insights into their structure, function, and therapeutic potential.</p><h3>What Are lncRNAs?</h3><p>lncRNAs are RNA transcripts longer than 200 nucleotides that do not code for proteins. Despite their non-coding nature, they play diverse roles in gene regulation, including chromatin remodeling, transcriptional control, and post-transcriptional processing. Unlike messenger RNAs (mRNAs), lncRNAs often function as scaffolds, decoys, or guides in cellular machinery, influencing biological processes such as cell differentiation, immune response, and even cancer metastasis.</p><h3>Challenges in lncRNA Research</h3><p>Identifying and understanding lncRNAs pose unique challenges:</p><ol>
<li><strong>High Sequence Variability</strong>: Unlike protein-coding genes, lncRNAs exhibit low sequence conservation across species, making functional predictions difficult.</li>
<li><strong>Low Expression Levels</strong>: lncRNAs are often expressed at low levels, complicating their detection in transcriptomic data.</li>
<li><strong>Diverse Functions</strong>: The multifunctional nature of lncRNAs requires advanced computational tools to decipher their roles in complex networks.</li>
</ol><h3>Bioinformatics: A Crucial Ally in lncRNA Research</h3><p>Bioinformatics bridges the gap between raw biological data and meaningful insights, making it indispensable in lncRNA research. Here&rsquo;s how:</p><h4>1. <strong>Identification and Annotation</strong></h4><p>High-throughput sequencing technologies like RNA-seq generate vast amounts of data. Bioinformatics tools help turn this data into lncRNA annotations: aligners such as <em>HISAT2</em> map reads to the genome, and transcript assemblers such as <em>StringTie</em> and <em>Cufflinks</em> reconstruct candidate lncRNA transcripts from those alignments. Additionally, databases like NONCODE, LNCipedia, and Ensembl provide curated repositories of lncRNA sequences and annotations.</p><h4>2. <strong>Functional Prediction</strong></h4><p>Bioinformatics algorithms predict the potential functions of lncRNAs by analyzing their expression patterns and their interactions with DNA, RNA, and proteins. For example, LncRNA2Function infers function from co-expressed protein-coding genes, while RIblast predicts RNA-RNA interactions, helping researchers form hypotheses about the roles of specific lncRNAs.</p><h4>3. <strong>Network Construction</strong></h4><p>lncRNAs often act as regulatory hubs. Bioinformatics platforms such as Cytoscape enable the visualization of lncRNA-mediated networks, elucidating their roles in pathways like cell cycle regulation and apoptosis.</p><h4>4. <strong>Epigenetic Studies</strong></h4><p>lncRNAs are known to interact with chromatin-modifying complexes, influencing gene expression epigenetically. Assays such as ChIP-seq and ATAC-seq, combined with computational pipelines, help identify these interactions and map them to the genome.</p><h4>5. <strong>Clinical Applications</strong></h4><p>Bioinformatics aids in the discovery of lncRNA biomarkers for diseases like cancer and neurodegenerative disorders. Machine learning models analyze differential expression profiles, helping prioritize lncRNAs with therapeutic potential.</p><h3>Case Study: lncRNAs in Cancer Research</h3><p>lncRNAs such as HOTAIR and MALAT1 have been implicated in cancer progression. 
Bioinformatics analyses have revealed their roles in promoting metastasis and altering the tumor microenvironment. For example, transcriptome analysis in cancer patients identifies lncRNA expression signatures, enabling precision medicine approaches.</p><h3>Future Directions</h3><p>The fusion of bioinformatics with experimental biology is unlocking the secrets of lncRNAs. Advances in artificial intelligence, single-cell sequencing, and structural modeling promise to overcome current limitations. Here are some promising directions:</p><ul>
<li><strong>Integrative Analysis</strong>: Combining multi-omics data to understand the interplay of lncRNAs with other biomolecules.</li>
<li><strong>CRISPR Screens</strong>: Leveraging bioinformatics to design CRISPR-based functional screens for lncRNAs.</li>
<li><strong>Therapeutic Development</strong>: Using bioinformatics to design lncRNA-based therapeutics, including antisense oligonucleotides and RNA interference tools.</li>
</ul><h3>Conclusion</h3><p>lncRNAs are the hidden gems of the genome, and bioinformatics is the key to unearthing their full potential. As research progresses, lncRNAs could pave the way for novel diagnostics, targeted therapies, and personalized medicine, revolutionizing our approach to complex diseases.</p><p>The journey into the world of lncRNAs is only beginning, and bioinformatics will continue to play a pivotal role in decoding these molecular mysteries. Whether you&rsquo;re a researcher, clinician, or bioinformatics enthusiast, the study of lncRNAs offers a fascinating frontier of discovery.</p>]]></description>
	<dc:creator>LEGE</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/44766/genome-simulation-with-slim-and-msprime</guid>
	<pubDate>Fri, 31 Jan 2025 12:47:43 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/44766/genome-simulation-with-slim-and-msprime</link>
	<title><![CDATA[Genome Simulation with SLiM and msprime]]></title>
	<description><![CDATA[<p>Genome simulation is an essential tool in population genetics, enabling researchers to model evolutionary processes and study genetic variation. Two widely used simulation tools in this field are <strong>SLiM</strong> and <strong>msprime</strong>. While both serve different purposes, they can be used together with the <strong>slendr</strong> framework to compare simulation outputs effectively.</p><h2>Overview of SLiM and msprime</h2><h3>SLiM: Forward Genetic Simulator</h3><p>SLiM is a <strong>free, open-source</strong> tool designed for forward genetic simulations. It allows researchers to model complex evolutionary scenarios, including selection, recombination, and demographic events, making it particularly useful for studying adaptation and selection in populations.</p><p><strong>Key Features of SLiM:</strong></p><ul>
<li>
<p>Simulates population evolution forward in time</p>
</li>
<li>
<p>Supports custom evolutionary models using an embedded scripting language</p>
</li>
<li>
<p>Allows modeling of spatial and ecological dynamics</p>
</li>
<li>
<p>Provides high flexibility and extensibility for user-defined scenarios</p>
</li>
<li>
<p>Available on GitHub as an open-source project</p>
</li>
</ul><h3>msprime: Ancestry and Mutation Simulator</h3><p>msprime is an efficient, <strong>open-source</strong> tool that simulates ancestry and mutations using a coalescent framework. It is known for its high-speed performance and low memory requirements, making it a popular choice for large-scale genomic simulations.</p><p><strong>Key Features of msprime:</strong></p><ul>
<li>
<p>Implements coalescent simulations for ancestry modeling</p>
</li>
<li>
<p>Efficiently simulates large population histories</p>
</li>
<li>
<p>Supports the addition of mutations to genealogies</p>
</li>
<li>
<p>Developed using an open-source community model</p>
</li>
<li>
<p>Often faster and more memory-efficient than alternative simulators</p>
</li>
</ul><h2>Using SLiM and msprime with slendr</h2><p>Both SLiM and msprime can be integrated with <strong>slendr</strong>, a framework that facilitates structured population genetic simulations. This integration allows for seamless comparison of simulation outputs.</p><h3>How They Work Together:</h3><ul>
<li>
<p>SLiM and msprime simulations can be analyzed within slendr.</p>
</li>
<li>
<p>The <strong>ts_read()</strong> function in slendr enables loading and comparing tree sequence outputs from both simulators.</p>
</li>
<li>
<p>This integration allows researchers to validate simulation results and gain deeper insights into evolutionary processes.</p>
</li>
</ul><h2>Performance Considerations</h2><p>While SLiM offers powerful forward simulations with extensive customization, msprime is often preferred for its <strong>speed and memory efficiency</strong> when simulating ancestry and mutations. The choice between the two depends on the research goals:</p><ul>
<li>
<p><strong>For detailed evolutionary modeling with selection and recombination:</strong> Use SLiM.</p>
</li>
<li>
<p><strong>For large-scale coalescent simulations with mutations:</strong> Use msprime.</p>
</li>
<li>
<p><strong>For comparing different simulation models and their outputs:</strong> Use slendr to integrate SLiM and msprime results.</p>
</li>
</ul><h2>Conclusion</h2><p>SLiM and msprime are valuable tools for genome simulation, each serving distinct but complementary purposes in population genetics research. By leveraging the strengths of both simulators with slendr, researchers can conduct robust and efficient evolutionary simulations, enhancing our understanding of genetic diversity and adaptation.</p><p>For more information, check out the official GitHub repositories for <strong>SLiM</strong> and <strong>msprime</strong>, and explore the <strong>slendr</strong> framework for streamlined simulation workflows.</p>]]></description>
	<dc:creator>BioStar</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</guid>
	<pubDate>Mon, 10 Jul 2017 05:56:07 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/33847/omega2-metagenome-assembly-pipeline</link>
	<title><![CDATA[Omega2: metagenome assembly pipeline]]></title>
	<description><![CDATA[<p><span>Omega found overlaps between reads using a prefix/suffix hash table. The overlap graph of reads was simplified by removing transitive edges and trimming short branches. Unitigs were generated based on minimum cost flow analysis of the overlap graph and then merged to contigs and scaffolds using mate-pair information. In comparison with three de Bruijn graph assemblers (SOAPdenovo, IDBA-UD and MetaVelvet), Omega provided comparable overall performance on a HiSeq 100-bp dataset and superior performance on a MiSeq 300-bp dataset. In comparison with Celera on the MiSeq dataset, Omega provided more continuous assemblies overall using a fraction of the computing time of existing overlap-layout-consensus assemblers. This indicates Omega can more efficiently assemble longer Illumina reads, and at deeper coverage, for metagenomic datasets.</span></p><p>Address of the bookmark: <a href="http://omega.omicsbio.org/" rel="nofollow">http://omega.omicsbio.org/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics</guid>
	<pubDate>Thu, 07 Dec 2017 08:46:51 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/34552/edit-distance-application-in-bioinformatics</link>
	<title><![CDATA[Edit distance applications in bioinformatics]]></title>
	<description><![CDATA[<p>Besides the Levenshtein distance, there are other popular measures of&nbsp;<a href="https://en.wikipedia.org/wiki/Edit_distance" title="Edit distance">edit distance</a>, each calculated using a different set of allowable edit operations. For instance,</p><ul>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance" title="Damerau&ndash;Levenshtein distance">Damerau&ndash;Levenshtein distance</a>&nbsp;allows insertion, deletion, substitution, and the&nbsp;<a href="https://en.wikipedia.org/wiki/Transposition_(mathematics)" title="Transposition (mathematics)">transposition</a>&nbsp;of two adjacent characters;</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Longest_common_subsequence_problem" title="Longest common subsequence problem">longest common subsequence</a>&nbsp;(LCS) distance allows only insertion and deletion, not substitution;</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Hamming_distance" title="Hamming distance">Hamming distance</a>&nbsp;allows only substitution and therefore applies only to strings of the same length;</li>
<li>the&nbsp;<a href="https://en.wikipedia.org/wiki/Jaro_distance" title="Jaro distance">Jaro distance</a>&nbsp;allows only&nbsp;<a href="https://en.wikipedia.org/wiki/Transposition_(mathematics)" title="Transposition (mathematics)">transposition</a>.</li>
</ul><p>&nbsp;</p><pre><span>use</span> Text<span>::</span>Levenshtein <span>qw</span><span>(</span>distance<span>);</span>

 <span>print</span> <span>distance</span><span>(</span><span>"foo"</span><span>,</span><span>"four"</span><span>);</span>
 <span># prints "2"</span>

 <span>my</span> <span>@words</span>     <span>=</span> <span>qw</span><span>/ four foo bar /</span><span>;</span>
 <span>my</span> <span>@distances</span> <span>=</span> <span>distance</span><span>(</span><span>"foo"</span><span>,</span><span>@words</span><span>);</span>

 <span>print</span> <span>"@distances"</span><span>;</span>
 <span># prints "2 0 3"</span></pre><pre><span>use</span> Algorithm<span>::</span>LCSS <span>qw</span><span>(</span> LCSS CSS CSS_Sorted <span>);</span>
    <span>my</span> <span>$lcss_ary_ref</span> <span>=</span> <span>LCSS</span><span>(</span> <span>\</span><span>@SEQ1</span><span>,</span> <span>\</span><span>@SEQ2</span> <span>);</span>  <span># ref to array</span>
    <span>my</span> <span>$lcss_string</span>  <span>=</span> <span>LCSS</span><span>(</span> <span>$STR1</span><span>,</span> <span>$STR2</span> <span>);</span>    <span># string</span>
    <span>my</span> <span>$css_ary_ref</span> <span>=</span> <span>CSS</span><span>(</span> <span>\</span><span>@SEQ1</span><span>,</span> <span>\</span><span>@SEQ2</span> <span>);</span>    <span># ref to array of arrays</span>
    <span>my</span> <span>$css_str_ref</span> <span>=</span> <span>CSS</span><span>(</span> <span>$STR1</span><span>,</span> <span>$STR2</span> <span>);</span>      <span># ref to array of strings</span>
    <span>my</span> <span>$css_ary_ref</span> <span>=</span> <span>CSS_Sorted</span><span>(</span> <span>\</span><span>@SEQ1</span><span>,</span> <span>\</span><span>@SEQ2</span> <span>);</span>  <span># ref to array of arrays</span>
    <span>my</span> <span>$css_str_ref</span> <span>=</span> <span>CSS_Sorted</span><span>(</span> <span>$STR1</span><span>,</span> <span>$STR2</span> <span>);</span>    <span># ref to array of strings</span></pre><p>There are many different modules on CPAN for calculating the edit distance between two strings. Here is just a selection.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshteinXS">Text::LevenshteinXS</a>&nbsp;and&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3AXS">Text::Levenshtein::XS</a>&nbsp;are both implementations of the Levenshtein algorithm that require a C compiler but are much faster than the pure-Perl Text::Levenshtein module used above.</p><p>The Damerau-Levenshtein edit distance is like the Levenshtein distance, but in addition to insertion, deletion and substitution, it also considers the transposition of two adjacent characters to be a single edit. The module&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3ADamerau">Text::Levenshtein::Damerau</a>&nbsp;defaults to a pure-Perl implementation, but if you have installed&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3ALevenshtein%3A%3ADamerau%3A%3AXS">Text::Levenshtein::Damerau::XS</a>&nbsp;it will be much quicker.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3AWagnerFischer">Text::WagnerFischer</a>&nbsp;is an implementation of the Wagner-Fischer edit distance, which is similar to the Levenshtein distance but applies different weights to each edit type.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ABrew">Text::Brew</a>&nbsp;is an implementation of the Brew edit distance, which is another algorithm based on edit weights.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3AFuzzy">Text::Fuzzy</a>&nbsp;provides a number of operations for partial or fuzzy matching of text based on edit distance.&nbsp;<a href="http://search.cpan.org/perldoc?Text%3A%3AFuzzy%3A%3APP">Text::Fuzzy::PP</a>&nbsp;is a pure-Perl implementation of the same interface.</p><p><a href="http://search.cpan.org/perldoc?String%3A%3ASimilarity">String::Similarity</a>&nbsp;takes two strings and returns a value between 0 (meaning entirely different) and 1 (meaning identical). It appears to be based on edit distance.</p><p><a href="http://search.cpan.org/perldoc?Text%3A%3ADice">Text::Dice</a>&nbsp;calculates&nbsp;<a href="https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient">Dice's coefficient</a>&nbsp;for two strings. This formula was originally developed to measure the similarity of two different populations in ecological research.</p>]]></description>
	<dc:creator>Neel</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37830/nquire-a-statistical-framework-for-ploidy-estimation-using-next-generation-sequencing</guid>
	<pubDate>Thu, 04 Oct 2018 05:23:59 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37830/nquire-a-statistical-framework-for-ploidy-estimation-using-next-generation-sequencing</link>
	<title><![CDATA[nQuire: a statistical framework for ploidy estimation using next generation sequencing]]></title>
	<description><![CDATA[<p>nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command-line tool in the C programming language and is available at https://github.com/clwgg/nQuire under the MIT license.</p><p>Address of the bookmark: <a href="https://github.com/clwgg/nQuire" rel="nofollow">https://github.com/clwgg/nQuire</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27845/cnidaria-fast-reference-free-phylogenomic-clustering</guid>
	<pubDate>Thu, 16 Jun 2016 17:55:17 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27845/cnidaria-fast-reference-free-phylogenomic-clustering</link>
	<title><![CDATA[CNIDARIA: fast, reference-free phylogenomic clustering]]></title>
	<description><![CDATA[<p>Motivation: Identification of biological specimens is a major requirement for a range of applications. Reference-free methods analyse unprocessed sequencing data without relying on prior knowledge, but these do not scale to arbitrarily large genomes and arbitrarily large phylogenetic distances.</p>
<p>Results: We present Cnidaria, a practical tool for clustering genomic and transcriptomic data with no limitation on genome size or phylogenetic distance. We simultaneously clustered 169 genomic and transcriptomic datasets from 4 kingdoms, achieving 100% accuracy at the supra-species level and 78% accuracy at the species level.</p>
<p>Availability and Implementation: Cnidaria is written in C++ and Python and is available at http://www.ab.wur.nl/cnidaria.</p>
<p>Contact: Saulo Aflitos - sauloal@gmail.com</p>
<p>Supplementary information: Supplementary data are available at Bioinformatics online.</p><p>Address of the bookmark: <a href="https://github.com/sauloal/cnidaria/wiki" rel="nofollow">https://github.com/sauloal/cnidaria/wiki</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39213/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</guid>
	<pubDate>Tue, 02 Apr 2019 21:54:55 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39213/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</link>
	<title><![CDATA[Flye: Fast and accurate de novo assembler for single molecule sequencing reads]]></title>
	<description><![CDATA[<p><span>Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. Flye also includes a special mode for metagenome assembly.</span></p><p>Address of the bookmark: <a href="https://github.com/fenderglass/Flye" rel="nofollow">https://github.com/fenderglass/Flye</a></p>]]></description>
	<dc:creator>BioJoker</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/41033/clark-fast-accurate-and-versatile-sequence-classification-system</guid>
	<pubDate>Sat, 15 Feb 2020 01:49:01 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/41033/clark-fast-accurate-and-versatile-sequence-classification-system</link>
	<title><![CDATA[CLARK: Fast, accurate and versatile sequence classification system]]></title>
	<description><![CDATA[<p><a href="http://dx.doi.org/10.1186/s12864-015-1419-2"><strong>CLARK</strong></a><span> is a method for supervised sequence classification based on discriminative&nbsp;</span><em>k</em><span>-mers. On two distinct classification problems (see the article for details), namely (1) the taxonomic classification of metagenomic reads to known bacterial genomes, and (2) the assignment of BAC clones and transcripts to chromosome arms/centromeres (in the absence of a finished assembly for the reference genome), CLARK outperforms the best state-of-the-art methods in classification speed and precision.</span></p>
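<p>To make the idea of discriminative <em>k</em>-mers concrete, here is a minimal Python sketch. It is not CLARK's implementation: the function names (kmers, build_index, classify), the toy sequences, and the choice of k = 4 are assumptions made purely for illustration. The sketch treats a <em>k</em>-mer as discriminative when it occurs in exactly one target, and assigns a read to the target that collects the most such hits.</p>
<pre>
from collections import Counter


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(targets, k):
    """Map every k-mer found in exactly one target to that target's name."""
    seen_in = Counter()   # number of targets containing each k-mer
    owner = {}            # last target seen for each k-mer
    for name, seq in targets.items():
        for kmer in kmers(seq, k):
            seen_in[kmer] += 1
            owner[kmer] = name
    # Keep only k-mers that are specific to a single target.
    return {kmer: name for kmer, name in owner.items() if seen_in[kmer] == 1}


def classify(read, index, k):
    """Return the target with the most discriminative k-mer hits, or None."""
    hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# Toy targets (hypothetical sequences); both contain the shared k-mer TGGA.
targets = {"genome_A": "ACGTACGTGGA", "genome_B": "TTGACCATGGA"}
index = build_index(targets, 4)
label_a = classify("ACGTAC", index, 4)
label_b = classify("GACCAT", index, 4)
label_shared = classify("TGGA", index, 4)
</pre>
<p>In this toy example the read TGGA is left unassigned because its only <em>k</em>-mer occurs in both targets and is therefore dropped from the index.</p>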
<p><span><a href="http://clark.cs.ucr.edu/Spaced/">http://clark.cs.ucr.edu/Spaced/</a></span></p><p>Address of the bookmark: <a href="http://clark.cs.ucr.edu/Spaced/" rel="nofollow">http://clark.cs.ucr.edu/Spaced/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</guid>
	<pubDate>Tue, 12 Dec 2017 17:23:31 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34618/mashmap-a-fast-and-approximate-software-for-mapping-long-reads-pacbioont-or-assembly-to-reference-genomes</link>
	<title><![CDATA[MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)]]></title>
	<description><![CDATA[<p><span>MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a&nbsp;</span><em>k</em><span>-mer based&nbsp;</span><a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a><span>&nbsp;using a combination of&nbsp;</span><a href="http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p76-schleimer.pdf">Winnowing</a><span>&nbsp;and&nbsp;</span><a href="https://en.wikipedia.org/wiki/MinHash">MinHash</a><span>. This is then converted to an estimate of sequence identity using the&nbsp;</span><a href="http://mash.readthedocs.org/">Mash</a><span>&nbsp;distance. An appropriate&nbsp;</span><em>k</em><span>-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.</span></p><p>Address of the bookmark: <a href="https://github.com/marbl/MashMap" rel="nofollow">https://github.com/marbl/MashMap</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36618/lamsa-fast-split-read-alignment-with-long-approximate-matches</guid>
	<pubDate>Tue, 15 May 2018 04:44:42 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36618/lamsa-fast-split-read-alignment-with-long-approximate-matches</link>
	<title><![CDATA[LAMSA: fast split read alignment with long approximate matches]]></title>
	<description><![CDATA[LAMSA (Long Approximate Matches-based Split Aligner) is a split-alignment approach that is fast and handles structural variation (SV) events well. It is well suited to aligning long reads (thousands of base pairs or more).

LAMSA takes advantage of the rarity of SVs to implement a specifically designed two-step strategy. It first splits the read into relatively long fragments and aligns them co-linearly, which resolves small variations and sequencing errors and mitigates the effect of repeats. The fragment alignments are then used in a sparse dynamic programming (SDP)-based split-alignment step that handles large or non-co-linear variants.
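<p>The chaining idea behind the second step can be illustrated with a short Python sketch. This is a simplified, quadratic-time dynamic program and not LAMSA's SDP implementation: the Fragment record, the function names, the toy coordinates, and the choice to score a chain by its total aligned length are all assumptions made for illustration. It picks the highest-scoring set of fragment alignments that stay in the same order on both the read and the reference.</p>
<pre>
from typing import NamedTuple


class Fragment(NamedTuple):
    """One aligned fragment (hypothetical record, not LAMSA's data structure)."""
    read_start: int  # start offset on the read
    ref_start: int   # start offset on the reference
    length: int      # aligned length (assumed positive), used as the score


def can_follow(prev, cur):
    """True when prev ends at or before the start of cur on read and reference."""
    read_gap = cur.read_start - (prev.read_start + prev.length)
    ref_gap = cur.ref_start - (prev.ref_start + prev.length)
    # The minimum stays 0 only if neither gap is negative.
    return min(read_gap, ref_gap, 0) == 0


def best_colinear_chain(fragments):
    """Return the co-linear chain with the largest total aligned length."""
    frags = sorted(fragments)           # by read_start, then ref_start
    if not frags:
        return []
    best = [f.length for f in frags]    # best chain score ending at each fragment
    back = [None] * len(frags)          # predecessor index, for traceback
    for i, cur in enumerate(frags):
        candidates = [j for j in range(i) if can_follow(frags[j], cur)]
        if candidates:
            j = max(candidates, key=best.__getitem__)
            best[i] = best[j] + cur.length
            back[i] = j
    # Walk back from the best-scoring end point to recover the chain.
    end = max(range(len(frags)), key=best.__getitem__)
    chain = []
    while end is not None:
        chain.append(frags[end])
        end = back[end]
    return chain[::-1]


fragments = [
    Fragment(0, 100, 50),
    Fragment(50, 150, 50),
    Fragment(100, 20, 50),    # maps upstream of its neighbours: not co-linear
    Fragment(150, 250, 50),
]
chain = best_colinear_chain(fragments)
</pre>
<p>In this toy input the fragment mapped to reference position 20 is left out of the chain; in a split aligner, fragments like this are the ones that would point to a possible large or non-co-linear variant.</p>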

We benchmarked LAMSA on simulated and real datasets with various read lengths and sequencing error rates. The results demonstrate that it is substantially faster than state-of-the-art long-read aligners while also handling various categories of SVs well.

LAMSA is open source and free for non-commercial use.

LAMSA was designed mainly by Bo Liu &amp; Yan Gao and developed by Yan Gao at the Center for Bioinformatics, Harbin Institute of Technology, China.<p>Address of the bookmark: <a href="https://github.com/hitbc/LAMSA" rel="nofollow">https://github.com/hitbc/LAMSA</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>

</channel>
</rss>