<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/39017?offset=80</link>
	<atom:link href="https://bioinformaticsonline.com/related/39017?offset=80" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36739/blasr-mapping-single-molecule-sequencing-reads-using-basic-local-alignment-with-successive-refinement-blasr-theory-and-application</guid>
	<pubDate>Wed, 23 May 2018 06:54:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36739/blasr-mapping-single-molecule-sequencing-reads-using-basic-local-alignment-with-successive-refinement-blasr-theory-and-application</link>
	<title><![CDATA[BLASR: Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Theory and Application]]></title>
	<description><![CDATA[<p><span>BLASR (Basic Local Alignment with Successive Refinement) maps Single Molecule Sequencing (SMS) reads that are thousands to tens of thousands of bases long, where the divergence between the read and the genome is dominated by insertion and deletion errors.</span></p>
<p>Here is how I use blasr to align PacBio reads to the contigs (target.fasta). The file &ldquo;target.fasta.sa&rdquo; is the suffix array of &ldquo;target.fasta&rdquo;, generated by sawriter.</p>
<blockquote>
<p>blasr query.fa ./target.fasta -sa ./target.fasta.sa -bestn 40 -maxScore -500 -m 4 -nproc 24 -out target.m4 -maxLCPLength 15</p>
</blockquote>
<p>The output format option &ldquo;-m 4&rdquo; generates the alignment coordinates. It is not fully documented, but I can explain it to you.&nbsp;</p>
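<p>For scripted runs, the same command can be assembled in Python. This is a minimal sketch that only rebuilds the argument list from the flags shown above. The helper name <code>build_blasr_command</code> and its parameters are hypothetical and not part of BLASR, and the sketch assumes the suffix array sits next to the target file with a <code>.sa</code> suffix.</p>

```python
import shlex


def build_blasr_command(query, target, out, nproc=24, bestn=40):
    """Assemble the blasr invocation shown above as an argument list.

    Hypothetical helper: it only mirrors the flags from the post and
    assumes the suffix array is stored as TARGET.sa next to the target.
    """
    return [
        "blasr", query, target,
        "-sa", target + ".sa",
        "-bestn", str(bestn),
        "-maxScore", "-500",
        "-m", "4",
        "-nproc", str(nproc),
        "-out", out,
        "-maxLCPLength", "15",
    ]


cmd = build_blasr_command("query.fa", "./target.fasta", "target.m4")
print(shlex.join(cmd))
# To actually run it (requires blasr on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

<p>Keeping the command as a list rather than one string avoids shell quoting problems when file names contain spaces.</p>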
<p>I use a server with 24 cores and 48 GB of RAM for the alignment. It took about 2 to 3 hours to align 3G of PacBio reads to 10^6 short-read contigs with a mean length of 3.5 kbp.</p><p>Address of the bookmark: <a href="http://bix.ucsd.edu/projects/blasr/" rel="nofollow">http://bix.ucsd.edu/projects/blasr/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/34038/quota-synteny-alignment</guid>
	<pubDate>Mon, 31 Jul 2017 04:11:57 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/34038/quota-synteny-alignment</link>
	<title><![CDATA[Quota synteny alignment]]></title>
	<description><![CDATA[<p><span>Typically in comparative genomics, we can identify anchors, chain them into syntenic blocks and interpret these blocks as derived from common descent. However, when comparing two genomes that have undergone ancient genome duplications (plant genomes in particular), we have a large number of blocks that are not orthologous, but paralogous. This has sometimes forced us to use&nbsp;</span><em>ad-hoc</em><span>&nbsp;rules to screen these blocks. So the question is:&nbsp;</span><span>given the expected depth (quota) along both the x- and y-axis, select a subset of the anchors with maximized total score</span><span>.</span></p><p>Address of the bookmark: <a href="https://github.com/tanghaibao/quota-alignment" rel="nofollow">https://github.com/tanghaibao/quota-alignment</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36512/hisat2-a-fast-and-sensitive-alignment-program-for-mapping-next-generation-sequencing-reads</guid>
	<pubDate>Tue, 08 May 2018 04:27:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36512/hisat2-a-fast-and-sensitive-alignment-program-for-mapping-next-generation-sequencing-reads</link>
	<title><![CDATA[HISAT2: a fast and sensitive alignment program for mapping next-generation sequencing reads]]></title>
	<description><![CDATA[<p><strong>HISAT2</strong><span>&nbsp;is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). Based on an extension of BWT for graphs&nbsp;</span><a href="http://dl.acm.org/citation.cfm?id=2674828">[Sir&eacute;n et al. 2014]</a><span>, we designed and implemented a graph FM index (GFM), an original approach and its first implementation to the best of our knowledge. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).&nbsp;</span></p>
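<p>To make the indexing idea more concrete, here is a minimal sketch of an ordinary (linear, non-graph) FM index with backward search. It is not HISAT2's graph FM index or its code. The function names are hypothetical, and the naive suffix array construction is only suitable for toy inputs.</p>

```python
def build_fm_index(text):
    """Build a tiny FM index (suffix array, C table, occurrence counts)."""
    text += "$"  # unique terminator, sorts before A/C/G/T
    # Naive suffix array: fine for a toy example, not for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[c] = number of characters in text strictly smaller than c.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[c][i] = occurrences of c in bwt[:i].
    occ = {ch: [0] for ch in alphabet}
    for b in bwt:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (1 if b == ch else 0))
    return sa, c_table, occ


def find(pattern, index):
    """Return sorted start positions of pattern via backward search."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


index = build_fm_index("GATTACAGATTACA")
print(find("ATTA", index))  # prints [1, 8]
```

<p>Each pattern character narrows the suffix-array interval in constant time, which is the property that small per-region indexes can exploit.</p>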
<p><span>More at&nbsp;https://ccb.jhu.edu/software/hisat2/index.shtml</span></p><p>Address of the bookmark: <a href="https://github.com/infphilo/hisat2" rel="nofollow">https://github.com/infphilo/hisat2</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36846/gblocks-eliminates-poorly-aligned-positions-and-divergent-regions-of-a-dna-or-protein-alignment</guid>
	<pubDate>Sat, 02 Jun 2018 07:36:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36846/gblocks-eliminates-poorly-aligned-positions-and-divergent-regions-of-a-dna-or-protein-alignment</link>
	<title><![CDATA[Gblocks: eliminates poorly aligned positions and divergent regions of a DNA or protein alignment]]></title>
	<description><![CDATA[<p><a href="http://molevol.cmima.csic.es/castresana/Gblocks.html">Gblocks</a><span>&nbsp;eliminates poorly aligned positions and divergent regions of a DNA or protein alignment so that it becomes more suitable for phylogenetic analysis. This server implements the most important features of the Gblocks program to make its use as simple as possible without losing the functionality that is necessary in most cases. Other options can be changed in the stand-alone program. You can see here an&nbsp;</span><a href="http://molevol.cmima.csic.es/castresana/Gblocks_server/nad3.pir-gb.htm">example output file</a><span>&nbsp;showing the blocks selected from a protein alignment. Further information can be found in the&nbsp;</span><a href="http://molevol.cmima.csic.es/castresana/Gblocks/Gblocks_documentation.html">online documentation</a><span>.&nbsp;</span></p><p>Address of the bookmark: <a href="http://molevol.cmima.csic.es/castresana/Gblocks_server.html" rel="nofollow">http://molevol.cmima.csic.es/castresana/Gblocks_server.html</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/37584/mulan-multiple-sequence-local-alignment-and-visualization-for-studying-function-and-evolution</guid>
	<pubDate>Fri, 24 Aug 2018 09:50:01 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/37584/mulan-multiple-sequence-local-alignment-and-visualization-for-studying-function-and-evolution</link>
	<title><![CDATA[Mulan: Multiple-sequence local alignment and visualization for studying function and evolution]]></title>
	<description><![CDATA[<p>Mulan: Multiple-sequence local alignment and visualization for studying function and evolution</p>
<p><span>Mulan (</span><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC540288/#ref44">http://mulan.dcode.org/</a><span>) is a novel method and network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the TBA multi-aligner program for rapid identification of local sequence conservation, and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA.</span></p><p>Address of the bookmark: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC540288/" rel="nofollow">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC540288/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/40711/vg-variation-graph-data-structures-interchange-formats-alignment-genotyping-and-variant-calling-methods</guid>
	<pubDate>Tue, 28 Jan 2020 03:53:24 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/40711/vg-variation-graph-data-structures-interchange-formats-alignment-genotyping-and-variant-calling-methods</link>
	<title><![CDATA[VG: variation graph data structures, interchange formats, alignment, genotyping, and variant calling methods]]></title>
	<description><![CDATA[<p><em>Variation graphs</em>&nbsp;provide a succinct encoding of the sequences of many genomes. A variation graph (in particular as implemented in vg) is composed of:</p>
<ul>
<li><em>nodes</em>, which are labeled by sequences and ids</li>
<li><em>edges</em>, which connect two nodes via either of their respective ends</li>
<li><em>paths</em>, which describe genomes, sequence alignments, and annotations (such as gene models and transcripts) as walks through nodes connected by edges</li>
</ul><p>Address of the bookmark: <a href="https://github.com/vgteam/vg" rel="nofollow">https://github.com/vgteam/vg</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/44481/unialigner-a-parameter-free-framework-for-fast-sequence-alignment</guid>
	<pubDate>Fri, 08 Mar 2024 23:36:12 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/44481/unialigner-a-parameter-free-framework-for-fast-sequence-alignment</link>
	<title><![CDATA[UniAligner: a parameter-free framework for fast sequence alignment]]></title>
	<description><![CDATA[<p>UniAligner (formerly TandemAligner) is the first parameter-free algorithm for sequence alignment. It introduces a sequence-dependent alignment scoring that changes automatically for any pair of compared sequences. Classical alignment approaches, such as the Smith-Waterman algorithm, work well for most sequences but fail to construct biologically adequate alignments of extra-long tandem repeats (ETRs), such as human centromeres and immunoglobulin loci. This limitation was overlooked in previous studies because the sequences of centromeres and other ETRs across multiple genomes only became available recently.</p>
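<p>For contrast, here is a minimal sketch of the classical Smith-Waterman local alignment score, which depends on scoring parameters chosen up front. The values used here (match 2, mismatch -1, linear gap -2) are illustrative assumptions. This is not UniAligner's sequence-dependent scoring.</p>

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty.

    The scoring values are illustrative assumptions; they are the kind of
    fixed parameters that a parameter-free method avoids.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the 0 floor.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GATTACA", "GATTACA"))  # prints 14
```

<p>Changing <code>match</code>, <code>mismatch</code> or <code>gap</code> changes which alignment wins, which is why a fixed choice can behave poorly on highly repetitive sequences.</p>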
<p>More at https://www.nature.com/articles/s41592-023-01970-4</p><p>Address of the bookmark: <a href="https://github.com/seryrzu/unialigner" rel="nofollow">https://github.com/seryrzu/unialigner</a></p>]]></description>
	<dc:creator>Abhi</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/35041/seal-sequence-alignment-evaluation-suite</guid>
	<pubDate>Wed, 03 Jan 2018 05:05:46 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/35041/seal-sequence-alignment-evaluation-suite</link>
	<title><![CDATA[Seal: SEquence ALignment evaluation suite]]></title>
	<description><![CDATA[<p><span>Seal</span>&nbsp;is a comprehensive sequencing simulation and alignment tool evaluation suite. This software (implemented in Java) provides several utilities that can be used to evaluate alignment algorithms, including:</p>
<ul>
<li>Reading a pre-existing reference genome from one or more FASTA files.</li>
<li>Alternatively, generating an artificial reference genome based on input parameters (length, repeat count, repeat length, repeat variability rate).</li>
<li>Simulating reads from random locations in the genome based on input parameters of read length, coverage, sequencing error rate, and indel rate.</li>
<li>Applying alignment tools to the genome and the reads through a standardized interface.</li>
<li>Parsing the output of the alignment tool and calculating the number of reads that were correctly or incorrectly mapped.</li>
<li>Computing run times and measures of accuracy.</li>
</ul>
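<p>The workflow listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not Seal's code (Seal is implemented in Java). All function names are hypothetical, errors are substitutions only (no indels), and a naive exact-match search stands in for a real alignment tool.</p>

```python
import random


def random_genome(length, rng):
    """Generate an artificial reference of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulate_reads(genome, read_length, n_reads, error_rate, rng):
    """Sample reads from random positions and add substitution errors."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        bases = list(genome[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


def toy_aligner(genome, read):
    """Stand-in for a real aligner: first exact match, or None."""
    pos = genome.find(read)
    return pos if pos != -1 else None


def evaluate(genome, reads, aligner):
    """Count reads mapped to their true origin, elsewhere, or not at all."""
    counts = {"correct": 0, "incorrect": 0, "unmapped": 0}
    for true_start, read in reads:
        pos = aligner(genome, read)
        if pos is None:
            counts["unmapped"] += 1
        elif pos == true_start:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1
    return counts


rng = random.Random(0)  # fixed seed so the run is repeatable
genome = random_genome(2000, rng)
reads = simulate_reads(genome, 30, 100, 0.01, rng)
print(evaluate(genome, reads, toy_aligner))
```

<p>Because the true origin of every simulated read is known, any aligner with the same call signature can be swapped in and scored the same way.</p>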
<p><span>Seal</span>&nbsp;has interfaces to evaluate the following software packages:</p>
<ul>
<li>Bowtie</li>
<li>BWA</li>
<li>MAQ</li>
<li>mrFAST</li>
<li>mrsFAST</li>
<li>Novoalign</li>
<li>SHRiMP</li>
<li>SOAPv2</li>
</ul><p>Address of the bookmark: <a href="http://compbio.case.edu/seal/" rel="nofollow">http://compbio.case.edu/seal/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/36603/learning-python-programming-a-bioinformatician-perspective</guid>
	<pubDate>Mon, 14 May 2018 16:33:03 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/36603/learning-python-programming-a-bioinformatician-perspective</link>
	<title><![CDATA[Learning Python Programming - a bioinformatician perspective !]]></title>
	<description><![CDATA[<p>Python&nbsp;is a general-purpose programming language that is open source, flexible, powerful and easy to use. One of its most important features is its rich set of utilities and libraries for data processing and analytics tasks. In the current era of big biological data, Python and Biopython are gaining popularity because their easy-to-use features support big data processing.</p><p>In this tutorial series, I will explore features and packages of Python that are widely used in big data, NGS, and bioinformatics. I will also walk through a real biological example that shows NGS data processing with the help of Python packages.</p><p>Python has several points to recommend it to biologists and scientists specifically:</p><ul>
<li>It's widely used in the scientific community</li>
<li>It has a couple of very well designed libraries for doing complex scientific computing (although we won't encounter them in this book)</li>
<li>It lends itself well to being integrated with other, existing tools</li>
<li>It has features which make it easy to manipulate strings of characters (for example, strings of DNA bases and protein amino acid residues, which we as biologists are particularly fond of)</li>
</ul><p>In general, the following are some of the important features of Python that make it a good fit for rapid application development.</p><ul>
<li>Python is an interpreted language, so programs do not need to be compiled separately before they run. The interpreter parses the program code and produces the output.</li>
<li>Python is dynamically typed, so variable types are determined automatically at runtime.</li>
<li>Python is strongly typed, so developers need to convert between incompatible types explicitly.</li>
<li>It accomplishes more with less code, which makes it widely accepted.</li>
<li>Python is portable, extendable and scalable.</li>
</ul><p>There are two major Python versions, Python 2 and Python 3, and they are quite different. This tutorial uses Python 3, because it is more semantically correct and supports newer features.</p><p>I will post tutorials on a daily basis on this page. Check the sub-pages on the right side.</p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36518/mix-combining-multiple-assemblies-from-ngs-data</guid>
	<pubDate>Tue, 08 May 2018 04:58:05 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36518/mix-combining-multiple-assemblies-from-ngs-data</link>
	<title><![CDATA[MIX: Combining multiple assemblies from NGS data]]></title>
	<description><![CDATA[<p>Mix is a tool that combines two or more draft assemblies without relying on a reference genome. It aims to reduce contig fragmentation and thus speed up genome finishing. The proposed algorithm builds an extension graph where vertices represent extremities of contigs and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a path in the extension graph that maximizes the cumulative contig length.</p>
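<p>As a rough illustration of the path-selection idea, the sketch below picks the path with the largest cumulative contig length in a tiny hand-made graph. It is a simplification and not the Mix implementation: each contig is a single node instead of two extremities, overlap lengths are ignored, the graph is assumed to be acyclic, and the contig names, lengths and alignments are all made up.</p>

```python
from functools import lru_cache

# Hypothetical contigs (name -> length in bp) from two draft assemblies.
lengths = {"a1": 500, "a2": 300, "b1": 650, "b2": 400}
# Hypothetical alignments: the end of the key contig overlaps the start
# of each listed contig. Assumed to be acyclic for this sketch.
extends = {"a1": ["b2", "a2"], "b1": ["a2"], "a2": [], "b2": []}


@lru_cache(maxsize=None)
def best_path_from(contig):
    """Return (total length, path) of the longest path starting at contig."""
    best = (lengths[contig], (contig,))
    for nxt in extends[contig]:
        sub_len, sub_path = best_path_from(nxt)
        candidate = (lengths[contig] + sub_len, (contig,) + sub_path)
        if candidate[0] > best[0]:
            best = candidate
    return best


total, path = max((best_path_from(c) for c in lengths), key=lambda r: r[0])
print(total, path)  # prints 950 ('b1', 'a2')
```

<p>Memoizing the result per contig keeps the search linear in the number of edges for an acyclic graph.</p>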
<p>The Mix algorithm, approach and results were published in BMC Bioinformatics:&nbsp;<a href="http://www.biomedcentral.com/1471-2105/14/S15/S16">http://www.biomedcentral.com/1471-2105/14/S15/S16</a>.</p><p>Address of the bookmark: <a href="https://github.com/cbib/MIX" rel="nofollow">https://github.com/cbib/MIX</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>

</channel>
</rss>