<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/32709?offset=380</link>
	<atom:link href="https://bioinformaticsonline.com/related/32709?offset=380" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/27799/bbmapbbtools-package-multipurpose-tool-designed-for-converting-reads-or-other-nucleotide-data-between-different-formats</guid>
	<pubDate>Mon, 13 Jun 2016 05:47:21 -0500</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/27799/bbmapbbtools-package-multipurpose-tool-designed-for-converting-reads-or-other-nucleotide-data-between-different-formats</link>
	<title><![CDATA[BBMap/BBTools package: Multipurpose tool designed for converting reads or other nucleotide data between different formats.]]></title>
	<description><![CDATA[<div id="post_message_148585"><a href="https://sourceforge.net/projects/bbmap/" target="_blank">Reformat</a> is a member of the <a href="https://sourceforge.net/projects/bbmap/" target="_blank">BBMap/BBTools package</a>. It is a multipurpose tool designed for converting reads or other nucleotide data between different formats. It supports, and can inter-convert:<br /> <br /> fastq<br /> fasta<br /> fasta+qual<br /> sam<br /> scarf (an old Illumina format)<br /> bam (if samtools is installed)<br /> gzip<br /> zip<br /> ascii-33 (sanger)<br /> ascii-64 (old Illumina)<br /> paired files<br /> interleaved files<br /> <br /> It is multithreaded and can process data at over 500 megabytes per second. It can accept streams from standard in and write to standard out, so it can easily be dropped into the middle of a pipeline for format conversion. Reformat autodetects formats based on file extensions and content, making it very easy to use. The autodetection can be overridden, which gives flexibility to people who don't follow naming conventions and makes it possible to handle out-of-spec fastq files with quality values like -17 or 120.<br /> <br /> The program has been gradually expanded, and can now perform various other functions. None of these will break pairing if the input is paired.<br /> <br /> Quality trimming (either or both ends)<br /> Quality filtering<br /> Fixed-length trimming<br /> Generation of histograms (base composition, quality, etc)<br /> Subsampling (to a fraction of input reads, or an exact number of reads or bases)<br /> Changing fasta line-wrapping length<br /> Reverse-complementing (all reads or only read 2)<br /> Adding /1 and /2 suffix to read names<br /> GC-content filtering<br /> Length-filtering<br /> Testing for corrupted interleaved files<br /> <br /> Reformat is compatible with any platform that supports Java 1.7 or higher. It also has a bash shell script for simpler invocation. 
Typical usage examples:<br /> <br /> Reformat fastq into fasta:<br /> <strong>reformat.sh in=x.fq out=y.fa</strong><br /> <br /> Interleave paired reads:<br /> <strong>reformat.sh in1=x1.fq in2=x2.fq out=y.fq</strong><br /> <br /> Note: you can use a shortcut if the paired read files have the same name with a 1 and a 2. This is equivalent to the above command:<br /> <strong>reformat.sh in=x#.fq out=y.fq</strong><br /> <br /> De-interleave reads:<br /> <strong>reformat.sh in=x.fq out1=y1.fq out2=y2.fq</strong><br /> <br /> Verify that interleaving appears correct, assuming Illumina naming conventions:<br /> <strong>reformat.sh in=x.fq vint</strong><br /> <br /> Convert ASCII-33 to ASCII-64:<br /> <strong>reformat.sh in=x.fq out=y.fq qin=33 qout=64</strong><br /> <br /> Quality-trim paired reads to Q10 on the left and right ends and discard reads shorter than 50bp after trimming:<br /> <strong>reformat.sh in1=x1.fq in2=x2.fq out1=y1.fq out2=y2.fq outsingle=singletons.fq qtrim=rl trimq=10 minlength=50</strong><br /> <br /> Subsample 10% of the first 20000 pairs in an interleaved file:<br /> <strong>reformat.sh in=x.fq out=y.fq reads=20000 samplerate=0.1 int=t</strong><br /> (in this case "int=t" overrides interleaving autodetection, to ensure reads are treated as pairs)<br /> <br /> Pipe in a gzipped sam file and pipe out fasta:<br /> <strong>reformat.sh in=stdin.sam.gz out=stdout.fa</strong><br /> <br /> Reverse-complement reads:<br /> <strong>reformat.sh in=x.fq out=y.fq rcomp</strong><br /> <br /> To reformat a file with very long sequences, Reformat will need more memory; just add the additional flag "-Xmx2g". For example, to change the line-wrapping length on the human genome (which has individual sequences over 200Mbp long) to 70 characters:<br /> <strong>reformat.sh -Xmx2g in=HG19.fa.gz out=HG19_wrapped.fa.gz fastawrap=70</strong><br /> <br /> For additional functions, please run the shell script with no arguments, or just read it with a text editor. 
If you have any questions, please post them in this thread.<br /> <br /> If you use a non-bash terminal, you may need to type "bash reformat.sh" instead of just "reformat.sh".<br /> For users of Windows or other platforms that do not support bash shell scripts, replace "reformat.sh" with "java -ea -Xmx200m -cp /path/to/bbmap/current/ jgi.ReformatReads"<br /> for example,<br /> <strong>java -ea -Xmx200m -cp C:\bbmap\current\ jgi.ReformatReads in=x.fq out=y.fa</strong><br /> <br /> Reformat can be downloaded with BBTools here:<br /> <a href="https://sourceforge.net/projects/bbmap/" target="_blank">https://sourceforge.net/projects/bbmap/</a></div>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/27839/lorma-a-tool-for-correcting-sequencing-errors-in-long-reads-such-those-produced-by-pacific-biosciences-sequencing-machines</guid>
	<pubDate>Wed, 15 Jun 2016 17:18:36 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/27839/lorma-a-tool-for-correcting-sequencing-errors-in-long-reads-such-those-produced-by-pacific-biosciences-sequencing-machines</link>
	<title><![CDATA[LoRMA: a tool for correcting sequencing errors in long reads such as those produced by Pacific Biosciences sequencing machines]]></title>
	<description><![CDATA[<p>LoRMA is a tool for correcting sequencing errors in long reads such as those produced by Pacific Biosciences sequencing machines.</p>
<p>Publication:</p>
<ul>
<li>L. Salmela, R. Walve, E. Rivals, and E. Ukkonen: Accurate selfcorrection of errors in long reads using de Bruijn graphs. Accepted to RECOMB-Seq 2016.</li>
</ul>
<p>Download:</p>
<ul>
<li><a href="https://www.cs.helsinki.fi/u/lmsalmel/LoRMA/LoRMA-0.3.tar.gz">LoRMA 0.3 source files</a></li>
<li><a href="https://www.cs.helsinki.fi/u/lmsalmel/LoRMA/README.txt">README</a></li>
</ul><p>Address of the bookmark: <a href="https://www.cs.helsinki.fi/u/lmsalmel/LoRMA/" rel="nofollow">https://www.cs.helsinki.fi/u/lmsalmel/LoRMA/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28168/sam-flags</guid>
	<pubDate>Wed, 29 Jun 2016 15:38:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28168/sam-flags</link>
	<title><![CDATA[SAM flags]]></title>
	<description><![CDATA[<p>Decoding SAM flags</p>
<p>This utility makes it easy to identify the properties of a read from its SAM flag value or, conversely, to find the SAM flag value for a given combination of properties.</p>
<p>To decode a given SAM flag value, enter the number in the field on the linked page. The encoded properties will be listed there under Summary, to the right.</p><p>Address of the bookmark: <a href="https://broadinstitute.github.io/picard/explain-flags.html" rel="nofollow">https://broadinstitute.github.io/picard/explain-flags.html</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28121/kaiju</guid>
	<pubDate>Mon, 27 Jun 2016 11:23:04 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28121/kaiju</link>
	<title><![CDATA[Kaiju]]></title>
	<description><![CDATA[<p>Kaiju is a program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences.</p>
<p>By default, Kaiju uses either the available complete genomes from NCBI RefSeq or the microbial subset of the non-redundant protein database <em>nr</em> used by NCBI BLAST, optionally also including fungi and microbial eukaryotes.</p>
<p>Kaiju translates reads into amino acid sequences, which are then searched in the database using a modified backward search on a memory-efficient implementation of the Burrows-Wheeler transform, which finds maximum exact matches (MEMs), optionally allowing mismatches in the protein alignment. The search can process up to millions of reads per minute using, for example, only 10 GB RAM with a protein database comprising 4821 microbial genomes. Kaiju can also be used for querying any other protein database without taxonomic classification, using either protein or nucleotide queries.</p>
<p>Kaiju is described in <a href="http://www.nature.com/ncomms/2016/160413/ncomms11257/full/ncomms11257.html">Menzel, P. et al. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. <em>Nat. Commun.</em> 7:11257</a> (open access).</p><p>Address of the bookmark: <a href="http://kaiju.binf.ku.dk/" rel="nofollow">http://kaiju.binf.ku.dk/</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/28415/scarpa</guid>
	<pubDate>Wed, 13 Jul 2016 07:59:25 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/28415/scarpa</link>
	<title><![CDATA[Scarpa]]></title>
	<description><![CDATA[<p><strong>Scarpa</strong>&nbsp;is a stand-alone scaffolding tool for NGS data. It can be used together with virtually any genome assembler and any NGS read mapper that supports SAM format. Other features include support for multiple libraries and an option to estimate insert size distributions from data. Scarpa is available free of charge for academic and commercial use under the GNU General Public License (GPL).</p>
<p>See the&nbsp;<a href="http://compbio.cs.toronto.edu/hapsembler/hapsembler-2.21_manual.pdf">user manual</a>&nbsp;or the&nbsp;<a href="http://compbio.cs.toronto.edu/hapsembler/scarpa_paper.pdf">paper</a>&nbsp;for more information about Scarpa. Click&nbsp;<a href="http://compbio.cs.toronto.edu/hapsembler/ScarpaSupplementary.pdf">here</a>&nbsp;for the supplementary material.</p><p>Address of the bookmark: <a href="http://compbio.cs.toronto.edu/hapsembler/scarpa.html" rel="nofollow">http://compbio.cs.toronto.edu/hapsembler/scarpa.html</a></p>]]></description>
	<dc:creator>Poonam Mahapatra</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36865/perga-a-paired-end-read-guided-de-novo-assembler-for-extending-contigs-using-svm-and-look-ahead-approach</guid>
	<pubDate>Tue, 05 Jun 2018 09:57:11 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36865/perga-a-paired-end-read-guided-de-novo-assembler-for-extending-contigs-using-svm-and-look-ahead-approach</link>
	<title><![CDATA[PERGA: A Paired-End Read Guided De Novo Assembler for Extending Contigs Using SVM and Look Ahead Approach]]></title>
	<description><![CDATA[PERGA - Paired End Reads Guided Assembler

PERGA is a novel sequence-reads-guided de novo assembly approach that adopts a greedy-like prediction strategy for assembling reads into contigs and scaffolds. Instead of using single-end reads to construct contigs, PERGA uses paired-end reads and different read overlap sizes from O ≥ Omax to Omin to resolve gaps and branches. Moreover, by constructing a decision model with a machine learning approach based on branch features, PERGA can determine the correct extension in 99.7% of cases. PERGA tries to extend the contigs by all feasible nucleotides and uses a look-ahead technique to determine whether these multiple extensions are due to sequencing errors or repeats. It also tries to separate the different repeats of nearby genomic regions to make the assembly longer and more accurate.

The simulated E. coli paired-end read data were generated using GemSim (KE McElroy, F Luciani, T Thomas. Gemsim: General, Error-Model Based Simulator of Next-Generation Sequencing Data. BMC Genomics 2012, 13:74) at 50x, 60x, and 100x coverage with a read length of 100 bp, and can be downloaded from https://github.com/zhuxiao/data_PERGA.<p>Address of the bookmark: <a href="https://github.com/hitbio/PERGA" rel="nofollow">https://github.com/hitbio/PERGA</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36890/price-paired-read-iterative-contig-extension-a-de-novo-genome-assembler-implemented-in-c</guid>
	<pubDate>Mon, 11 Jun 2018 03:08:26 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36890/price-paired-read-iterative-contig-extension-a-de-novo-genome-assembler-implemented-in-c</link>
	<title><![CDATA[PRICE (Paired-Read Iterative Contig Extension), a de novo genome assembler implemented in C++.]]></title>
	<description><![CDATA[We are pleased to release PRICE (Paired-Read Iterative Contig Extension), a de novo genome assembler implemented in C++. Its name describes the strategy that it implements for genome assembly: PRICE uses paired-read information to iteratively increase the size of existing contigs. Initially, those contigs can be individual reads from a subset of the paired-read dataset, non-paired reads from sequencing technologies that provide non-paired data, or contigs that were output from a prior run of PRICE or any other assembler.

http://derisilab.ucsf.edu/software/price/<p>Address of the bookmark: <a href="http://derisilab.ucsf.edu/software/price/" rel="nofollow">http://derisilab.ucsf.edu/software/price/</a></p>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/39671/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</guid>
	<pubDate>Sat, 06 Jul 2019 03:48:22 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/39671/flye-fast-and-accurate-de-novo-assembler-for-single-molecule-sequencing-reads</link>
	<title><![CDATA[Flye: Fast and accurate de novo assembler for single molecule sequencing reads]]></title>
	<description><![CDATA[<p><span>Flye is a de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. It is designed for a wide range of datasets, from small bacterial projects to large mammalian-scale assemblies. The package represents a complete pipeline: it takes raw PB / ONT reads as input and outputs polished contigs. Flye also includes a special mode for metagenome assembly.</span></p><p>Address of the bookmark: <a href="https://github.com/fenderglass/Flye" rel="nofollow">https://github.com/fenderglass/Flye</a></p>]]></description>
	<dc:creator>Rahul Nayak</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/42477/hifiasm-a-haplotype-resolved-assembler-for-accurate-hifi-reads</guid>
	<pubDate>Thu, 24 Dec 2020 10:03:36 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/42477/hifiasm-a-haplotype-resolved-assembler-for-accurate-hifi-reads</link>
	<title><![CDATA[Hifiasm: a haplotype-resolved assembler for accurate Hifi reads]]></title>
	<description><![CDATA[<p><span>Hifiasm is a fast haplotype-resolved de novo assembler for PacBio Hifi reads. It can assemble a human genome in several hours and works with the California redwood genome, one of the most complex genomes sequenced so far. Hifiasm can produce primary/alternate assemblies of quality competitive with the best assemblers. It also introduces a new graph binning algorithm and achieves the best haplotype-resolved assembly given trio data.</span></p><p>Address of the bookmark: <a href="https://github.com/chhylp123/hifiasm" rel="nofollow">https://github.com/chhylp123/hifiasm</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/36456/alpaca-a-hybrid-strategy-for-assembly-of-genomic-dna-shotgun-sequencing-reads</guid>
	<pubDate>Mon, 30 Apr 2018 04:38:40 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/36456/alpaca-a-hybrid-strategy-for-assembly-of-genomic-dna-shotgun-sequencing-reads</link>
	<title><![CDATA[ALPACA: A hybrid strategy for assembly of genomic DNA shotgun sequencing reads.]]></title>
	<description><![CDATA[<p><span>ALPACA requires Celera Assembler 8.3 or later. It is recommended to build Celera Assembler from source. (Why? The pre-built binaries CA_8.3rc1 and CA8.3rc2 will work for any large data set.&nbsp;</span></p>
<p><span>The paper is available at&nbsp;https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-017-3927-8</span></p><p>Address of the bookmark: <a href="https://github.com/VicugnaPacos/ALPACA" rel="nofollow">https://github.com/VicugnaPacos/ALPACA</a></p>]]></description>
	<dc:creator>Seema Singh</dc:creator>
</item>

</channel>
</rss>