<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[BOL: Related items]]></title>
	<link>https://bioinformaticsonline.com/related/32875?offset=120</link>
	<atom:link href="https://bioinformaticsonline.com/related/32875?offset=120" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30144/bima-v3-an-aligner-customized-for-mate-pair-library-sequencing</guid>
	<pubDate>Wed, 14 Dec 2016 15:20:00 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30144/bima-v3-an-aligner-customized-for-mate-pair-library-sequencing</link>
	<title><![CDATA[BIMA V3: an aligner customized for mate pair library sequencing]]></title>
	<description><![CDATA[<p>Summary: Mate pair library sequencing is an effective and economical method for detecting genomic structural variants and chromosomal abnormalities. Unfortunately, mapping and aligning mate pair reads to a reference genome is a challenging and time-consuming process for most NGS alignment programs. Large insert sizes, library preparation protocol artifacts (biotin junction reads, paired-end read contamination, chimeras, etc.), and structural variant breakpoints within reads all increase mapping and alignment complexity. We describe an algorithm that is up to 20 times faster and 25% more accurate than popular NGS alignment programs when processing mate pair sequencing data. <br>Availability: http://bioinformaticstools.mayo.edu/research/bima/ <br>Contact: vasmatzis.george@mayo.edu</p><p>Address of the bookmark: <a href="http://bioinformatics.oxfordjournals.org/content/early/2014/02/12/bioinformatics.btu078.full.pdf" rel="nofollow">http://bioinformatics.oxfordjournals.org/content/early/2014/02/12/bioinformatics.btu078.full.pdf</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30234/last</guid>
	<pubDate>Mon, 19 Dec 2016 14:07:53 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30234/last</link>
	<title><![CDATA[LAST]]></title>
	<description><![CDATA[<p>LAST can:</p>
<ul>
<li>Handle&nbsp;<strong>big</strong>&nbsp;sequence data, e.g.:
<ul>
<li>Compare two vertebrate genomes</li>
<li>Align billions of DNA reads to a genome</li>
</ul>
</li>
<li>Indicate the&nbsp;<a href="http://lastweb.cbrc.jp/about.html">reliability</a>&nbsp;of each aligned column.</li>
<li>Use sequence quality data&nbsp;<a href="http://nar.oxfordjournals.org/content/38/7/e100.abstract">properly</a>.</li>
<li>Compare DNA to proteins, with frameshifts.</li>
<li>Compare PSSMs to sequences.</li>
<li>Calculate the likelihood of chance similarities between random sequences.</li>
<li>Do split and spliced alignment.</li>
<li><a href="http://last.cbrc.jp/doc/last-train.html">Train</a>&nbsp;alignment parameters for unusual kinds of sequence (e.g. nanopore).</li>
</ul><p>Address of the bookmark: <a href="http://last.cbrc.jp/" rel="nofollow">http://last.cbrc.jp/</a></p>]]></description>
	<dc:creator>Bulbul</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/pages/view/30440/genome-assembly-tools-and-software-part2</guid>
	<pubDate>Tue, 27 Dec 2016 16:14:35 -0600</pubDate>
	<link>https://bioinformaticsonline.com/pages/view/30440/genome-assembly-tools-and-software-part2</link>
	<title><![CDATA[Genome Assembly Tools and Software - PART2 !!]]></title>
	<description><![CDATA[<p>Genome assemblers generally take a file of short sequence reads and a file of quality values as input. Because the quality-value file for high-throughput short reads is usually very memory-intensive, only a few assemblers make use of quality values, so choose the assembler best suited to your assembly. To save memory and make the data easier to query, high-throughput short-read data are first converted into a specific data structure. The data structures currently used for this purpose fall predominantly into two categories: string-based models and graph-based models.</p><p>We therefore list a range of genome assembly tools here. Most are intended for general genome assembly, while others are designed specifically to handle complex genomes.</p><ul>
<li><a href="http://smithlabresearch.org/software/rmap/" title="RMAP 2.1 &ndash; Short-read Mapping">RMAP 2.1 &ndash; Short-read Mapping<br /></a><a href="http://smithlabresearch.org/software/rmap/" target="_blank">RMAP</a>&nbsp;is designed to accurately map reads from next-generation sequencing technologies. RMAP can map reads with or without error probability information (quality scores) and supports mapping of paired-end reads and bisulfite-treated reads. There are no limitations on read width or number of mismatches. RMAP can now map more than 8 million reads in an hour at full sensitivity to 2 mismatches.<br /><br /></li>
<li><a href="https://sourceforge.net/p/mira-assembler/wiki/Home/" title="MIRA 4.0.2 &ndash; Whole Genome Shotgun and EST Sequence Assembler">MIRA 4.0.2 &ndash; Whole Genome Shotgun and EST Sequence Assembler<br /></a><a href="http://sourceforge.net/p/mira-assembler/wiki/Home/" target="_blank">MIRA</a>&nbsp;(Mimicking Intelligent Read Assembly) is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), Ion Torrent and PacBio data (the latter currently limited to CCS and error-corrected CLR reads). It can be seen as a Swiss army knife of sequence assembly, developed and used over the past 12 years to get assembly jobs done efficiently and, especially, accurately, without putting too much manual work into finishing the assembly.<br /><br /></li>
<li><a href="http://www.brown.edu/Research/Istrail_Lab/hapcompass.php" title="HapCompass 0.7.7 &ndash; A Cycle-Basis Algorithm for Accurate Haplotype Assembly">HapCompass 0.7.7 &ndash; A Cycle-Basis Algorithm for Accurate Haplotype Assembly<br /></a><a href="http://www.brown.edu/Research/Istrail_Lab/hapcompass.php" target="_blank">HapCompass</a>&nbsp;for polyploid genomes can currently be used to create accurate pairwise SNP phasings. Given a set of aligned sequence reads in a SAM file and a set of variant calls in VCF format, HapCompass assembles the reads into haplotypes.<br /><br /></li>
<li><a href="http://www.csc.kth.se/~vezzi/software/" title="GAM-NGS 1.1b &ndash; Genome Assemblies Merger for Next Generation Sequencing">GAM-NGS 1.1b &ndash; Genome Assemblies Merger for Next Generation Sequencing<br /></a><a href="http://www.csc.kth.se/~vezzi/software/" target="_blank">GAM-NGS</a>&nbsp;merges two or more assemblies and returns an improved assembly (more contiguous and more correct). GAM-NGS shows its full potential with multi-library Illumina-based projects.<br /><br /></li>
<li><a href="http://omics.informatics.indiana.edu/GeneStitch/" title="GeneStitch 1.2.1 &ndash; Network Matching Algorithm to Gene Assembly">GeneStitch 1.2.1 &ndash; Network Matching Algorithm to Gene Assembly<br /></a><a href="http://omics.informatics.indiana.edu/GeneStitch/" target="_blank">GeneStitch</a>&nbsp;is a tool that assembles genes using a network matching algorithm. Given an already-assembled dataset, it can join contigs into more complete genes with the help of a reference gene set. Currently, GeneStitch supports assemblies produced by SOAPdenovo.<br /><br /></li>
<li><a href="http://bioen-compbio.bioen.illinois.edu/RACA/" title="RACA 0.9.1.1 &ndash; Reference-Assisted Chromosome Assembly">RACA 0.9.1.1 &ndash; Reference-Assisted Chromosome Assembly<br /></a><a href="http://bioen-compbio.bioen.illinois.edu/RACA/" target="_blank">RACA</a>&nbsp;is an algorithm to reliably order and orient sequence scaffolds generated by NGS and assemblers into longer chromosomal fragments using comparative genome information and paired-end reads.<br /><br /></li>
<li><a href="https://software.broadinstitute.org/software/discovar/blog/" title="DISCOVAR 51750 &ndash; Genome Shotgun Assembler and Variant Caller">DISCOVAR 51750 &ndash; Genome Shotgun Assembler and Variant Caller<br /></a><a href="http://www.broadinstitute.org/software/discovar/blog/" target="_blank">DISCOVAR</a>&nbsp;is a whole genome shotgun assembler and variant caller that can generate high quality assemblies and variant calls from the latest 250 base Illumina PCR-free fragment reads.<br /><br /></li>
<li><a href="http://www.seqan.de/projects/seqcons/" title="SeqCons 1.0 &ndash; de novo and reference-guided Sequence Assembly">SeqCons 1.0 &ndash; de novo and reference-guided Sequence Assembly<br /></a><a href="http://www.seqan.de/projects/seqcons/" target="_blank">SeqCons</a>&nbsp;(Sequence consensus) is an open-source consensus computation program for Linux and Windows. The algorithm can be used for de novo and reference-guided sequence assembly.<br /><br /></li>
<li><a href="http://www.personal.psu.edu/jhm10/Vera/SoftwareC.html" title="SimAssemblyStage1/2 0.2 &ndash; Assembly Alignment of Contigs">SimAssemblyStage1/2 0.2 &ndash; Assembly Alignment of Contigs<br /></a><a href="http://www.personal.psu.edu/jhm10/Vera/SoftwareC.html" target="_blank">SimAssemblyStage1</a>: perfectly aligns TranscriptSimulator reads to their nucleotide templates using read title information, creating an ideal simulated assembly of super contigs.<br /><br /></li>
<li><a href="http://www.csc.kth.se/~vezzi/software/" title="GapFiller &ndash; Closing the Gap within Paired Reads">GapFiller &ndash; Closing the Gap within Paired Reads<br /></a><a href="http://www.csc.kth.se/~vezzi/software/" target="_blank">GapFiller</a>&nbsp;is not a standard de novo assembler. It aims &ldquo;only&rdquo; at closing the gap between pairs of reads as a first step for a large number of downstream analyses.<br /><br /></li>
<li><a href="http://www.sanger.ac.uk/science/tools/pagit" title="PAGIT 1.01 &ndash; Post Assembly Genome Improvement Toolkit">PAGIT 1.01 &ndash; Post Assembly Genome Improvement Toolkit<br /></a><a href="http://www.sanger.ac.uk/resources/software/pagit/" target="_blank">PAGIT</a>&nbsp;(Post Assembly Genome Improvement Toolkit) is a toolkit that automatically generates high-quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.<br /><br /></li>
<li><a href="https://www.bsse.ethz.ch/cbg/software.html" title="ShoRAH 0.8.2 &ndash; Short Reads Assembly into Haplotypes">ShoRAH 0.8.2 &ndash; Short Reads Assembly into Haplotypes<br /></a><a href="http://www.bsse.ethz.ch/cbg/software/shorah" target="_blank">ShoRAH</a>&nbsp;is a software package that allows for inference about the structure of a population from a set of short sequence reads as obtained from ultra-deep sequencing of a mixed sample. The package contains programs that support mapping of reads to a reference genome, correcting sequencing errors by locally clustering reads in small windows of the alignment, reconstructing a minimal set of global haplotypes that explain the reads, and estimating the frequencies of the inferred haplotypes.<br /><br /></li>
<li><a href="http://www.genomics.cn/en/navigation/show_navigation?nid=2732" title="RePS 2.0 &ndash; WGS Sequence Assembler">RePS 2.0 &ndash; WGS Sequence Assembler<br /></a><a href="http://www.genomics.cn/en/navigation/show_navigation?nid=2732" target="_blank">RePS</a>&nbsp;(Repeat-masked Phrap with scaffolding) is a WGS sequence assembler that explicitly identifies exact k-mer repeats in the shotgun data and removes them prior to assembly. The established software Phrap is used to compute meaningful error probabilities for each base. Clone-end-pairing information is used to construct scaffolds that order and orient the contigs. The updated version of RePS incorporates some of the ideas on clustering introduced by Phusion.<br /><br /></li>
<li><a href="http://bibiserv2.cebitec.uni-bielefeld.de/sessionTimeout.jsf" title="treecat &ndash; Phylogenetic Comparative Assembly">treecat &ndash; Phylogenetic Comparative Assembly<br /></a><a href="http://bibiserv2.cebitec.uni-bielefeld.de/cgcat?id=cgcat_treecat" target="_blank">treecat</a>&nbsp;(phylogenetic tree based contig arrangement tool) takes several genomes and their relationships in a phylogenetic tree into account to estimate a possible ordering of the contigs.<br /><br /></li>
<li><a href="http://alumni.cs.ucr.edu/~liw/isolasso.html" title="IsoLasso 2.6.1 &ndash; A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly">IsoLasso 2.6.1 &ndash; A LASSO Regression Approach to RNA-Seq Based Transcriptome Assembly<br /></a><a href="http://alumni.cs.ucr.edu/~liw/isolasso.html" target="_blank">IsoLasso</a>&nbsp;is an algorithm to assemble transcripts and estimate their expression levels from RNA-Seq reads.<br /><br /></li>
<li><a href="http://alumni.cs.ucr.edu/~liw/cem.html" title="CEM 0.9.1 &ndash; Transcriptome Assembly and Isoform Expression Level Estimation from Biased RNA-Seq Reads">CEM 0.9.1 &ndash; Transcriptome Assembly and Isoform Expression Level Estimation from Biased RNA-Seq Reads<br /></a><a href="http://alumni.cs.ucr.edu/~liw/cem.html" target="_blank">CEM</a>&nbsp;is an algorithm to assemble transcripts and estimate their expression levels from RNA-Seq reads.<br /><br /></li>
<li><a href="http://alan.cs.gsu.edu/NGS/?q=malta" title="MaLTA &ndash; Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq data">MaLTA &ndash; Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq data<br /></a><a href="http://alan.cs.gsu.edu/NGS/?q=malta" target="_blank">MaLTA</a>&nbsp;is a method for simultaneous transcriptome assembly and quantification from Ion Torrent RNA-Seq data.<br /><br /></li>
<li><a href="http://amos.sourceforge.net/wiki/index.php/AMOS" title="AMOS 3.1.0 &ndash; Whole Genome Shotgun Assembler">AMOS 3.1.0 &ndash; Whole Genome Shotgun Assembler<br /></a><a href="http://amos.sourceforge.net/wiki/index.php/AMOS" target="_blank">AMOS</a>&nbsp;(<strong>A</strong>&nbsp;<strong>M</strong>odular,&nbsp;<strong>O</strong>pen-<strong>S</strong>ource)&nbsp;consortium is committed to the development of open-source whole genome assembly software. The project acronym (AMOS) represents its primary goal: to produce A Modular, Open-Source whole genome assembler. It is open-source so that everyone is welcome to contribute and help build outstanding assembly tools, and modular so that new contributions can be easily inserted into an existing assembly pipeline. This modular design is meant to foster the development of new assembly algorithms and allow the AMOS project to grow and improve continually, in the hope of eventually becoming a widely accepted and deployed assembly infrastructure. In this sense, AMOS is both a design philosophy and a software system.<br /><br /></li>
<li><a href="http://amos.sourceforge.net/wiki/index.php/AutoEditor" title="AutoEditor 1.20 &ndash; Automated Correction of Genome Sequence Errors">AutoEditor 1.20 &ndash; Automated Correction of Genome Sequence Errors<br /></a><a href="http://amos.sourceforge.net/wiki/index.php/AutoEditor" target="_blank">AutoEditor</a>&nbsp;is a tool for correcting sequencing and basecaller errors using sequence assembly and chromatogram data. On average, AutoEditor corrects 80% of erroneous base calls, with an accuracy of 99.99%. This in turn improves the overall accuracy of genome sequences and facilitates the use of these sequences for polymorphism discovery.<br /><br /></li>
<li><a href="http://www.csd.uwo.ca/~ilie/SAGE/" title="SAGE &ndash; String Graph Assembly of GEnomes">SAGE &ndash; String Graph Assembly of GEnomes<br /></a><a href="http://www.csd.uwo.ca/~ilie/SAGE/" target="_blank">SAGE</a>&nbsp;is a new string-overlap graph-based de novo genome assembler.<br /><br /></li>
<li><a href="http://omega.omicsbio.org/" title="Omega 1.0.2 &ndash; Overlap-graph de novo Assembler for Metagenomics">Omega 1.0.2 &ndash; Overlap-graph de novo Assembler for Metagenomics<br /></a><a href="http://omega.omicsbio.org/" target="_blank">Omega</a>&nbsp;is software for assembling and scaffolding Illumina sequencing data from microbial communities.<br /><br /></li>
<li><a href="http://www.compgenome.org/TCGA-Assembler/" title="TCGA-Assembler 1.0.3 &ndash; Open-Source Software for Retrieving and Processing TCGA Data">TCGA-Assembler 1.0.3 &ndash; Open-Source Software for Retrieving and Processing TCGA Data<br /></a><a href="http://www.compgenome.org/TCGA-Assembler/" target="_blank">TCGA-Assembler</a>&nbsp;is an open-source, freely available tool that automatically downloads, assembles, and processes public data from The Cancer Genome Atlas (TCGA), facilitating downstream data analysis by relieving investigators of the burden of data preparation.<br /><br /></li>
<li><a href="http://sammate.sourceforge.net/" title="SAMMate 2.7.4 / assemblySAM 1.1 &ndash;  Processing Short Read Alignments in SAM/BAM format / RNA-Seq Assembly and Analysis">SAMMate 2.7.4 / assemblySAM 1.1 &ndash; Processing Short Read Alignments in SAM/BAM format / RNA-Seq Assembly and Analysis<br /></a>
<p><a href="http://sammate.sourceforge.net/" target="_blank">SAMMate</a>&nbsp;is an open source GUI software suite to process RNA-Seq data. It is composed of two modules: assemblySAM and SAMMate.</p>
<p>assemblySAM employs a novel method to localize and assemble RNA-seq reads into RNA transcript sequences.<br /><br /></p>
</li>
<li><a href="http://www.cs.tau.ac.il/~bchor/StringGraph/" title="StringGraph beta &ndash; String Graph Construction Using Incremental Hashing">StringGraph beta &ndash; String Graph Construction Using Incremental Hashing<br /></a><a href="http://www.cs.tau.ac.il/~bchor/StringGraph/" target="_blank">StringGraph</a>&nbsp;is a novel, hash based method for constructing the string graph.<br /><br /></li>
<li><a href="http://mindthegap.genouest.org/" title="MindTheGap 1.0.0 &ndash; Detection and Assembly of Insertion Variants">MindTheGap 1.0.0 &ndash; Detection and Assembly of Insertion Variants<br /></a><a href="http://mindthegap.genouest.org/" target="_blank">MindTheGap</a>&nbsp;is software that detects and assembles DNA insertion variants in NGS read datasets with respect to a reference genome.<br /><br /></li>
<li><a href="http://cbcb.umd.edu/software/metAMOS" title="MetAMOS 1.5rc3 &ndash; Metagenomic Assembly pipeline for AMOS">MetAMOS 1.5rc3 &ndash; Metagenomic Assembly pipeline for AMOS<br /></a><a href="http://cbcb.umd.edu/software/metAMOS" target="_blank">MetAMOS</a>&nbsp;is an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations.<br /><br /></li>
<li><a href="http://impact.crhc.illinois.edu/projects.aspx#tiger" title="TIGER &ndash; DNA Sequence Assembly">TIGER &ndash; DNA Sequence Assembly<br /></a><a href="http://impact.crhc.illinois.edu/projects.aspx#tiger" target="_blank">Tiger</a>&nbsp;is a novel de novo assembly framework that adapts to available computing resources by iteratively decomposing the assembly problem into sub-problems.<br /><br /></li>
<li><a href="https://github.com/baoe/AlignGraph" title="AlignGraph &ndash; Secondary de novo Genome Assembly guided by closely related References">AlignGraph &ndash; Secondary de novo Genome Assembly guided by closely related References<br /></a><a href="https://github.com/baoe/AlignGraph" target="_blank">AlignGraph</a>&nbsp;is software that extends and joins contigs or scaffolds by reassembling them with the help of a reference genome from a closely related organism.<br /><br /></li>
<li><a href="http://compbio.cs.toronto.edu/hapsembler/scarpa.html" title="scarpa 0.241 &ndash; Scaffolding Reads with Practical Algorithms">scarpa 0.241 &ndash; Scaffolding Reads with Practical Algorithms<br /></a><a href="http://compbio.cs.toronto.edu/hapsembler/scarpa.html" target="_blank">Scarpa</a>&nbsp;is a stand-alone scaffolding tool for NGS data. It can be used together with virtually any genome assembler and any NGS read mapper that supports SAM format. Other features include support for multiple libraries and an option to estimate insert size distributions from data.<br /><br /></li>
<li><a href="http://genetics.cs.ucla.edu/vga/" title="VGA v1 &ndash; Viral Genome Assembler">VGA v1 &ndash; Viral Genome Assembler<br /></a><a href="http://genetics.cs.ucla.edu/vga/" target="_blank">VGA</a>&nbsp;is a method for accurate assembly of a heterogeneous viral population consisting of individual viral genomes (also known as quasi-species).<br /><br /></li>
<li><a href="https://cbcl.ics.uci.edu//doku.php/software#genomix" title="Genomix 0.2.11 &ndash; Parallel Genome Assembly using Hyracks">Genomix 0.2.11 &ndash; Parallel Genome Assembly using Hyracks<br /></a><a href="https://cbcl.ics.uci.edu//doku.php/software#genomix" target="_blank">Genomix</a>&nbsp;is a parallel genome assembly system built from the ground up with scalability in mind. It can assemble large and high-coverage genomes from fastq files in a short time and produces assemblies similar to Velvet or Ray in quality.<br /><br /></li>
<li><a href="http://shendurelab.github.io/LACHESIS/" title="LACHESIS &ndash; Genome Assembly with Contact Probability Maps">LACHESIS &ndash; Genome Assembly with Contact Probability Maps<br /></a><a href="http://shendurelab.github.io/LACHESIS/" target="_blank">LACHESIS</a>&nbsp;is a method that exploits contact probability map data (e.g. from Hi-C) for chromosome-scale de novo genome assembly.<br /><br /></li>
<li><a href="http://www.cmbb.arizona.edu/?page_id=312" title="KGBassembler 1.2 &ndash; Karyotype-based Genome Assembler for Brassicaceae Species">KGBassembler 1.2 &ndash; Karyotype-based Genome Assembler for Brassicaceae Species<br /></a><a href="http://www.cmbb.arizona.edu/?page_id=312" target="_blank">KGBassembler</a>&nbsp;(Brassicaceae genome assembler) is a C++-based tool for assembling contigs and/or scaffolds into full chromosomes based on the karyotype maps of Brassicaceae species, without the need for genetic and physical maps.<br /><br /></li>
<li><a href="https://sourceforge.net/projects/autoassemblyd/" title="AutoAssemblyD 0.1 &ndash; Graphical User Interface system for several Genome Assembler">AutoAssemblyD 0.1 &ndash; Graphical User Interface system for several Genome Assembler<br /></a><a href="http://sourceforge.net/projects/autoassemblyd/" target="_blank">AutoAssemblyD</a>&nbsp;is software that performs local and remote genome assembly with several assemblers, driven by an XML template that can replace the long command lines required by most assemblers.<br /><br /></li>
<li><a href="http://bio.cs.put.poznan.pl/programs/519227629dfb89a7fa000001" title="SR-ASM &ndash; DNA Assembly of the Short Sequences coming from 454 sequencer">SR-ASM &ndash; DNA Assembly of the Short Sequences coming from 454 sequencer<br /></a><a href="http://bio.cs.put.poznan.pl/programs/519227629dfb89a7fa000001" target="_blank">SR-ASM</a>&nbsp;(Short Reads ASseMbly) is an algorithm designed for DNA assembly of short sequences from 454 sequencers.<br /><br /></li>
<li><a href="http://www.bx.psu.edu/miller_lab/" title="YASRA 2.33 &ndash; Yet Another Short Read Assembler">YASRA 2.33 &ndash; Yet Another Short Read Assembler<br /></a><a href="http://www.bx.psu.edu/miller_lab/" target="_blank">YASRA</a>&nbsp;performs comparative assembly of short reads using a reference genome, which can differ substantially from the genome being sequenced.<br /><br /></li>
<li><a href="http://derisilab.ucsf.edu/software/price/index.html" title="PRICE 1.2 &ndash; de novo Genome Assembler">PRICE 1.2 &ndash; de novo Genome Assembler<br /></a><a href="http://derisilab.ucsf.edu/software/price/index.html" target="_blank">PRICE</a>&nbsp;(Paired-Read Iterative Contig Extension) is a de novo genome assembler implemented in C++. Its name describes the strategy that it implements for genome assembly: PRICE uses paired-read information to iteratively increase the size of existing contigs. Initially, those contigs can be individual reads from a subset of the paired-read dataset, non-paired reads from sequencing technologies that provide non-paired data, or contigs that were output from a prior run of PRICE or any other assembler.<br /><br /></li>
<li><a href="https://sc932.github.com/ALE/" title="ALE 20130717 &ndash; Assembly Likelihood Estimator">ALE 20130717 &ndash; Assembly Likelihood Estimator<br /></a><a href="http://sc932.github.com/ALE/" target="_blank">ALE</a>&nbsp;is a probabilistic framework for determining the likelihood of an assembly given the data (raw reads) used to assemble it. It allows rapid discovery of errors and comparison between similar assemblies.<br /><br /></li>
<li><a href="https://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE" title="SSPACE 3.0 &ndash; Scaffolding pre-assembled Contigs using Paired-read data">SSPACE 3.0 &ndash; Scaffolding pre-assembled Contigs using Paired-read data<br /></a><a href="http://www.baseclear.com/lab-products/bioinformatics-tools/sspace-standard/" target="_blank">SSPACE</a>&nbsp;(SSAKE-based Scaffolding of Pre-Assembled Contigs after Extension) is a stand-alone program for scaffolding pre-assembled contigs using paired-read data. It is unique in offering the possibility of manually controlling the scaffolding process. Using the distance information from paired-end and/or mate-pair data, SSPACE assesses the order, distance and orientation of your contigs and combines them into scaffolds. It is currently offered as a command-line tool in Perl. The input consists of pre-assembled contig sequences (FASTA) and NGS paired-read data (FASTA or FASTQ). The final scaffolds are provided in FASTA format.<br /><br /></li>
<li><a href="http://www.sanger.ac.uk/science/tools/image" title="IMAGE 2.4.1 &ndash; Iterative Mapping and Assembly for Gap Elimination">IMAGE 2.4.1 &ndash; Iterative Mapping and Assembly for Gap Elimination<br /></a><a href="http://www.sanger.ac.uk/resources/software/pagit/#IMAGE" target="_blank">IMAGE</a>&nbsp;(Iterative Mapping and Assembly for Gap Elimination) is software designed to close gaps in any draft assembly using Illumina paired-end reads. IMAGE works in several stages: aligning Illumina reads at contig ends; locally assembling those reads into new contigs; extending or merging the reference contigs; and iterating the whole process to extend and merge more contigs.<br /><br /></li>
<li><a href="https://www.hgsc.bcm.edu/software/atlas-gapfill" title="ATLAS GapFill 2.2 &ndash; Deals with the Repetitive Gap Assembly problem">ATLAS GapFill 2.2 &ndash; Deals with the Repetitive Gap Assembly problem<br /></a><a href="https://www.hgsc.bcm.edu/software/atlas-gapfill" target="_blank">ATLAS GapFill</a>&nbsp;deals with the repetitive gap assembly problem by using the unique gap-flanking sequences to group reads and convert the problem into a local assembly task. Localizing the assembly reduces the number of repeats in the assembly, allows more data to be incorporated, and allows gaps to be filled.<br /><br /></li>
<li><a href="https://www.hgsc.bcm.edu/software/atlas-whole-genome-assembly-suite" title="Atlas 2005 &ndash; Whole Genome Assembly Suite">Atlas 2005 &ndash; Whole Genome Assembly Suite<br /></a><a href="https://www.hgsc.bcm.edu/software/atlas-whole-genome-assembly-suite" target="_blank">Atlas</a>&nbsp;is a collection of software tools to facilitate the assembly of large genomes from whole genome shotgun reads, or a combination of whole genome shotgun reads and BAC or other localized reads.<br /><br /></li>
<li><a href="http://bio.math.berkeley.edu/cgal/" title="CGAL 0.9.6b &ndash; Computing Genome Assembly Likelihoods">CGAL 0.9.6b &ndash; Computing Genome Assembly Likelihoods<br /></a><a href="http://bio.math.berkeley.edu/cgal/" target="_blank">CGAL</a>&nbsp;is a tool for computing genome assembly likelihoods. It computes the likelihood of the reads given the assembly and a statistical model; this likelihood can be used as a metric for evaluating assemblies.<br /><br /></li>
<li><a href="https://github.com/lh3/fermi" title="Fermi 1.1 &ndash; WGS de novo Assembler based on the FMD-index for large Genomes">Fermi 1.1 &ndash; WGS de novo Assembler based on the FMD-index for large Genomes<br /></a><a href="https://github.com/lh3/fermi" target="_blank">Fermi</a>&nbsp;is a de novo assembler for Illumina reads from whole-genome shotgun sequencing. It also provides tools for error correction, sequence-to-read alignment and comparison between read sets. It uses the FMD-index, a novel compressed data structure, as its key data structure.<br /><br /></li>
<li><a href="http://pasha.sourceforge.net/homepage.htm#latest" title="PASHA 1.0.10 &ndash; Parallelized Short Read Assembly">PASHA 1.0.10 &ndash; Parallelized Short Read Assembly<br /></a><a href="http://pasha.sourceforge.net/" target="_blank">PASHA</a>&nbsp;is a parallel short-read assembler for large genomes based on de Bruijn graphs. Taking advantage of both shared-memory multi-core CPUs and distributed-memory compute clusters, PASHA has demonstrated its potential to perform high-quality de novo assembly of large genomes in reasonable time with modest computing resources. Our evaluation using three small real paired-end datasets shows that PASHA is able to produce better assemblies with comparable genome coverage and mis-assembly rates compared to three leading assemblers: Velvet, ABySS and SOAPdenovo. Moreover, PASHA is the fastest of the four on all three datasets on a single CPU.<br /><br /></li>
<li><a href="http://xgenovo.dna.bio.keio.ac.jp/" title="XGenovo &ndash; Extended Genovo Metagenomic Assembler by Incorporating Paired-End Information">XGenovo &ndash; Extended Genovo Metagenomic Assembler by Incorporating Paired-End Information<br /></a><a href="http://xgenovo.dna.bio.keio.ac.jp/" target="_blank">XGenovo</a>&nbsp;(Extended Genovo) extends the Genovo metagenomic assembler by incorporating paired-end information.<br /><br /></li>
<li><a href="http://metavelvet.dna.bio.keio.ac.jp/" title="MetaVelvet 1.2.01 / MetaVelvet-SL &ndash; An Extension of Velvet Assembler to de novo Metagenomic Assembly / utilizing Supervised Learning">MetaVelvet 1.2.01 / MetaVelvet-SL &ndash; An Extension of Velvet Assembler to de novo Metagenomic Assembly / utilizing Supervised Learning<br /></a><a href="http://metavelvet.dna.bio.keio.ac.jp/" target="_blank">MetaVelvet</a>&nbsp;is an extension of Velvet assembler to de novo metagenome assembly from short sequence reads<a href="http://www.mybiosoftware.com/metavelvet-1-2-01-metavelvet-sl-an-extension-of-velvet-assembler-to-de-novo-metagenomic-assembly-utilizing-supervised-learning.html" title="MetaVelvet 1.2.01 / MetaVelvet-SL &ndash; An Extension of Velvet Assembler to de novo Metagenomic Assembly / utilizing Supervised Learning"><br /><br /></a></li>
<li><a href="http://www.genomic.ch/edena.php" title="Edena v3.131028 &ndash; De Novo Short Reads Assembler">Edena v3.131028 &ndash; De Novo Short Reads Assembler<br /></a><a href="http://www.genomic.ch/edena.php" target="_blank">Edena</a>&nbsp;is an assembler dedicated to process the millions of very short reads produced by the Illumina Genome Analyzer<a href="http://www.mybiosoftware.com/edena-v3-dev110920-de-novo-short-reads-assembler.html" title="Edena v3.131028 &ndash; De Novo Short Reads Assembler"><br /><br /></a></li>
<li><a href="https://github.com/gramarga/ConPADE" title="ConPADE 1.00 &ndash; Contig Ploidy and Allele Dosage Estimation">ConPADE 1.00 &ndash; Contig Ploidy and Allele Dosage Estimation<br /></a><a href="http://research.microsoft.com/en-us/downloads/62815951-4b89-47a5-9e3d-7054182dafbb/default.aspx" target="_blank">ConPADE</a>&nbsp;is a tool used to estimate contig ploidy and allele dosage in polyploid genome assemblies.<a href="http://www.mybiosoftware.com/conpade-1-00-contig-ploidy-and-allele-dosage-estimation.html" title="ConPADE 1.00 &ndash; Contig Ploidy and Allele Dosage Estimation"><br /><br /></a></li>
<li><a href="https://sourceforge.net/projects/eloper/" title="ELOPER 1.2 &ndash; Elongation of Paired-end Reads for de novo Assembly">ELOPER 1.2 &ndash; Elongation of Paired-end Reads for de novo Assembly<br /></a><a href="http://sourceforge.net/projects/eloper/" target="_blank">ELOPER</a>&nbsp;is a pre-processing tool for pair-end sequences that produces a better read library for assembly programs.<a href="http://www.mybiosoftware.com/eloper-1-2-elongation-of-paired-end-reads-for-de-novo-assembly.html" title="ELOPER 1.2 &ndash; Elongation of Paired-end Reads for de novo Assembly"><br /><br /></a></li>
<li><a href="http://www.ebi.ac.uk/~zerbino/oases/" title="Oases 0.2.08 &ndash; De novo Transcriptome Assembler for very short reads">Oases 0.2.08 &ndash; De novo Transcriptome Assembler for very short reads<br /></a><a href="http://www.ebi.ac.uk/~zerbino/oases/" target="_blank">Oases</a>&nbsp;is designed to heuristically assemble RNA-seq reads in the absence of a reference genome, across a broad spectrum of expression values and in the presence of alternative isoforms. It achieves this by using an array of hash lengths, dynamic filtering of noise, robust resolution of alternative splicing events, and efficient merging of multiple assemblies. It was tested on human and mouse RNA-seq data and was shown to improve significantly on the transABySS and Trinity de novo&nbsp;<a href="http://www.mybiosoftware.com/oases-0-2-06-de-novo-transcriptome-assembler-short-reads.html" title="Oases 0.2.08 &ndash; De novo Transcriptome Assembler for very short reads"><br /><br /></a></li>
<li><a href="http://www.physics.rutgers.edu/~anirvans/SOPRA/" title="SOPRA 1.4.6 &ndash; Statistical Optimization of Paired Read Assembly">SOPRA 1.4.6 &ndash; Statistical Optimization of Paired Read Assembly<br /></a><a href="http://www.physics.rutgers.edu/~anirvans/SOPRA/" target="_blank">SOPRA</a>&nbsp;is an assembler for mate pair/paired-end reads from high throughput sequencing platforms, e.g. Illumina and SOLiD.<a href="http://www.mybiosoftware.com/sopra-1-4-6-statistical-optimization-paired-read-assembly.html" title="SOPRA 1.4.6 &ndash; Statistical Optimization of Paired Read Assembly"><br /><br /></a></li>
<li><a href="http://rnc.r.dendai.ac.jp/hapAssembly.html" title="hapAssembly &ndash; Haplotype Assembly from Whole-Genome Sequence Data">hapAssembly &ndash; Haplotype Assembly from Whole-Genome Sequence Data<br /></a><a href="http://rnc.r.dendai.ac.jp/hapAssembly.html" target="_blank">hapAssembly</a>&nbsp;is an approach to finding optimal solutions for the haplotype assembly problem under the minimum-error-correction (MEC) model. Its authors report that it beats the previous best method for this problem.<a href="http://www.mybiosoftware.com/hapassembly-haplotype-assembly-whole-genome-sequence-data.html" title="hapAssembly &ndash; Haplotype Assembly from Whole-Genome Sequence Data"><br /><br /></a></li>
<li><a href="https://code.google.com/archive/p/pbsim/" title="PBSIM 1.0.3 &ndash; PacBio Reads Simulator">PBSIM 1.0.3 &ndash; PacBio Reads Simulator<br /></a>PacBio sequencers produced two types of characteristic reads: CCS (short and low error rate) and CLR (long and high error rate), both of which could be useful for de novo assembly of genomes.&nbsp;<a href="https://code.google.com/p/pbsim/" target="_blank">PBSIM</a>&nbsp;simulates those PacBio reads by using either a model-based or sampling-based simulation.<a href="http://www.mybiosoftware.com/pbsim-1-0-3-pacbio-reads-simulator.html" title="PBSIM 1.0.3 &ndash; PacBio Reads Simulator"><br /><br /></a></li>
<li><a href="http://marte.ic.unicamp.br:8747/" title="SIS &ndash; Generate Draft Genome Sequence Scaffolds for Prokaryotes">SIS &ndash; Generate Draft Genome Sequence Scaffolds for Prokaryotes<br /></a><a href="http://marte.ic.unicamp.br:8747/" target="_blank">SIS</a>&nbsp;(Scaffolds from Inversion Signatures) is an easy-to-use tool for generating contig scaffolds.<a href="http://www.mybiosoftware.com/sis-generate-draft-genome-sequence-scaffolds-prokaryotes.html" title="SIS &ndash; Generate Draft Genome Sequence Scaffolds for Prokaryotes"><br /><br /></a></li>
<li><a href="https://www.cs.helsinki.fi/group/scaffold/normalizedN50/" title="NN50-calculator 0.5 &ndash; Evaluate the Correctness of Genome Assemblies">NN50-calculator 0.5 &ndash; Evaluate the Correctness of Genome Assemblies<br /></a><a href="http://www.cs.helsinki.fi/group/scaffold/normalizedN50/" target="_blank">NN50-calculator</a>&nbsp;(Normalized N50 calculator) is a tool for evaluating the correctness of genome assemblies.<a href="http://www.mybiosoftware.com/nn50-calculator-0-5-evaluate-correctness-genome-assemblies.html" title="NN50-calculator 0.5 &ndash; Evaluate the Correctness of Genome Assemblies"><br /><br /></a></li>
<li><a href="http://josephryan.github.io/baa.pl/" title="Baa.pl 0.20 &ndash; use BLAT to ASSESS an ASSEMBLY">Baa.pl 0.20 &ndash; use BLAT to ASSESS an ASSEMBLY<br /></a><a href="http://josephryan.github.io/baa.pl/" target="_blank">Baa.pl</a>&nbsp;is a simple script that parses the output of a BLAT run of a transcriptome vs. a genome assembly.<a href="http://www.mybiosoftware.com/baa-pl-0-10-blat-assess-assembly.html" title="Baa.pl 0.20 &ndash; use BLAT to ASSESS an ASSEMBLY"><br /><br /></a></li>
<li><a href="http://compbio.cs.toronto.edu/hapsembler/index.html" title="hapsembler 2.21 &ndash; Haplotype-specific Genome Assembly Toolkit">hapsembler 2.21 &ndash; Haplotype-specific Genome Assembly Toolkit<br /></a><a href="http://compbio.cs.toronto.edu/hapsembler/index.html" target="_blank">Hapsembler</a>&nbsp;is a haplotype-specific genome assembly toolkit that is designed for genomes that are rich in SNPs and other types of polymorphism. Hapsembler can be used to assemble reads from a variety of platforms including Illumina and Roche/454.<a href="http://www.mybiosoftware.com/hapsembler-2-1-haplotype-specific-genome-assembly-toolkit.html" title="hapsembler 2.21 &ndash; Haplotype-specific Genome Assembly Toolkit"><br /><br /></a></li>
<li><a href="http://alan.cs.gsu.edu/NGS/?q=content/vispa" title="ViSpA 02 &ndash; Viral Spectrum Assembler">ViSpA 02 &ndash; Viral Spectrum Assembler<br /></a><a href="http://alan.cs.gsu.edu/NGS/?q=content/vispa" target="_blank">ViSpA</a>&nbsp;(Viral Spectrum Assembling) implements novel methods for viral assembly and frequency estimation. The software uses simple error correction, assembles viral variants based on maximum-bandwidth paths in weighted read graphs, and estimates their frequencies via Expectation Maximization on all reads.<a href="http://www.mybiosoftware.com/vispa-01-viral-spectrum-assembler.html" title="ViSpA 02 &ndash; Viral Spectrum Assembler"><br /><br /></a></li>
<li><a href="http://www.vicbioinformatics.com/software.velvetoptimiser.shtml" title="VelvetOptimiser 2.2.5 &ndash; Automatically Optimise Velvet Assembler Parameters">VelvetOptimiser 2.2.5 &ndash; Automatically Optimise Velvet Assembler Parameters<br /></a><a href="http://www.vicbioinformatics.com/software.velvetoptimiser.shtml" target="_blank">VelvetOptimiser</a>&nbsp;is a multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler.<a href="http://www.mybiosoftware.com/velvetoptimiser-2-2-5-automatically-optimise-velvet-assembler-parameters.html" title="VelvetOptimiser 2.2.5 &ndash; Automatically Optimise Velvet Assembler Parameters"><br /><br /></a></li>
<li><a href="http://www.vicbioinformatics.com/software.assemblet.shtml" title="Assemblet 0.1 &ndash; Antigenic Variation Assembler">Assemblet 0.1 &ndash; Antigenic Variation Assembler<br /></a><a href="http://www.vicbioinformatics.com/software.assemblet.shtml" target="_blank">Assemblet</a>&nbsp;is a short read assembler for assembling antigenic variant sequences in bacteria.<a href="http://www.mybiosoftware.com/assemblet-0-1-antigenic-variation-assembler.html" title="Assemblet 0.1 &ndash; Antigenic Variation Assembler"><br /><br /></a></li>
<li><a href="http://www.vicbioinformatics.com/software.velvetk.shtml" title="VelvetK 20120606 &ndash; Find a reasonable K-mer size to Assemble Genome Reads with Velvet">VelvetK 20120606 &ndash; Find a reasonable K-mer size to Assemble Genome Reads with Velvet<br /></a><a href="http://www.vicbioinformatics.com/software.velvetk.shtml" target="_blank">VelvetK</a>&nbsp;can estimate the best k-mer size to use for your Velvet de novo assembly. It needs two inputs: the estimated genome size, and all your sequence read files. The genome size can be supplied as a number (e.g. 3.5M) or as a FASTA file of a closely related genome.<a href="http://www.mybiosoftware.com/velvetk-20120606-find-reasonable-k-mer-size-assemble-genome-reads-velvet.html" title="VelvetK 20120606 &ndash; Find a reasonable K-mer size to Assemble Genome Reads with Velvet"><br /><br /></a></li>
<li><a href="http://www.vicbioinformatics.com/software.vague.shtml" title="VAGUE 1.0.5 &ndash; Velvet Assembler Graphical User Environment">VAGUE 1.0.5 &ndash; Velvet Assembler Graphical User Environment<br /></a><a href="http://www.vicbioinformatics.com/software.vague.shtml" target="_blank">VAGUE</a>&nbsp;(Velvet Assembler Graphical User Environment) is a GUI for the&nbsp;<a href="http://www.mybiosoftware.com/assembly-tools/3852">Velvet</a>&nbsp;de novo assembler.<a href="http://www.mybiosoftware.com/vague-1-0-5-velvet-assembler-graphical-user-environment.html" title="VAGUE 1.0.5 &ndash; Velvet Assembler Graphical User Environment"><br /><br /></a></li>
<li><a href="http://pritchardlab.stanford.edu/software.html" title="Transcriptome Assembler &ndash; Transcriptome Assembly used in RNA-seq of 16 Mammalian Species">Transcriptome Assembler &ndash; Transcriptome Assembly used in RNA-seq of 16 Mammalian Species<br /></a><a href="http://pritchardlab.stanford.edu/software.html" target="_blank">Transcriptome Assembler</a>&nbsp;is software for transcriptome assembly that was used in RNA-seq of 16 mammalian species.<a href="http://www.mybiosoftware.com/transcriptome-assembler-transcriptome-assembly-rna-seq-16-mammalian-species.html" title="Transcriptome Assembler &ndash; Transcriptome Assembly used in RNA-seq of 16 Mammalian Species"><br /><br /></a></li>
<li><a href="http://bio.codeplex.com/wikipage?title=sequenceassembler&amp;referringTitle=sampleapps&amp;ANCHOR#sampleapps" title="BioSequenceAssembler 2.0 &ndash; Microsoft Research Sequence Assembler">BioSequenceAssembler 2.0 &ndash; Microsoft Research Sequence Assembler<br /></a><a href="http://bio.codeplex.com/wikipage?title=sequenceassembler&amp;referringTitle=sampleapps&amp;ANCHOR#sampleapps" target="_blank">BioSequenceAssembler</a>&nbsp;is intended for use by biologists and laboratory technicians who are responsible for managing next-generation genomic sequencing data for alignment, assembly, and/or BLAST identification.<a href="http://www.mybiosoftware.com/biosequenceassembler-2-0-microsoft-research-sequence-assembler.html" title="BioSequenceAssembler 2.0 &ndash; Microsoft Research Sequence Assembler"><br /><br /></a></li>
<li><a href="http://www.imperial.ac.uk/bioinformatics-data-science-group" title="BugBuilder &ndash; Microbial Genome Assembly">BugBuilder &ndash; Microbial Genome Assembly<br /></a><a href="http://www3.imperial.ac.uk/bioinfsupport/resources/software/bugbuilder" target="_blank">BugBuilder</a>&nbsp;is a pipeline for the automated assembly and annotation of microbial genomes from high-throughput sequence data. It is configurable so as not to be tied to any assembler or scaffolder, and is designed to run in a cluster environment facilitating high-throughput processing of genomes.<a href="http://www.mybiosoftware.com/bugbuilder-microbial-genome-assembly.html" title="BugBuilder &ndash; Microbial Genome Assembly"><br /></a></li>
<li><a href="http://maximuspipeline.sourceforge.net/main/">MAXIMUS 0.2 &ndash; Hybrid Reference and de novo Assembly pipeline</a><br /><a href="http://maximuspipeline.sourceforge.net/main/" target="_blank">MAXIMUS</a>&nbsp;is a genome assembly pipeline that combines the best parts of multiple reference-based assemblies and a de novo assembly. The reported benefits of this approach include better-assembled repetitive regions, fewer gaps and higher accuracy in the resulting assembly.<a href="http://www.mybiosoftware.com/maximus-0-2-hybrid-reference-de-novo-assembly-pipeline.html" title="MAXIMUS 0.2 &ndash; Hybrid Reference and de novo Assembly pipeline"><br /><br /></a></li>
<li><a href="http://www.bcgsc.ca/about/pubann/the-issake-short-read-sequence-assembly-approach-for-profiling-t-cell-metagenomes" title="ISSAKE &ndash; Short Read Sequence Assembly">ISSAKE &ndash; Short Read Sequence Assembly<br /></a><a href="http://www.bcgsc.ca/about/pubann/the-issake-short-read-sequence-assembly-approach-for-profiling-t-cell-metagenomes" target="_blank">iSSAKE</a>&nbsp;(immuno-SSAKE) is a sequencing approach and assembly software for profiling T-cell metagenomes using short reads from the massively parallel sequencing platforms.<a href="http://www.mybiosoftware.com/issake-short-read-sequence-assembly.html" title="ISSAKE &ndash; Short Read Sequence Assembly"><br /><br /></a></li>
<li><a href="http://i.cs.hku.hk/~alse/hkubrg/projects/idba/index.html" title="IDBA / IDBA-UD 1.1.1 &ndash; De Bruijn Graph De Novo Assembler with Highly Uneven Sequencing Depth">IDBA / IDBA-UD 1.1.1 &ndash; De Bruijn Graph De Novo Assembler with Highly Uneven Sequencing Depth<br /></a><a href="http://i.cs.hku.hk/~alse/hkubrg/projects/idba/index.html" target="_blank">IDBA</a>&nbsp;is a practical iterative de Bruijn graph de novo assembler for sequence assembly in bioinformatics. Most de Bruijn graph assemblers build the graph with a single, fixed k, so choosing k is critical: if k is too large, the graph contains many gaps; if k is too small, it contains many branches. IDBA instead uses a range of k values to build an iterative de Bruijn graph that keeps the information from the graphs built at each k, which its authors report gives better assemblies than single-k assemblers.<a href="http://www.mybiosoftware.com/idba-ud-1-09-de-bruijn-graph-de-novo-assembler-highly-uneven-sequencing-depth.html" title="IDBA / IDBA-UD 1.1.1 &ndash; De Bruijn Graph De Novo Assembler with Highly Uneven Sequencing Depth"><br /><br /></a></li>
<li><a href="https://code.google.com/archive/p/est2assembly/" title="est2assembly 1.13 &ndash; Assembly and Annotation of Transcriptomes for any Species">est2assembly 1.13 &ndash; Assembly and Annotation of Transcriptomes for any Species<br /></a>The&nbsp;<a href="https://code.google.com/p/est2assembly/" target="_blank">est2assembly</a>&nbsp;platform standardises transcriptome projects, going from raw trace files to an annotated GBrowse interface driven by the Seqfeature database. It accepts both Sanger and 454 sequencing data for de novo assembly, annotation and data mining of EST data.<a href="http://www.mybiosoftware.com/est2assembly-1-13-assembly-annotation-transcriptomes-species.html" title="est2assembly 1.13 &ndash; Assembly and Annotation of Transcriptomes for any Species"><br /><br /></a></li>
<li><a href="https://code.google.com/archive/p/curtain/" title="Curtain 0.2.3 beta &ndash; Assembling large Genomes from Short Read Sequences">Curtain 0.2.3 beta &ndash; Assembling large Genomes from Short Read Sequences<br /></a><a href="https://code.google.com/p/curtain/" target="_blank">Curtain</a>&nbsp;is an assembler for next-generation sequence data. Curtain is a Java wrapper around next-generation assemblers such as Velvet; it allows read-pair information to be introduced incrementally into the assembly process.<a href="http://www.mybiosoftware.com/curtain-0-2-3-beta-assembling-large-genomes-short-read-sequences.html" title="Curtain 0.2.3 beta &ndash; Assembling large Genomes from Short Read Sequences"><br /><br /></a></li>
<li><a href="http://www.comp.nus.edu.sg/~bioinfo/peasm/PE_manual.htm" title="PEAssember 1.2 &ndash; A de novo Genome Assembler">PEAssember 1.2 &ndash; A de novo Genome Assembler<br /></a><a href="http://www.comp.nus.edu.sg/~bioinfo/peasm/PE_manual.htm" target="_blank">PEAssember</a>&nbsp;is a parallel de novo genome assembler for small to mid-sized genomes.<a href="http://www.mybiosoftware.com/peassember-1-2-de-novo-genome-assembler.html" title="PEAssember 1.2 &ndash; A de novo Genome Assembler"><br /><br /></a></li>
<li><a href="https://sourceforge.net/projects/contrail-bio/" title="Contrail 0.8.2 &ndash; Assembly of Large Genomes using Cloud Computing">Contrail 0.8.2 &ndash; Assembly of Large Genomes using Cloud Computing<br /></a><a href="http://contrail-bio.sourceforge.net/" target="_blank">Contrail</a>&nbsp;is a Hadoop-based genome assembler for assembling large genomes in the cloud.<a href="http://www.mybiosoftware.com/contrail-0-8-2-assembly-large-genomes-cloud-computing.html" title="Contrail 0.8.2 &ndash; Assembly of Large Genomes using Cloud Computing"><br /><br /></a></li>
<li><a href="http://www.mybiosoftware.com/beap-0-6-beta-blast-extension-assembly-program.html" title="BEAP 0.6 beta &ndash; Blast Extension and Assembly Program">BEAP 0.6 beta &ndash; Blast Extension and Assembly Program<br /></a><a href="http://www.animalgenome.org/tools/beap/" target="_blank">BEAP</a>&nbsp;is a computer program that uses a short starting DNA fragment, often an EST or partial gene segment, as a &ldquo;primer&rdquo; to recursively BLAST nucleotide databases in an attempt to obtain all sequences that overlap, directly or indirectly, with the &ldquo;primer&rdquo;, thereby helping to &ldquo;extend&rdquo; the original sequence into a &ldquo;full-length&rdquo; sequence for functional analysis, or at least to obtain neighboring regions of the segment for SNP discovery and linkage disequilibrium&nbsp;<a href="http://www.mybiosoftware.com/beap-0-6-beta-blast-extension-assembly-program.html" title="BEAP 0.6 beta &ndash; Blast Extension and Assembly Program"><br /><br /></a></li>
<li><a href="http://manuals.bioinformatics.ucr.edu/home/branch" title="BRANCH 1.8.1 &ndash; boosting RNA-Seq Assemblies with Partial or related Genomic Sequences">BRANCH 1.8.1 &ndash; boosting RNA-Seq Assemblies with Partial or related Genomic Sequences<br /></a><a href="http://manuals.bioinformatics.ucr.edu/home/branch" target="_blank">BRANCH</a>&nbsp;is software that extends de novo transfrags and identifies novel transfrags using DNA contigs or genes of closely related species. BRANCH first discovers novel exons and then extends or joins fragmented de novo transfrags, so that the resulting transfrags are more complete.<a href="http://www.mybiosoftware.com/branch-1-8-1-boosting-rna-seq-assemblies-partial-related-genomic-sequences.html" title="BRANCH 1.8.1 &ndash; boosting RNA-Seq Assemblies with Partial or related Genomic Sequences"><br /><br /></a></li>
<li><a href="http://www.cbcb.umd.edu/software/quake/">Quake 0.3.5 &ndash; Detect &amp; Correct Substitution Sequencing Errors in WGS Data Sets</a><br />
<p><a href="http://www.cbcb.umd.edu/software/quake/" target="_blank">Quake</a>&nbsp;is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. &gt;15X), specifically intended for Illumina sequencing reads. Quake adopts the k-mer error correction framework first introduced by the EULER genome assembly package. Unlike EULER and similar programs, Quake uses a robust mixture model of erroneous and genuine k-mer distributions to determine where errors are located. Quake then uses read quality values and learns the nucleotide-to-nucleotide error rates to determine which types of errors are most likely. This leads to more corrections and greater accuracy, especially in avoiding mis-corrections, which create false sequence dissimilar to anything in the original genome sequence from which the read was taken.</p>
</li>
<li><a href="http://www.ebi.ac.uk/~zerbino/velvet/" title="Velvet 1.2.10 &ndash; Sequence Assembler for Very Short Reads">Velvet 1.2.10 &ndash; Sequence Assembler for Very Short Reads<br /></a><a href="http://www.ebi.ac.uk/~zerbino/velvet/" target="_blank">Velvet</a>&nbsp;is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Velvet takes in short read sequences, removes errors and then produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs.<a href="http://www.mybiosoftware.com/velvet-1-1-07-sequence-assembler-short-reads.html" title="Velvet 1.2.10 &ndash; Sequence Assembler for Very Short Reads"><br /><br /></a></li>
<li><a href="http://www.complex.iastate.edu/download/Lucy2/index.html" title="Lucy 2.20 &ndash; DNA Sequence Quality &amp; Vector Trimming">Lucy 2.20 &ndash; DNA Sequence Quality &amp; Vector Trimming<br /></a><a href="http://www.complex.iastate.edu/download/Lucy2/index.html" target="_blank">Lucy</a>&nbsp;has been used for several years to clean sequence data from automated DNA sequencers prior to sequence assembly and other downstream uses. &nbsp;The quality trimming portion of lucy makes use of phred quality scores, such as those produced by many automated sequencers based on the Sanger sequencing method. &nbsp;As such, lucy&rsquo;s quality trimming may not be appropriate for sequence data produced by some of the new &ldquo;next-generation&rdquo; sequencers.<a href="http://www.mybiosoftware.com/lucy-2-19p-r8-dna-sequence-quality-vector-trimming.html" title="Lucy 2.20 &ndash; DNA Sequence Quality &amp; Vector Trimming"><br /><br /></a></li>
<li><a href="http://bioinfo.bti.cornell.edu/tool/iAssembler/">iAssembler 1.3.2 &ndash; de novo Assembly of Roche-454/Sanger Transcriptome Sequences</a><br /><a href="http://bioinfo.bti.cornell.edu/tool/iAssembler/" target="_blank">iAssembler</a>&nbsp;is a standalone package to assemble ESTs generated using Sanger and/or Roche-454 pyrosequencing technologies into contigs.<a href="http://www.mybiosoftware.com/iassembler-1-3-2-de-novo-assembly-roche-454sanger-transcriptome-sequences.html" title="iAssembler 1.3.2 &ndash; de novo Assembly of Roche-454/Sanger Transcriptome Sequences"><br /><br /></a></li>
<li><a href="http://www.broadinstitute.org/software/gaemr/" title="GAEMR 1.0.1 &ndash; Assembly Analysis Framework">GAEMR 1.0.1 &ndash; Assembly Analysis Framework<br /></a><a href="http://www.broadinstitute.org/software/gaemr/" target="_blank">GAEMR</a>&nbsp;(Genome Assembly Evaluation Metrics and Reporting) is a complete genome analysis package that helps you evaluate and report on a genome assembly&rsquo;s completeness, correctness, and contiguity.<a href="http://www.mybiosoftware.com/gaemr-1-0-1-assembly-analysis-framework.html" title="GAEMR 1.0.1 &ndash; Assembly Analysis Framework"><br /><br /></a></li>
<li><a href="https://mulcyber.toulouse.inra.fr/plugins/mediawiki/wiki/pyrocleaner/index.php/Main_Page" title="PyroCleaner 1.3 &ndash; Clean 454 Pyrosequencing Reads in order to ease the Assembly Process">PyroCleaner 1.3 &ndash; Clean 454 Pyrosequencing Reads in order to ease the Assembly Process<br /></a>The&nbsp;<a href="https://mulcyber.toulouse.inra.fr/plugins/mediawiki/wiki/pyrocleaner/index.php/Main_Page" target="_blank">pyrocleaner</a>&nbsp;is intended to clean the reads in an SFF file in order to ease the assembly process. It can filter sequences on several criteria, such as length, complexity, the number of undetermined bases (which has been shown to correlate with poor quality) and multiple-copy reads. It can also clean paired-end SFF files, producing one SFF file with the validated paired ends and another with the sequences that can be used as shotgun reads.<a href="http://www.mybiosoftware.com/pyrocleaner-1-3-clean-454-pyrosequencing-reads-order-ease-assembly-process.html" title="PyroCleaner 1.3 &ndash; Clean 454 Pyrosequencing Reads in order to ease the Assembly Process"><br /><br /></a></li>
<li><a href="http://bioinformatics.rutgers.edu/Software/SLiQ/" title="SLiQ &ndash; Simple linear Inequalities based Mate-Pair reads Filtering and Scaffolding">SLiQ &ndash; Simple linear Inequalities based Mate-Pair reads Filtering and Scaffolding<br /></a><a href="http://bioinformatics.rutgers.edu/Software/SLiQ/" target="_blank">SLiQ</a>&nbsp;uses a set of simple linear inequalities, derived from the geometry of contigs on the line, to predict the relative positions and orientations of contigs from individual mate-pair reads and thus produce a contig digraph.<a href="http://www.mybiosoftware.com/sliq-simple-linear-inequalities-based-mate-pair-reads-filtering-scaffolding.html" title="SLiQ &ndash; Simple linear Inequalities based Mate-Pair reads Filtering and Scaffolding"><br /><br /></a></li>
<li><a href="http://bioinf.spbau.ru/en/rectangles" title="rectangles 2.0 &ndash; Rectangle Graph for Repeat Resolution in Genome Assembly">rectangles 2.0 &ndash; Rectangle Graph for Repeat Resolution in Genome Assembly<br /></a><a href="http://bioinf.spbau.ru/en/rectangles" target="_blank">rectangles</a>&nbsp;is a tool for resolving repeats in genome assemblies using a rectangle graph.<a href="http://www.mybiosoftware.com/rectangles-2-0-rectangle-graph-repeat-resolution-genome-assembly.html" title="rectangles 2.0 &ndash; Rectangle Graph for Repeat Resolution in Genome Assembly"><br /><br /></a></li>
<li><a href="http://archive.broadinstitute.org/crd/wiki/index.php/Arachne_Main_Page" title="Arachne 4.6233 &ndash; Whole-genome Shotgun Assembler">Arachne 4.6233 &ndash; Whole-genome Shotgun Assembler<br /></a><a href="http://www.broadinstitute.org/crd/wiki/index.php/Arachne_Main_Page" target="_blank">ARACHNE</a>&nbsp;is a program for assembling data from whole genome shotgun sequencing experiments. It was designed for long reads from Sanger sequencing technology, and has been used extensively to assemble many genomes, including many that are large and highly repetitive.<a href="http://www.mybiosoftware.com/arachne-3-2-whole-genome-shotgun-assembler.html" title="Arachne 4.6233 &ndash; Whole-genome Shotgun Assembler"><br /><br /></a></li>
<li><a href="http://terpconnect.umd.edu/~ALEKSEYZ/PhrapUMDV2/" title="Reconciliator 2.0 &ndash; The tool for Merging Assemblies">Reconciliator 2.0 &ndash; The tool for Merging Assemblies<br /></a><a href="http://terpconnect.umd.edu/~ALEKSEYZ/PhrapUMDV2/" target="_blank">Reconciliator</a>&nbsp;is the tool for merging assemblies.<a href="http://www.mybiosoftware.com/reconciliator-2-0-tool-merging-assemblies.html" title="Reconciliator 2.0 &ndash; The tool for Merging Assemblies"><br /><br /></a></li>
<li><a href="http://terpconnect.umd.edu/~ALEKSEYZ/PhrapUMDV2/" title="PhrapUMD 2 &ndash; Modified version of Phrap">PhrapUMD 2 &ndash; Modified version of Phrap<br /></a><a href="http://www.glue.umd.edu/~ALEKSEYZ/PhrapUMDV2" target="_blank">Phrap UMD</a>&nbsp;consists of the UMD Trimmer, the UMD Overlapper and a modified version of Phrap. It is capable of assembling data downloaded directly from the NCBI Trace Archive. The pipeline runs in three stages. First, the vector ends of the reads are examined to find the vector, and the reads are trimmed for vector and quality. Second, the trimmed reads are fed into the 5-pass UMD Overlapper, which finds the overlaps, corrects base-caller errors and performs additional trimming if necessary. Third, the trimmed and error-corrected reads and overlaps are input into the modified version of Phrap, which only puts reads together if they overlap according to the list of overlaps produced by the UMD Overlapper.<a href="http://www.mybiosoftware.com/phrapumd-2-modified-version-phrap.html" title="PhrapUMD 2 &ndash; Modified version of Phrap"><br /><br /></a></li>
<li><a href="http://www.dna-dragon.com/" title="DNA Dragon 1.5.6 build1 &ndash; DNA Sequence Contig Assembler Software">DNA Dragon 1.5.6 build1 &ndash; DNA Sequence Contig Assembler Software<br /></a><a href="http://www.dna-dragon.com/" target="_blank">DNA Dragon</a>&nbsp;Contig Assembler assembles sequences, trace data (ABI, SCF, AB1), Illumina reads and Roche 454 flowgrams into contigs. Its developers describe it as fast and accurate DNA sequence assembly software. Trace data can be compared directly with the nucleotide data, and the software also supports proofreading and base editing.<a href="http://www.mybiosoftware.com/dna-dragon-1-2-7-dna-sequence-contig-assembler-software.html" title="DNA Dragon 1.5.6 build1 &ndash; DNA Sequence Contig Assembler Software"><br /></a></li>
</ul>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/30625/pandaseq</guid>
	<pubDate>Mon, 23 Jan 2017 04:54:32 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/30625/pandaseq</link>
	<title><![CDATA[PANDASEQ]]></title>
	<description><![CDATA[<p>PANDASEQ assembles paired-end Illumina reads into sequences, trying to correct for errors and uncalled bases. The assembler reads two files in FASTQ format with quality information. If amplification primers were used (e.g., to isolate a variable region of the 16S gene, or the constant regions around zinc finger binding residues), they can be removed from the sequence during assembly. In the final sequence, any uncalled bases in the overlapping region are corrected using the complementary strand. When mismatches occur in the overlapping region, the base with the better quality score is chosen.<br>The algorithm is as follows:<br><br>1. Find the positions where the forward and reverse primers match best above the threshold and discard the ends of the sequence, including the primer.<br>2. Pick an overlap that maximises the probability of the forward and reverse reads having come from a single piece of DNA.<br>3. Identify where CASAVA has masked the end of the read with the quality score B or #, and adjust the probabilities in this region.<br>4. Construct an assembled sequence between the primers and calculate its quality.<br>5. Check for various constraints, including quality, length, uncalled bases, and user-supplied modules.</p>
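<p>Steps 2 and 4 of the algorithm above can be illustrated with a minimal Python sketch. This is not PANDASEQ's implementation or probability model: it assumes the primers are already removed and skips steps 1, 3 and 5. Each candidate overlap is scored as a log-odds of "same molecule" versus "unrelated bases that match by chance with probability 1/4", using Phred qualities as error probabilities; mismatches keep the higher-quality base and uncalled bases (N) are filled from the other strand. The names <code>overlap_score</code> and <code>assemble_pair</code> are hypothetical.</p>

```python
import math

# Complement table; N (uncalled base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def phred_to_error(q: int) -> float:
    """Phred quality score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def overlap_score(fwd, fq, rev, rq, olen):
    """Log-odds that the last `olen` bases of `fwd` and the first `olen`
    bases of `rev` (already reverse-complemented) come from the same DNA,
    versus unrelated bases that match by chance with probability 1/4."""
    start = len(fwd) - olen
    score = 0.0
    for i in range(olen):
        a, b = fwd[start + i], rev[i]
        if a == "N" or b == "N":
            continue  # an uncalled base carries no evidence either way
        # Probability that both base calls are correct.
        p_ok = (1 - phred_to_error(fq[start + i])) * (1 - phred_to_error(rq[i]))
        if a == b:
            score += math.log(p_ok / 0.25)
        else:
            score += math.log((1 - p_ok) / 0.75)
    return score


def assemble_pair(fwd, fq, rev_raw, rq_raw, min_overlap=10):
    """Merge a read pair: pick the best-scoring overlap (step 2), then build
    the consensus (step 4). Assumes both reads are at least min_overlap long."""
    rev = revcomp(rev_raw)
    rq = rq_raw[::-1]  # qualities follow the reversed read
    max_olen = min(len(fwd), len(rev))
    best = max(range(min_overlap, max_olen + 1),
               key=lambda o: overlap_score(fwd, fq, rev, rq, o))
    start = len(fwd) - best
    merged = list(fwd[:start])
    for i in range(best):
        a, qa = fwd[start + i], fq[start + i]
        b, qb = rev[i], rq[i]
        if a == "N":
            merged.append(b)  # fill an uncalled base from the other strand
        elif b == "N" or qa >= qb:
            merged.append(a)
        else:
            merged.append(b)  # mismatch: keep the higher-quality base
    merged.extend(rev[best:])
    return "".join(merged)


fragment = "ATGCGTACCTTAGGCATCGATCCGTAAGCT"
fwd = fragment[:20]
rev_raw = revcomp(fragment[10:])  # the reverse read comes from the other strand
print(assemble_pair(fwd, [30] * 20, rev_raw, [30] * 20, min_overlap=5))
# prints ATGCGTACCTTAGGCATCGATCCGTAAGCT
```

<p>Scoring against a random-match background rather than with raw probabilities is a design choice made here so that longer, well-supported overlaps outscore short ones; PANDASEQ's own scoring may differ.</p>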
<p>Address of the bookmark: <a href="http://neufeldserver.uwaterloo.ca/~apmasell/pandaseq_man1.html" rel="nofollow">http://neufeldserver.uwaterloo.ca/~apmasell/pandaseq_man1.html</a></p>]]></description>
	<dc:creator>Shruti Paniwala</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31353/concoct-clustering-contigs-with-coverage-and-composition</guid>
	<pubDate>Mon, 06 Mar 2017 04:08:16 -0600</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31353/concoct-clustering-contigs-with-coverage-and-composition</link>
	<title><![CDATA[CONCOCT: Clustering cONtigs with COverage and ComposiTion]]></title>
	<description><![CDATA[<p>A program for unsupervised binning of metagenomic contigs using nucleotide composition, coverage data across multiple samples, and linkage data from paired-end reads.</p>
<p>Warning! This software should be considered under development. Functionality and the user interface may still change significantly from one version to another. If you want to use this software, please keep up to date with the list of known issues: <a href="https://github.com/BinPro/CONCOCT/issues">https://github.com/BinPro/CONCOCT/issues</a></p><p>Address of the bookmark: <a href="https://github.com/BinPro/CONCOCT" rel="nofollow">https://github.com/BinPro/CONCOCT</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/31714/krona</guid>
	<pubDate>Wed, 22 Mar 2017 04:47:35 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/31714/krona</link>
	<title><![CDATA[Krona]]></title>
	<description><![CDATA[<p>Krona allows hierarchical data to be explored with zooming, multi-layered pie charts. Krona charts can be created using an <a href="https://github.com/marbl/Krona/wiki/ExcelTemplate">Excel template</a> or <a href="https://github.com/marbl/Krona/wiki/KronaTools">KronaTools</a>, which includes support for several bioinformatics tools and raw data formats. The interactive charts are self-contained and can be viewed with any modern web browser (see <a href="https://github.com/marbl/Krona/wiki/Browser%20support">Browser support</a>).</p>
<p><a href="http://marbl.github.io/Krona/img/screen_mgrast.png"><img src="https://camo.githubusercontent.com/27b71b1f1832523723c3d14dec764e7ad098438c/687474703a2f2f6d6172626c2e6769746875622e696f2f4b726f6e612f696d672f7468756d625f6d67726173742e706e67" width="210" height="167" alt="image" style="border: 0px;"></a></p><p>Address of the bookmark: <a href="https://github.com/marbl/Krona/wiki" rel="nofollow">https://github.com/marbl/Krona/wiki</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32481/sspace</guid>
	<pubDate>Fri, 05 May 2017 05:42:15 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32481/sspace</link>
	<title><![CDATA[SSPACE]]></title>
	<description><![CDATA[<p>SSPACE Standard is a stand-alone program for scaffolding pre-assembled contigs using NGS paired-read data. It is unique in offering the possibility to manually control the scaffolding process. Using the distance information from paired-end and/or mate-pair data, SSPACE assesses the order, distance and orientation of your contigs and combines them into scaffolds. It is currently offered as a command-line tool in Perl. The input consists of pre-assembled contig sequences (FASTA) and NGS paired-read data (Illumina/454/SOLiD, in FASTA or FASTQ). The final scaffolds are provided in FASTA format.</p>
<p>Address of the bookmark: <a href="https://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE" rel="nofollow">https://www.baseclear.com/genomics/bioinformatics/basetools/SSPACE</a></p>]]></description>
	<dc:creator>Jit</dc:creator>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/bookmarks/view/32946/grass-a-generic-algorithm-for-scaffolding-next-generation-sequencing-assemblies</guid>
	<pubDate>Tue, 23 May 2017 05:20:32 -0500</pubDate>
	<link>https://bioinformaticsonline.com/bookmarks/view/32946/grass-a-generic-algorithm-for-scaffolding-next-generation-sequencing-assemblies</link>
	<title><![CDATA[GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies.]]></title>
	<description><![CDATA[<p><span>GRASS (GeneRic ASsembly Scaffolder) is a novel algorithm for scaffolding second-generation sequencing assemblies that can use diverse information sources. GRASS offers a mixed-integer programming formulation of the contig scaffolding problem, which combines contig order, distance and orientation in a single optimization objective. The resulting optimization problem is solved using an expectation-maximization procedure and an unconstrained binary quadratic programming approximation of the original problem. We compared GRASS with existing HTS scaffolders using Illumina paired reads of three bacterial genomes. Our algorithm constructs a comparable number of scaffolds, but makes fewer errors. This result is further improved when additional data, in the form of related genome sequences, are used.</span></p><p>Address of the bookmark: <a href="https://github.com/AlexeyG/GRASS" rel="nofollow">https://github.com/AlexeyG/GRASS</a></p>]]></description>
	<dc:creator>Abhimanyu Singh</dc:creator>
</item>

<item>
  <guid isPermaLink='true'>https://bioinformaticsonline.com/opportunity/view/34368/srbioinformatics-analyst-ngs-at-ocimum</guid>
  <pubDate>Fri, 17 Nov 2017 07:50:44 -0600</pubDate>
  <link></link>
  <title><![CDATA[Sr.Bioinformatics Analyst (NGS) at Ocimum]]></title>
  <description><![CDATA[
<p>JOB FUNCTION: Bio Tech/R&amp;D/Scientist<br />INDUSTRY: Biotechnology/Pharmaceutical/Medicine<br />SPECIALIZATION: Basic Research, Bio-Statistician, Clinical Research<br />QUALIFICATION:<br />Any Post Graduate<br />BA (Arts), B.Com. (Commerce), BE/ B.Tech (Engineering), B.Pharm. (Pharmacy), B.Sc. (Science), BL/LLB, BDS (Dental Surgery), B.Ed. (Education), BHM (Hotel Management), BBA/ BBM/ BBS, B.Arch. (Architecture), BCA (Computer Application), Diploma-Other Diploma, B.Plan. (Planning), BGL, B.V.Sc. (Veterinary Science), Other School/ Graduation, BHMS (Homeopathy), BAMS (Ayurveda)<br />Job Description</p>

<p>1. Must have a basic understanding of molecular biology and genomics.<br />2. Experience in application development, or expertise in programming in Perl or Python.<br />3. Experience in statistical programming using R/Bioconductor/MATLAB.<br />4. Strong grasp of statistical and mathematical modelling.<br />5. Experience in designing and developing bioinformatics pipelines.<br />6. Must have at least 2 years of hands-on experience in NGS data analysis (such as RNA-Seq, exome-seq, and ChIP-Seq) and downstream analysis.<br />7. Knowledge of WGS, WES, targeted re-sequencing, GWAS, and population genomics is preferred.<br />8. Must have experience working with open-source and commercial software/frameworks for NGS data analysis and reporting.<br />9. Should be comfortable handling big data and guiding team members on multiple projects simultaneously.<br />10. Should have experience coordinating with different groups of clinical research scientists on various project requirements.<br />11. Ability to work in a team as well as independently with minimal support.</p>

<p>More at http://www3.ocimumbio.com/</p>
]]></description>
</item>
<item>
	<guid isPermaLink="true">https://bioinformaticsonline.com/blog/view/35621/bbtools-for-bioinformatician</guid>
	<pubDate>Thu, 15 Feb 2018 16:45:52 -0600</pubDate>
	<link>https://bioinformaticsonline.com/blog/view/35621/bbtools-for-bioinformatician</link>
	<title><![CDATA[BBTools for bioinformatician !]]></title>
	<description><![CDATA[<p><span></span><br /><strong>BBMap.sh</strong><br /><br /></p><ul>
<li><strong>Mapping Nanopore reads</strong></li>
</ul><p><br /><span>BBMap.sh has a read-length cap of 6 kbp; longer reads are broken into 6 kbp pieces and mapped independently. For Nanopore reads, the long-read variant mapPacBio.sh can be used like this:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ mapPacBio.sh -Xmx20g k=7 in=reads.fastq ref=reference.fa maxlen=1000 minlen=200 idtag ow int=f qin=33 out=mapped1.sam minratio=0.15 ignorequality slow ordered maxindel1=40 maxindel2=400</pre></div><p><br /><span>The "maxlen" flag shreds the reads into pieces of at most 1000 bp; you can set it as high as 6000, but I found that 1000 gave a higher mapping rate.</span><br /><br /></p><ul>
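<li><p><span>As a rough illustration of what maxlen=1000 does, the hypothetical shell function below (shred_seq, not part of BBTools) uses the standard fold utility to cut one long sequence into consecutive, non-overlapping pieces of at most a given length. It is only a sketch. The actual splitting in mapPacBio.sh may differ, for example in whether pieces overlap or are evened out.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper (not part of BBTools): split a sequence into
# consecutive pieces of at most $2 bases, one piece per line.
shred_seq() {
  printf '%s\n' "$1" | fold -w "$2"
}

# Build a 2500 bp test read and print the length of each piece:
# 1000, 1000 and 500.
read_seq=$(printf 'ACGT%.0s' $(seq 625))
shred_seq "$read_seq" 1000 | awk '{ print length($0) }'</pre></div></li>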
<li><strong>Using Paired-end and single-end reads at the same time</strong></li>
</ul><p><br /><span>In a single run, BBMap itself can process either single-ended or paired reads, but not both. Its wrapper, bbwrap.sh, can handle both together, like this:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ bbwrap.sh in1=read1.fq,singletons.fq in2=read2.fq,null out=mapped.sam append</pre></div><p><span>This writes all the reads to the same output file but prints the headers only once. I have only tried this with SAM output, not BAM.</span><br /><br /><span>Note about alignment stats: for paired reads, you can find the total percent mapped by adding the read 1 percent (where it says "mapped: N%") and the read 2 percent, then dividing by 2. The different columns give the count/percent of each event. In terms of the CIGAR string, "Match Rate" is the number of symbols indicating a reference match (=), and "Error Rate" is the number indicating a substitution, insertion, or deletion (X, I, D).</span><br /><br /></p><ul>
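<li><p><span>The averaging rule for paired reads works because every pair contributes exactly one read 1 and one read 2, so both percentages share the same denominator. The small awk helper below makes that arithmetic explicit. The name pair_mapped_pct is hypothetical and it is not a BBTools script.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper: overall percent mapped for a paired run, given the
# "mapped: N%" values reported for read 1 ($1) and read 2 ($2).
pair_mapped_pct() {
  awk -v r1="$1" -v r2="$2" 'BEGIN { printf "%.2f\n", (r1 + r2) / 2 }'
}

pair_mapped_pct 92.5 88.1   # prints 90.30</pre></div></li>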
<li><strong>Exact matches when mapping small reads (e.g. miRNA)</strong></li>
</ul><p><br /><span>When mapping small RNAs with BBMap, use the following flags to report only perfect matches:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">ambig=all vslow perfectmode maxsites=1000</pre></div><p><span>It should be very fast in this mode (despite the vslow flag). vslow mainly removes the masking of low-complexity repetitive k-mers, which is not usually a problem but can be with extremely short sequences such as microRNAs.</span></p><ul>
<li><strong>Important note about BBMap alignments</strong></li>
</ul><p><br /><span>BBMap is always nondeterministic when run in paired-end mode with multiple threads. The average insert size, which affects mapping, is calculated per thread, and the assignment of reads to threads is itself nondeterministic. The only ways to avoid this are to restrict BBMap to a single thread (threads=1), or to map the reads as single-ended and fix the pairing afterward:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">bbmap.sh in=reads.fq outu=unmapped.fq int=f
repair.sh in=unmapped.fq out=paired.fq fint outs=singletons.fq</pre></div><p><span>In this case, you would keep only the paired output.</span><br /><br /><span>BBSplit is based on BBMap, so it is also nondeterministic in paired mode with multiple threads. BBDuk and Seal (which can be used similarly to BBSplit) are always deterministic.</span><br /><br /><span>--------------------------------------------------------</span><br /><br /><strong>Reformat.sh</strong></p><ul>
<li><strong>Count k-mers/find unknown primers</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=reads.fq out=trimmed.fq ftr=19</pre></div><p><span>This will trim all but the first 20 bases (all bases after position 19, zero-based).</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ kmercountexact.sh in=trimmed.fq out=counts.txt fastadump=f mincount=10 k=20 rcomp=f</pre></div><p><span>This will generate a file containing the counts of all 20-mers that occurred at least 10 times, in a 2-column format that is easy to sort in Excel.&nbsp;</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">ACCGTTACCGTTACCGTTAC	100
AAATTTTTTTCCCCCCCCCC	85</pre></div><p><span>...etc. If the primers are 20bp long, they should be pretty obvious.&nbsp;&nbsp;</span></p><ul>
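<li><p><span>If Excel is not at hand, standard sort can rank the two-column counts.txt output just as well. The sketch below assumes tab-separated k-mer/count columns as shown above, and top_kmers is a hypothetical helper name. The last line also shows that ftr=19 keeps the same bases as cut -c1-20. The difference is only in counting: ftr numbers positions from 0, while cut numbers them from 1.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper: print the $2 most frequent k-mers from a two-column
# (k-mer, count) file, most abundant first.
top_kmers() {
  sort -k2,2nr "$1" | head -n "$2"
}

# Usage: top_kmers counts.txt 10

# ftr=19 keeps zero-based positions 0-19, i.e. the first 20 bases,
# which is exactly what cut -c1-20 selects (prints ACGTACGTACGTACGTACGT).
printf 'ACGTACGTACGTACGTACGTTTTT\n' | cut -c1-20</pre></div></li>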
<li><strong>Convert SAM format from 1.4 to 1.3 (required for many programs)</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=reads.sam out=out.sam sam=1.3</pre></div><ul>
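<li><p><span>In BBMap's terms, sam=1.4 writes CIGAR strings that use = (match) and X (mismatch), while sam=1.3 reports both as M. The awk sketch below shows the kind of rewrite this implies: adjacent =, X and M runs are merged into a single M, and all other operations are kept unchanged. The helper to_m_cigar is hypothetical and not part of BBTools.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper (not part of BBTools): rewrite a CIGAR string that
# uses = and X so that matches and mismatches are both reported as M.
to_m_cigar() {
  printf '%s\n' "$1" | awk '{
    s = $0; out = ""; run = 0
    # Consume one length-plus-operator element at a time.
    while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(s, 1, RLENGTH - 1) + 0
      op = substr(s, RLENGTH, 1)
      s = substr(s, RLENGTH + 1)
      if (op == "=" || op == "X" || op == "M") { run += len; continue }
      if (run) out = out run "M"
      out = out len op
      run = 0
    }
    if (run) out = out run "M"
    print out
  }'
}

to_m_cigar 10=1X20=5I3=   # prints 31M5I3M</pre></div></li>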
<li><strong>Removing N basecalls</strong></li>
</ul><p><br /><span>You can use BBDuk or Reformat with "qtrim=rl trimq=1". That trims only leading and trailing bases with a Q-score below 1, i.e. Q0, which means N (in either FASTA or FASTQ format). The BBMap package automatically changes the Q-scores of Ns that are above 0 to 0, and the Q-scores of called bases below 2 to 2, because some Illumina software versions occasionally produce odd things like a handful of Q0 called bases or Ns with Q&gt;0, neither of which makes sense on the Phred scale.</span></p><ul>
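<li><p><span>In Phred+33 FASTQ quality strings, Q0 is the character ! (ASCII 33) and Q2 is #. The hypothetical helper below is a plain shell sketch that decodes a single quality character under that encoding (as with qin=33). It is handy for checking what a given symbol means.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper: decode one Phred+33 quality character to its Q-score.
phred33() {
  # printf with a leading quote yields the character code (POSIX printf).
  code=$(printf '%d' "'$1")
  echo $((code - 33))
}

phred33 '!'   # prints 0 (Q0, the value used for N)
phred33 '#'   # prints 2
phred33 'I'   # prints 40</pre></div></li>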
<li><strong>Sampling reads</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=reads.fq out=sampled.fq sample=3000</pre></div><div><div>Code:</div><pre dir="ltr">To sample 10% of the reads:
reformat.sh in1=reads1.fq in2=reads2.fq out1=sampled1.fq out2=sampled2.fq samplerate=0.1

or more concisely:
reformat.sh in=reads#.fq out=sampled#.fq samplerate=0.1

and for exact sampling:
reformat.sh in=reads#.fq out=sampled#.fq samplereadstarget=100k</pre></div><ul>
<li><strong>Changing fasta headers</strong></li>
</ul><p><br /><span>To remove everything after the first space in FASTA headers:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=sequences.fasta out=renamed.fasta trd</pre></div><p><span>"trd" stands for "trim read description"; it truncates each header at the first whitespace.</span></p><ul>
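<li><p><span>For comparison, the same header truncation can be sketched with sed. trim_desc is a hypothetical helper. It assumes that sequence lines contain no whitespace, so only header lines are changed.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical sed equivalent of the trd flag: drop everything from the
# first whitespace character onward on every line. Sequence lines are
# assumed to contain no whitespace, so in practice only headers change.
trim_desc() {
  sed 's/[[:space:]].*$//' "$@"
}

# Usage: trim_desc sequences.fasta</pre></div></li>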
<li><strong>Extract reads from a sam file</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=reads.sam out=reads.fastq</pre></div><ul>
<li><strong>Verify pairing and optionally de-interleave the reads</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=reads.fastq verifypairing</pre></div><ul>
<li><strong>Verify pairing if the reads are in separate files</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in1=r1.fq in2=r2.fq vpair</pre></div><p><span>If verification of an interleaved file (such as reads.fastq above) completes successfully and reports that the reads are correctly paired, you can de-interleave it into two files like this:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=reads.fastq out1=r1.fastq out2=r2.fastq</pre></div><ul>
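<li><p><span>De-interleaving itself is simple to picture. In an interleaved FASTQ file, each read 1 record is immediately followed by its read 2 record. The hypothetical helpers below sketch the split with standard paste and cut, assuming strict 4-line records with no tabs inside them. Unlike reformat.sh, they do not verify pairing.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical sketch of de-interleaving, assuming 4-line FASTQ records
# with read 1 and read 2 of each pair adjacent. paste joins each 8-line
# pair into one tab-separated row; cut keeps the first record (read 1)
# or the second record (read 2); tr restores one field per line.
deinterleave_r1() { paste - - - - - - - - | cut -f1-4 | tr '\t' '\n'; }
deinterleave_r2() { paste - - - - - - - - | cut -f5-8 | tr '\t' '\n'; }

# Usage: cat reads.fastq | deinterleave_r1
#        cat reads.fastq | deinterleave_r2</pre></div></li>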
<li><strong>Base quality histograms</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=reads.fq qchist=qchist.txt</pre></div><p><span>That stands for "quality count histogram".&nbsp;</span></p><ul>
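<li><p><span>To show what a quality count histogram contains, the hypothetical awk helper below counts how often each Phred+33 quality score occurs across the quality lines of a FASTQ file (assuming 4-line records). Its two-column output layout is an assumption for illustration and is not the exact qchist format.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper: count how often each Q-score occurs across all
# quality lines, decoding Phred+33 characters via a lookup table.
qual_hist() {
  awk 'BEGIN { for (n = 33; n != 127; n++) ord[sprintf("%c", n)] = n - 33 }
       NR % 4 == 0 { for (i = length($0); i; i--) count[ord[substr($0, i, 1)]]++ }
       END { for (q in count) print q, count[q] }' "$@" | sort -n
}

# Usage: qual_hist reads.fq   (prints "Q-score count" pairs)</pre></div></li>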
<li><strong>Filter SAM/BAM file by read length</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=x.sam out=y.sam minlength=50 maxlength=200</pre></div><ul>
<li><strong>Filter SAM/BAM file to detect/filter spliced reads</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=mapped.bam out=filtered.bam maxdellen=50</pre></div><p><span>Set "maxdellen" to the shortest deletion length that you consider a sign of splicing; the appropriate value depends on the organism.</span><br /><span>-------------------------------------------------------------</span><br /><strong>Repair.sh</strong></p><ul>
<li><strong>"Re-pair" out-of-order reads from paired-end data files</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ repair.sh in1=r1.fq.gz in2=r2.fq.gz out1=fixed1.fq.gz out2=fixed2.fq.gz outsingle=singletons.fq.gz</pre></div><p><span>--------------------------------------------------------------</span><br /><strong>BBMerge.sh</strong><br /><br /><span>BBMerge has a flag, "outa" (or "outadapter"), that automatically detects the adapter sequence from reads with short insert sizes, in case you don't know which adapters were used. It works like this:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ bbmerge.sh in=reads.fq outa=adapters.fa reads=1m</pre></div><p><span>Of course, this only works for paired reads. The output FASTA file will look like this:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">&gt;Read1_adapter
GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG
&gt;Read2_adapter
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG</pre></div><p><span>If your samples were multiplexed with different barcodes in the adapters, the barcode region will show up as Ns, like this:</span><br /><br /><span>GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG</span><br /><br /><span>Note: when using BBMerge with microRNA libraries, you need to add the flag&nbsp;</span><strong>mininsert=17</strong><span>. The default is 35, which is too long for microRNA libraries.</span></p><ul>
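<li><p><span>To see why the barcode shows up as Ns, compare adapter sequences that differ only in the barcode: positions that disagree cannot be given a single base. The hypothetical awk helper below replaces every differing position with N. It is an illustration only, not BBMerge's actual algorithm. Applied to the two adapter sequences listed above, it yields exactly the masked sequence shown.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper: compare two equal-length sequences position by
# position and replace every differing base with N.
mask_differences() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    out = ""
    # Walk from the last position to the first, prepending each base.
    for (i = length(a); i; i--) {
      ca = substr(a, i, 1)
      out = (ca == substr(b, i, 1) ? ca : "N") out
    }
    print out
  }'
}

mask_differences \
  GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG \
  GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG</pre></div></li>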
<li><strong>Identifying adapters</strong></li>
</ul><p><span>If you have paired reads, and enough of them have inserts shorter than the read length, you can identify adapter sequences with BBMerge like this (they will be written to adapters.fa):</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa</pre></div><p><br /><span>-----------------------------------------------------------------</span><br /><br /><strong>BBDuk.sh</strong><br /><br /><span>Note: BBDuk is strictly deterministic on a per-read basis; however, by default it reorders the reads when run multithreaded. Add the flag "ordered" to keep output reads in the same order as input reads.</span></p><ul>
<li><strong>Finding reads with a specific sequence at the beginning of read</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ bbduk.sh -Xmx1g in=reads.fq outm=matched.fq outu=unmatched.fq restrictleft=25 k=25 literal=AAAAACCCCCTTTTTGGGGGAAAAA</pre></div><p><span>In this case, all reads starting with "AAAAACCCCCTTTTTGGGGGAAAAA" will end up in "matched.fq" and all other reads will end up in "unmatched.fq". Specifically, the command means "look for 25-mers in the leftmost 25 bp of the read", which requires an exact prefix match, though you can relax that if you want.</span><br /><br /><span>So you could bin all the reads containing your known sequence, then look at the remaining reads to see what they have in common. You can do the same thing with the tail of the read using "restrictright" instead, though you can't use both restrictions at the same time.</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ bbduk.sh in=reads.fq outm=matched.fq literal=NNNNNNCCCCGGGGGTTTTTAAAAA k=25 copyundefined</pre></div><p><span>With the "copyundefined" flag, a copy of each reference sequence is made for every valid combination of defined letters (for example, each N is replaced by A, C, G, and T in turn). So instead of increasing memory or time use by 6^75, it only increases them by 4^6, or 4096, which is completely reasonable; however, it only allows substitutions at predefined locations. You can use the "copyundefined", "hdist", and "qhdist" flags together for a lot of flexibility: for example, hdist=2, qhdist=1, and 3 Ns in the reference would allow a Hamming distance of 6 with much lower resource requirements than hdist=6. Just be sure to give BBDuk as much memory as possible.</span></p><ul>
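<li><p><span>The expansion that copyundefined performs can be sketched with a small recursive awk function. expand_ns is a hypothetical helper that handles only N, not the other IUPAC ambiguity codes, and it does not reflect how BBDuk is implemented internally. It simply shows why 6 Ns yield 4^6 = 4096 copies.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical illustration: replace every N by each of A, C, G and T in
# turn, so a sequence with n Ns expands into 4^n fully defined copies.
expand_ns() {
  awk -v s="$1" '
    function expand(prefix, rest,    c, bases) {
      if (rest == "") { print prefix; return }
      c = substr(rest, 1, 1)
      if (c != "N") { expand(prefix c, substr(rest, 2)); return }
      bases = "ACGT"
      while (bases != "") {
        expand(prefix substr(bases, 1, 1), substr(rest, 2))
        bases = substr(bases, 2)
      }
    }
    BEGIN { expand("", s) }'
}

expand_ns NNNNNNCCCCGGGGGTTTTTAAAAA | wc -l   # 4096 copies</pre></div></li>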
<li><strong>Removing Illumina adapters (if the exact adapters are not known)</strong></li>
</ul><p><br /><span>If you're not sure which adapters were used, you can add "ref=truseq.fa.gz,truseq_rna.fa.gz,nextera.fa.gz" to include all of them (this will slightly increase overtrimming, though it should still be negligible).</span></p><ul>
<li><strong>Removing Illumina control sequences/phiX reads</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">bbduk.sh in=trimmed.fq.gz out=filtered.fq.gz k=31 ref=artifacts,phix ordered cardinality</pre></div><ul>
<li><strong>Identify certain reads that contain a specific sequence</strong></li>
</ul><div><div>Code:</div><pre dir="ltr">$ bbduk.sh in=reads.fq out=unmatched.fq outm=matched.fq literal=ACGTACGTACGTACGTAC k=18 mm=f hdist=2</pre></div><p><span>Make sure "k" is set to the exact length of the sequence. "hdist" controls the number of substitutions allowed. "outm" gets the reads that match. By default this also looks for the reverse-complement; you can disable that with "rcomp=f".&nbsp;&nbsp;</span></p><ul>
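<li><p><span>Both ideas in the paragraph above can be made concrete with two small, hypothetical awk helpers. hamming counts the substitutions between two equal-length sequences, which is the quantity hdist limits. revcomp builds the reverse complement, which BBDuk also checks unless rcomp=f is set. They are illustrations, not BBDuk code.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper: Hamming distance (number of substitutions) between
# two equal-length sequences.
hamming() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    d = 0
    for (i = length(a); i; i--) if (substr(a, i, 1) != substr(b, i, 1)) d++
    print d
  }'
}

# Hypothetical helper: reverse complement of a DNA sequence.
revcomp() {
  printf '%s\n' "$1" | tr ACGTacgt TGCAtgca | awk '{
    r = ""
    for (i = length($0); i; i--) r = r substr($0, i, 1)
    print r
  }'
}

hamming ACGTACGTACGTACGTAC ACGTACGTTCGTACGAAC   # prints 2 (within hdist=2)
revcomp ACGTACGTACGTACGTAC                      # prints GTACGTACGTACGTACGT</pre></div></li>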
<li><strong>Extract sequences that share kmers with your sequences with BBDuk</strong></li>
</ul><div><div>Code:</div><pre dir="ltr">$ bbduk.sh in=a.fa ref=b.fa out=c.fa mkf=1 mm=f k=31</pre></div><p><span>This writes to c.fa all the sequences in a.fa that share 100% of their 31-mers with sequences in b.fa.</span><br /><br /></p><ul>
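<li><p><span>mkf=1 requires every k-mer of a sequence to occur in the reference. The hypothetical awk helper below computes that shared fraction for two sequences. Unlike BBDuk, it ignores reverse complements, so treat it as a simplified illustration.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper: fraction of the k-mers of sequence $1 that also
# occur in sequence $2, using k-mer length $3 (reverse complements ignored).
kmer_fraction() {
  awk -v a="$1" -v b="$2" -v k="$3" 'BEGIN {
    # Store every k-mer of b in a set; the loop stops once a full k-mer no
    # longer fits (substr returns an empty string past the end).
    for (i = 1; substr(b, i + k - 1, 1) != ""; i++) ref[substr(b, i, k)] = 1
    total = 0; shared = 0
    for (i = 1; substr(a, i + k - 1, 1) != ""; i++) {
      total++
      kmer = substr(a, i, k)
      if (kmer in ref) shared++
    }
    printf "%.2f\n", (total ? shared / total : 0)
  }'
}

kmer_fraction ACGTAC ACGTACGG 3   # prints 1.00 (every k-mer shared)
kmer_fraction ACGTTT ACGTACGG 3   # prints 0.50</pre></div></li>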
<li><strong>Extract sequences that contain N's with BBDuk</strong></li>
</ul><div><div>Code:</div><pre dir="ltr">bbduk.sh in=reads.fq out=readsWithoutNs.fq outm=readsWithNs.fq maxns=0</pre></div><p><span>If you have, say, 100bp reads and only want to separate reads containing all 100 Ns, change that to "maxns=99".</span><br /><br /><strong>General notes for BBDuk.sh</strong><span>&nbsp;</span><br /><br /><span>BBDuk can operate in one of 4 kmer-matching modes:</span><br /><span>Right-trimming (ktrim=r), left-trimming (ktrim=l), masking (ktrim=n), and filtering (default). But it can only do one at a time because all kmers are stored in a single table. It can still do non-kmer-based operations such as quality trimming at the same time.</span><br /><br /><span>BBDuk2 can do all 4 kmer operations at once and is designed for integration into automated pipelines where you do contaminant removal and adapter-trimming in a single pass to minimize filesystem I/O. Personally, I never use BBDuk2 from the command line. Both have identical capabilities and functionality otherwise, but the syntax is different.</span><br /><br /><span>------------------------------------------------------------------</span><br /><br /><strong>Randomreads.sh</strong></p><ul>
<li><strong>Generate random reads in various formats</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ randomreads.sh ref=genome.fasta out=reads.fq len=100 reads=10000</pre></div><p><span>You can specify paired reads, an insert size distribution, read lengths (or length ranges), and so forth. But because I developed it to benchmark mapping algorithms, it is specifically designed to give excellent control over mutations. You can specify the number of SNPs, insertions, deletions, and Ns per read, either exactly or probabilistically; the lengths of these events are individually customizable; the quality values can alternatively be set so that errors are generated on the basis of quality; there's a PacBio error model; and all of the reads are annotated with their genomic origin, so you will know the correct answer when mapping.</span><br /><br /><span>Bear in mind that half of the reads are generated from the plus strand and half from the minus strand. So either a read will match the reference perfectly, or its reverse complement will.</span><br /><br /><span>You can generate the same set of reads with and without SNPs by fixing the seed to a positive number, like this:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ randomreads.sh maxsnps=0 adderrors=false out=perfect.fastq reads=1000 minlength=18 maxlength=55 seed=5

$ randomreads.sh maxsnps=2 snprate=1 adderrors=false out=2snps.fastq reads=1000 minlength=18 maxlength=55 seed=5</pre></div><p><span>As of BBMap v36.59, randomreads.sh can also simulate metagenomes.</span><br /><br /><span>The coverage=X flag automatically sets "reads" to a level that gives X average coverage (decimal values are allowed).</span><br /><br /><span>The metagenome flag assigns each scaffold a random exponentially distributed value, which decides the probability that a read is generated from that scaffold. So if you concatenate 20 bacterial genomes, you can run randomreads.sh and get a metagenome-like distribution. It can also be used for RNA-seq with a transcriptome reference.</span><br /><br /><span>Coverage is decided per reference sequence, so if a bacterial assembly has more than one contig, you may want to join them first with fuse.sh before concatenating them with the other references.</span><br /><br /></p><ul>
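<li><p><span>For intuition about coverage=X, the usual relationship is reads = coverage * total reference length / read length. The hypothetical helper below applies that formula. The exact internal calculation in randomreads.sh may differ (for example, in how paired reads are counted), so treat its result as an estimate.</span></p><div><div>Code:</div><pre dir="ltr"># Hypothetical helper: number of reads needed for a target average depth.
#   $1 = coverage (decimals allowed), $2 = reference length in bp,
#   $3 = read length in bp. The result is rounded to the nearest read.
reads_for_coverage() {
  awk -v cov="$1" -v size="$2" -v len="$3" \
    'BEGIN { printf "%d\n", cov * size / len + 0.5 }'
}

reads_for_coverage 30 5000000 100    # prints 1500000
reads_for_coverage 2.5 4000000 150   # prints 66667</pre></div></li>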
<li><strong>Simulate a jump library</strong></li>
</ul><p><br /><span>You can simulate a 4000bp jump library from your existing data like this.</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ cat assembly1.fa assembly2.fa &gt; combined.fa
$ bbmap.sh ref=combined.fa
$ randomreads.sh reads=1000000 length=100 paired interleaved mininsert=3500 maxinsert=4500 bell perfect=1 q=35 out=jump.fq.gz</pre></div><p><span>--------------------------------------------------------------</span><br /><strong>Shred.sh</strong><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ shred.sh in=ref.fasta out=reads.fastq length=200</pre></div><p><span>The difference from RandomReads is that RandomReads generates reads in random order from random locations, giving flat coverage on average, but it won't guarantee 100% coverage unless you generate many-fold depth. Shred, on the other hand, gives exactly 1x depth and exactly 100% coverage (and cannot model errors). So the use cases are different.</span><br /><span>---------------------------------------------------------------</span><br /><strong>Demuxbyname.sh</strong></p><ul>
<li><strong>Demultiplex fastq files when the tag is present in the fastq read header (illumina)</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ demuxbyname.sh in=r#.fq out=out_%_#.fq prefixmode=f names=GGACTCCT+GCGATCTA,TAAGGCGA+TCTACTCT,... \
  outu=filename</pre></div><p><span>"Names" can also be a text file with one barcode per line (in exactly the format found in the read header). You do have to include all of the expected barcodes, though.</span><br /><br /><span>In the output filename, the "%" symbol gets replaced by the barcode; in both the input and output names, the "#" symbol gets replaced by 1 or 2 for read 1 or read 2. It's optional, though; you can leave it out for interleaved input/output, or specify in1=/in2=/out1=/out2= if you want custom naming.</span><br /><br /><span>----------------------------------------------------------------</span><br /><br /><strong>Readlength.sh</strong></p><ul>
<li><strong>Plotting the length distribution of reads</strong></li>
</ul><div><div>Code:</div><pre dir="ltr">$ readlength.sh in=file out=histogram.txt bin=10 max=80000</pre></div><p><span>That will write a length histogram with bins of size 10, with everything above 80k placed in the same bin. The defaults are set for relatively short sequences, so if your sequences are many megabases long you may need to add the flag "-Xmx8g" and raise "max=" to something much higher.</span><br /><br /><span>Alternatively, if these are assemblies and you're interested in contiguity information (L50, N50, etc.), you can run stats.sh on each file or statswrapper.sh on all of them:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">stats.sh in=file</pre></div><p><span>or</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">statswrapper.sh in=file,file,file,file&hellip;</pre></div><p><span>----------------------------------------------------------------</span><br /><strong>Filterbyname.sh</strong><br /><br /><span>By default, "filterbyname" discards reads with names in your name list, and keeps the rest. 
To include them and discard the others, do this:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ filterbyname.sh in=003.fastq out=filter003.fq names=names003.txt include=t</pre></div><p><span>----------------------------------------------------------------</span><br /><strong>getreads.sh</strong><br /><br /><span>If you only know the number(s) of the fasta/fastq record(s) in a file (records start at 0) then you can use the following command to extract those reads in a new file.</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ getreads.sh in= id=&lt;number,number,number...&gt; out=</pre></div><p><span>The first read (or pair) has ID 0, the second read (or pair) has ID 1, etc.</span><br /><br /><span>Parameters:</span><br /><span>in= Specify the input file, or stdin.</span><br /><span>out= Specify the output file, or stdout.</span><br /><span>id= Comma delimited list of numbers or ranges, in any order.</span><br /><span>For example: id=5,93,17-31,8,0,12-13&nbsp;</span><br /><span>----------------------------------------------------------------</span><br /><strong>Splitsam.sh</strong></p><ul>
<li><strong>Splits a sam file into forward and reverse reads</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">splitsam.sh mapped.sam plus.sam minus.sam unmapped.sam
reformat.sh in=plus.sam out=plus.fq
reformat.sh in=minus.sam out=minus.fq rcomp</pre></div><p><span>----------------------------------------------------------------</span><br /><strong>BBSplit.sh</strong><br /><br /><span>BBSplit can output paired reads in dual files using the # symbol. For example:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ bbsplit.sh ref=x.fa,y.fa in1=read1.fq in2=read2.fq basename=o%_#.fq</pre></div><p><span>will produce ox_1.fq, ox_2.fq, oy_1.fq, and oy_2.fq.</span><br /><br /><span>You can use the # symbol for input too, as in "in=read#.fq", and it will be expanded into 1 and 2.</span><br /><br /><strong>Added feature:&nbsp;</strong><span>You can specify a directory for the "ref=" argument. If anything in the list is a directory, all FASTA files in that directory are used. They need a FASTA extension, such as .fa or .fasta, optionally followed by .gz if compressed. This is useful because BBSplit splits the input into one output file per reference file.</span><br /><br /><br /><strong>NOTE: 1</strong><span>&nbsp;By default BBSplit uses fairly strict mapping parameters; you can get the same sensitivity as BBMap by adding the flags "minid=0.76 maxindel=16k minhits=1". With those parameters it is extremely sensitive.</span><br /><br /><strong>NOTE: 2</strong><span>&nbsp;BBSplit has different ambiguity settings for dealing with reads that map to multiple genomes. In any case, if the alignment score is higher to one genome than another, the read will be associated with that genome only (this considers the combined scores of read pairs; pairs are always kept together). 
But when a read or pair has two identically-scoring mapping locations, on different genomes, the behavior is controlled by the "ambig2" flag - "ambig2=toss" will discard the read, "all" will send it to all output files, and "split" will send it to a separate file for ambiguously-mapped reads (one per genome to which it maps).</span><br /><br /><strong>NOTE: 3</strong><span>&nbsp;Zero-count lines are suppressed by default, but they should be printed if you include the flag "nzo=f" (nonzeroonly=false).&nbsp;</span><br /><br /><strong>NOTE: 4</strong><span>&nbsp;BBSplit needs multiple reference files as input; one per organism, or one for target and another for everything else. It only outputs one file per reference file.</span><br /><br /><span>Seal.sh, on the other hand, which is similar, can use a single concatenated file, as it (by default) will output one file per reference sequence within a concatenated set of references.&nbsp;</span><br /><span>--------------------------------------------------------------</span><br /><strong>Pileup.sh</strong></p><ul>
<li><strong>To generate transcript coverage stats</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ pileup.sh in=mapped.sam normcov=normcoverage.txt normb=20 stats=stats.txt</pre></div><p><span>That will generate coverage per transcript, with 20 lines per transcript, each line showing the coverage for that fraction of the transcript. "stats" will contain other information like the fraction of bases in each transcript that was covered.&nbsp;</span></p><ul>
<li><strong>To calculate physical coverage stats (region covered by paired-end reads)&nbsp;</strong></li>
</ul><p><span>BBMap has a "physcov" flag that allows it to report physical coverage (every base spanned by a read pair, including the unsequenced bases between the two reads) rather than sequenced coverage. It can be used directly in BBMap, or with pileup if you already have a sam file. For example:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ pileup.sh in=mapped.sam covstats=coverage.txt physcov</pre></div><ul>
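<p><span>Physical coverage counts every base spanned by a pair's fragment, including the unsequenced gap between the mates, while sequenced coverage counts only the bases inside the reads. The minimal sketch below contrasts the two on a toy reference. The function name phys_vs_seq and its input format (one pair per line as start1 end1 start2 end2, 1-based and inclusive, with mate 2 downstream of mate 1) are assumptions for illustration, not pileup.sh's interface.</span></p>

```shell
# Hypothetical sketch, not pileup.sh code: mean sequenced vs. physical depth over a
# reference of length L (argument 1). Input on stdin: one read pair per line as
# "start1 end1 start2 end2" (1-based, inclusive, mate 2 downstream of mate 1).
phys_vs_seq() {
    awk -v L="$1" '
        {
            for (i = $1; i <= $2; i++) sq[i]++   # bases sequenced by mate 1
            for (i = $3; i <= $4; i++) sq[i]++   # bases sequenced by mate 2
            for (i = $1; i <= $4; i++) ph[i]++   # whole fragment, unsequenced gap included
        }
        END {
            for (i = 1; i <= L; i++) { s += sq[i]; p += ph[i] }
            printf "sequenced=%.2f physical=%.2f\n", s / L, p / L
        }'
}
```

<p><span>With a 20-base reference and one pair whose mates cover bases 1-5 and 16-20, this prints sequenced=0.50 physical=1.00: the 10-base gap between the mates counts only toward physical coverage.</span></p>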
<li><strong>Calculating coverage of the genome</strong></li>
</ul><p><br /><span>pileup.sh accepts sam or bam input, sorted or unsorted.</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ pileup.sh in=mapped.sam out=stats.txt hist=histogram.txt</pre></div><p><span>stats.txt will contain the average depth and percent covered of each reference sequence; the histogram will contain the exact number of bases at each coverage level. You can also get per-base coverage or binned coverage if you want to plot the coverage. It also reports the median, standard deviation, and similar statistics.</span><br /><br /><span>It's also possible to generate coverage directly from BBMap, without an intermediate sam file, like this:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ bbmap.sh in=reads.fq ref=reference.fasta nodisk covstats=stats.txt covhist=histogram.txt</pre></div><p><span>We use this a lot when all you care about is the coverage distribution, which is fairly common with metagenome assemblies. BBMap also supports most of the flags that pileup.sh supports, though the syntax is slightly different to prevent collisions. In either case, you can see all the possible flags by running the shellscript with no arguments.</span></p><ul>
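<p><span>As a rough illustration of what the stats file and the coverage histogram summarize, the sketch below computes average depth, percent of bases covered, and a depth histogram from per-base depths. It assumes a hypothetical tab-separated input of reference name, position, and depth for every base (zero-depth bases included). The function names coverage_stats and coverage_hist are made up, and pileup.sh's real output has more columns and a different layout.</span></p>

```shell
# Hypothetical sketch, not pileup.sh code.
# Input on stdin: tab-separated reference name, position, depth, one line per base,
# with zero-depth bases included.

# Per reference: average depth and percentage of bases with depth above zero.
coverage_stats() {
    awk -F'\t' '
        { len[$1]++; sum[$1] += $3; if ($3 > 0) cov[$1]++ }
        END {
            for (r in len)
                printf "%s\tavg_depth=%.2f\tpct_covered=%.1f\n", r, sum[r] / len[r], 100 * cov[r] / len[r]
        }' | sort
}

# Histogram: number of bases at each depth, across all references.
coverage_hist() {
    awk -F'\t' '{ n[$3]++ } END { for (d in n) print d "\t" n[d] }' | sort -n
}
```

<p><span>For a 4-base reference with depths 0, 2, 4, 2, coverage_stats reports an average depth of 2.00 with 75.0% of bases covered, and coverage_hist reports one base at depth 0, two at depth 2, and one at depth 4.</span></p>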
<li><strong>To bin aligned reads</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ pileup.sh in=mapped.sam out=stats.txt bincov=coverage.txt binsize=1000</pre></div><p><span>That will give coverage within each bin. For read density regardless of read length, add the "startcov=t" flag.&nbsp;&nbsp;</span><br /><br /><span>--------------------------------------------------------------</span><br /><strong>Dedupe.sh</strong><br /><br /><span>Dedupe ensures that there is at most one copy of any input sequence, optionally allowing contaminants (substrings) to be removed, and a variable hamming or edit distance to be specified. Usage:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ dedupe.sh in=assembly1.fa,assembly2.fa out=merged.fa</pre></div><p><span>That will absorb exact duplicates and containments. You can use "hdist" and "edist" flags to allow mismatches, or get a complete list of flags by running the shellscript with no arguments.&nbsp;&nbsp;</span><br /><br /><span>Dedupe&nbsp;</span><span style="text-decoration: underline;">will merge assemblies</span><span>, but it&nbsp;</span><span style="text-decoration: underline;">will not produce consensus sequences or join overlapping reads</span><span>; it only removes sequences that are fully contained within other sequences (allowing the specified number of mismatches or edits).</span><br /><br /><span>Dedupe can remove duplicate reads from multiple files simultaneously, if they are comma-delimited (e.g. in=file1.fastq,file2.fastq,file3.fastq). And if you set the flag "uniqueonly=t" then ALL copies of duplicate reads will be removed, as opposed to the default behavior of leaving one copy of duplicate reads.</span><br /><br /><span>However, it does not care which file a read came from; in other words, it can't remove only reads that are duplicates across multiple files but leave the ones that are duplicates within a file. 
That can still be accomplished, though, like this:</span><br /><br /><span>1) Run dedupe on each sample individually, so that each sample contains at most one copy of any read.</span><br /><span>2) Run dedupe again on all of the samples together, with "uniqueonly=t". At that point the only remaining duplicates are reads shared between samples, so those are the only ones removed (all copies of them, because of "uniqueonly=t").&nbsp;&nbsp;</span><br /><br /><span>--------------------------------------------------------------</span></p><ul>
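<p><span>The two-step Dedupe recipe for removing only cross-sample duplicates can be illustrated with standard Unix tools on plain-text reads (one sequence per line, one file per sample). This is only a sketch of the set logic, assuming exact matches: dedupe.sh itself also handles containments, mismatches, and fasta/fastq records, none of which are modeled here, and the function name cross_sample_dedupe is hypothetical.</span></p>

```shell
# Hypothetical sketch of the two-step recipe using sort/uniq instead of dedupe.sh.
# Arguments: one plain-text file per sample, one read sequence per line.
# Exact matches only; reverse complements are not treated as duplicates here.
cross_sample_dedupe() {
    tmp=$(mktemp -d)
    i=0
    for f in "$@"; do
        i=$((i + 1))
        # Step 1: keep at most one copy of each read within a sample.
        sort -u "$f" > "$tmp/sample$i"
    done
    # Step 2: pool all samples and keep only reads seen exactly once
    # (the uniqueonly=t idea); every copy of a cross-sample duplicate is dropped.
    sort "$tmp"/sample* | uniq -u
    rm -r "$tmp"
}
```

<p><span>With sample A containing AAAA, AAAA, CCCC and sample B containing CCCC, GGGG, step 1 collapses the repeated AAAA, and step 2 removes CCCC because it occurs in both samples, leaving AAAA and GGGG in a single pooled list.</span></p>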
<li><strong>Generate ROC curves from any aligner</strong></li>
</ul><p><br /><strong>1. Index the reference<br /><br /></strong></p><div><div>Code:</div><pre dir="ltr">$ bbmap.sh ref=reference.fasta</pre></div><p><br /><strong>2. Generate random reads</strong><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ randomreads.sh reads=100000 length=100 out=synth.fastq maxq=35 midq=25 minq=15</pre></div><p><strong>3. Map the reads to produce a sam file</strong><br /><br /><span>Substitute the equivalent command for your aligner of choice:</span></p><div><div>Code:</div><pre dir="ltr">$ bbmap.sh in=synth.fastq out=mapped.sam</pre></div><p><strong>4. Generate the ROC curve</strong><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ samtoroc.sh in=mapped.sam reads=100000</pre></div><p><span>--------------------------------------------------------------</span></p><ul>
<li><strong>Calculate heterozygous rate for sequence data</strong></li>
</ul><p><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ kmercountexact.sh in=reads.fq khist=histogram.txt peaks=peaks.txt</pre></div><p><span>You can examine the histogram manually, or use the "peaks" file which tells you the number of unique kmers in each peak on the histogram. For a diploid, the first peak will be the het peak, the second will be the homozygous peak, and the rest will be repeat peaks. The peak caller is not perfect, though, so particularly with noisy data I would only rely on it for the first two peaks, and try to quantify the higher-order peaks manually if you need to (which you generally don't).</span><br /><br /><span>-----------------------------------------------------------------</span></p><ul>
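<p><span>The peaks file gives raw k-mer counts; turning them into a heterozygosity rate needs a model. A rough, hypothetical one (not a BBTools output): assume a diploid genome, heterozygous SNPs spaced more than k bases apart, and ignore repeats and errors. Each het SNP then contributes about 2k distinct k-mers to the het peak (k per allele), and the haploid genome size is roughly the homozygous-peak count plus half the het-peak count, so h ≈ (het / 2k) / (hom + het / 2). The function name het_rate is made up; model-fitting tools will be more accurate on real data.</span></p>

```shell
# Hypothetical back-of-the-envelope estimate, not a BBTools feature.
# Assumes: diploid genome, het SNPs more than k bases apart, no repeats or errors.
# Arguments: unique k-mers in the het (first) peak, unique k-mers in the homozygous
# (second) peak, and the k-mer length used for counting.
het_rate() {
    awk -v het="$1" -v hom="$2" -v k="$3" 'BEGIN {
        genome = hom + het / 2     # het k-mers come from two alleles of the same positions
        snps = het / (2 * k)       # each het SNP yields about k distinct k-mers per allele
        printf "%.6f\n", snps / genome
    }'
}
```

<p><span>For example, with 62,000 k-mers in the het peak, 1,969,000 in the homozygous peak, and k=31, the model gives about 1,000 het SNPs in a 2 Mbp genome, i.e. a rate of 0.000500.</span></p>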
<li><strong>Compare mapped reads between two files</strong></li>
</ul><p><br /><span>To find how many mapped reads (mapped concordantly or discordantly, it doesn't matter) are shared between two alignment files, and how many are unique to one file or the other:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ reformat.sh in=file1.sam out=mapped1.sam mappedonly
$ reformat.sh in=file2.sam out=mapped2.sam mappedonly</pre></div><p><span>That gets you the mapped reads only. Then:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ filterbyname.sh in=mapped1.sam names=mapped2.sam out=shared.sam include=t</pre></div><p><span>...which gets you the set intersection;</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ filterbyname.sh in=mapped1.sam names=mapped2.sam out=only1.sam include=f
$ filterbyname.sh in=mapped2.sam names=mapped1.sam out=only2.sam include=f</pre></div><p><span>...which give you the set differences.&nbsp;&nbsp;</span><br /><br /><span>--------------------------------------------------------------</span><br /><br /><strong>BBrename.sh</strong></p><div><div>Code:</div><pre dir="ltr">$ bbrename.sh in=old.fasta out=new.fasta</pre></div><p><span>That will rename the reads sequentially as 1, 2, 3, 4, and so on (ending at 222 for a 222-read file).</span><br /><br /><span>You can also give a custom prefix if you want. The input has to be a plain-text sequence file, not a .doc file.&nbsp;&nbsp;</span><br /><br /><span>---------------------------------------------------------------------</span><br /><br /><strong>BBfakereads.sh</strong></p><ul>
<li><strong>Generating &ldquo;fake&rdquo; paired end reads from a single end read file</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ bfakereads.sh in=reads.fastq out1=r1.fastq out2=r2.fastq length=100</pre></div><p><span>That will generate fake pairs from the input file, with whatever length you want (maximum of input read length). We use it in some cases for generating a fake LMP library for scaffolding from a set of contigs. Read 1 will be from the left end, and read 2 will be reverse-complemented and from the right end; both will retain the correct original qualities. And " /1" " /2" will be suffixed after the read name.&nbsp;&nbsp;</span><br /><br /><span>------------------------------------------------------------------</span><br /><strong>Randomreads.sh</strong></p><ul>
<li><strong>Generate random reads</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ randomreads.sh ref=genome.fasta out=reads.fq len=100 reads=10000</pre></div><p><span>"seed=-1" will use a random seed; any other value will use that specific number as the seed</span><br /><br /><span>You can specify paired reads, an insert size distribution, read lengths (or length ranges), and so forth. But because I developed it to benchmark mapping algorithms, it is specifically designed to give excellent control over mutations. You can specify the number of snps, insertions, deletions, and Ns per read, either exactly or probabilistically; the lengths of these events is individually customizable, the quality values can alternately be set to allow errors to be generated on the basis of quality; there's a PacBio error model; and all of the reads are annotated with their genomic origin, so you will know the correct answer when mapping.</span><br /><br /><span>--------------------------------------------------------------------</span></p><ul>
<li><strong>Generate saturation curves to assess sequencing depth</strong></li>
</ul><p>&nbsp;</p><div><div>Code:</div><pre dir="ltr">$ bbcountunique.sh in=reads.fq out=histogram.txt</pre></div><p><span>It works by pulling kmers from each input read, and testing whether it has been seen before, then storing it in a table.</span><br /><br /><span>The bottom line, "first", tracks whether the first kmer of the read has been seen before (independent of whether it is read 1 or read 2).</span><br /><br /><span>The top line, "pair", indicates whether a combined kmer from both read 1 and read 2 has been seen before. The other lines are generally safe to ignore but they track other things, like read1- or read2-specific data, and random kmers versus the first kmer.</span><br /><br /><span>It plots a point every X reads (configurable, default 25000).</span><br /><br /><span>In noncumulative mode (default), a point indicates "for the last X reads, this percentage had never been seen before". In this mode, once the line hits zero, sequencing more is not useful.</span><br /><br /><span>In cumulative mode, a point indicates "for all reads, this percentage had never been seen before", but still only one point is plotted per X reads.</span><br /><br /><span>-----------------------------------------------------------------</span><br /><strong>CalcTrueQuality.sh</strong><br /><br /><a href="http://seqanswers.com/forums/showthread.php?p=170904" target="_blank">http://seqanswers.com/forums/showthread.php?p=170904</a><br /><br /><span>In light of the quality-score issues with the NextSeq platform, and the possibility of future Illumina platforms (HiSeq 3000 and 4000) also using quantized quality scores, I developed it for recalibrating the scores to ensure accuracy and restore the full range of values.</span><br /><br /><span>-----------------------------------------------------------------</span><br /><br /><strong>BBMapskimmer.sh</strong><br /><br /><span>BBMap is designed to find the best mapping, and heuristics will cause it to ignore mappings that are valid 
but substantially worse. Therefore, I made a different version of it, BBMapSkimmer, which is designed to find all of the mappings above a certain threshold. The shellscript is bbmapskimmer.sh and the usage is similar to bbmap.sh or mapPacBio.sh. For primers, which I assume will be short, you may wish to use a lower-than-default K of, say, 10 or 11, and add the "slow" flag.</span><br /><br /><span>--------------------------------------------------------------</span><br /><br /><strong>msa.sh and cutprimers.sh</strong><br /><br /><span>Quoted from Brian's response directly.</span><br /><br /><span>I also wrote another pair of programs specifically for working with primer pairs, msa.sh and cutprimers.sh. msa.sh will forcibly align a primer sequence (or a set of primer sequences) against a set of reference sequences to find the single best matching location per reference sequence - in other words, if you have 3 primers and 100 ref sequences, it will output a sam file with exactly 100 alignments - one per ref sequence, using the primer sequence that matched best. Of course you can also just run it with 1 primer sequence.</span><br /><br /><span>So you run msa twice - once for the left primer, and once for the right primer - and generate 2 sam files. Then you feed those into cutprimers.sh, which will create a new fasta file containing the sequence between the primers, for each reference sequence. 
We used these programs to synthetically cut V4 out of full-length 16S sequences.</span><br /><br /><span>I should say, though, that the primer sites identified are based on the normal BBMap scoring, which is not necessarily the same as where the primers would bind naturally, though with highly conserved regions there should be no difference.</span><br /><br /><span>------------------------------------------------------</span><br /><strong>testformat.sh</strong><br /><br /><strong>Identify type of Q-score encoding in sequence files</strong><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ testformat.sh in=seq.fq.gz
sanger    fastq    gz    interleaved    150bp</pre></div><p><span>--------------------------------------------------</span><br /><strong>kcompress.sh</strong><br /><br /><span>Newest member of BBTools. Identify constituent k-mers.&nbsp;</span><br /><a href="http://seqanswers.com/forums/showthread.php?t=63258" target="_blank">http://seqanswers.com/forums/showthread.php?t=63258</a><br /><br /><span>----------------------------------------------------</span><br /><strong>commonkmers.sh</strong><br /><br /><span>Find all k-mers for a given sequence.</span></p><div><div>Code:</div><pre dir="ltr">$ commonkmers.sh in=reads.fq out=kmers.txt k=4 count=t display=999</pre></div><p><span>Will produce output that looks like</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">MISEQ05:239:000000000-A74HF:1:2110:14788:23085	ATGA=8	ATGC=6	GTCA=6	AAAT=5	AAGC=5	AATG=5	AGCA=5	ATAA=5	ATTA=5	CAAA=5	CATA=5	CATC=5	CTGC=5	AACC=4	AACG=4	AAGA=4	ACAT=4	ACCA=4	AGAA=4	ATCA=4	ATGG=4	CAAG=4	CCAA=4	CCTC=4	CTCA=4	CTGA=4	CTTC=4	GAGC=4	GGTA=4	GTAA=4	GTTA=4	AAAA=3	AAAC=3	AAGT=3	ACCG=3	ACGG=3	ACTG=3	AGAT=3	AGCT=3	AGGA=3	AGTA=3	AGTC=3	CAGC=3	CATG=3	CGAG=3	CGGA=3	CGTC=3	CTAA=3	CTCC=3	CTTA=3	GAAA=3	GACA=3	GACC=3	GAGA=3	GCAA=3	GGAC=3	TCAA=3	TGCA=3	AAAG=2	AACA=2	AATA=2	AATC=2	ACAA=2	ACCC=2	ACCT=2	ACGA=2	ACGC=2	AGAC=2	AGCG=2	AGGC=2	CAAC=2	CAGG=2	CCGC=2	GCCA=2	GCTA=2	GGAA=2	GGCA=2	TAAA=2	TAGA=2	TCCA=2	TGAA=2	AAGG=1	AATT=1	ACGT=1	AGAG=1	AGCC=1	AGGG=1	ATAC=1	ATAG=1	ATTG=1	CACA=1	CACG=1	CAGA=1	CCAC=1	CCCA=1	CCGA=1	CCTA=1	CGAC=1	CGCA=1	CGCC=1	CGCG=1	CGTA=1	CTAC=1	GAAC=1	GCGA=1	GCGC=1	GTAC=1	GTGA=1	TTAA=1</pre></div><p><span>-----------------------------------------------------</span><br /><strong>Mutate.sh</strong><br /><br /><span>Simulate multiple mutants from a known reference (e.g.&nbsp;</span><em>E. coli</em><span>).</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">$ mutate.sh in=e_coli.fasta out=mutant.fasta id=99 
$ randomreads.sh ref=mutant.fasta out=reads.fq.gz reads=5m length=150 paired adderrors</pre></div><p><span>That will create a mutant version of E.coli with 99% identity to the original, and then generate 5 million simulated read pairs from the new genome. You can repeat this multiple times; each mutant will be different.</span><br /><br /><span>------------------------------------</span><br /><br /><strong>Partition.sh</strong><br /><br /><span>One can partition a large dataset with partition.sh into smaller subsets (example below splits data into 8 chunks).</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">partition.sh in=r1.fq in2=r2.fq out=r1_part%.fq out2=r2_part%.fq ways=8</pre></div><p><span>-----------------------------------</span><br /><strong>clumpify.sh</strong><br /><br /><span>If you are concerned about file size and want the files to be as small as possible, give Clumpify a try. It can reduce filesize by around 30% losslessly by reordering the reads. I've found that this also typically accelerates subsequent analysis pipelines by a similar factor (up to 30%). Usage:</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">clumpify.sh in=reads.fastq.gz out=clumped.fastq.gz</pre></div><div><div>Code:</div><pre dir="ltr">clumpify.sh in1=reads_R1.fastq.gz in2=reads_R2.fastq.gz out1=clumped_R1.fastq.gz out2=clumped_R2.fastq.gz</pre></div><ul>
<li><strong>Clumpify.sh can now mark/remove sequence duplicates (optical/PCR/otherwise) from NGS data</strong></li>
</ul><p><br /><span>This does NOT require alignments, so unlike Picard MarkDuplicates (which works on aligned reads) it can be run directly on raw reads. The relevant options for the clumpify.sh command are listed below.</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">dedupe=f optical=f (default)
Nothing happens with regards to duplicates.

dedupe=t optical=f
All duplicates are detected, whether optical or not.  All copies except one are removed for each duplicate.

dedupe=f optical=t
Nothing happens.

dedupe=t optical=t
Only optical duplicates (those with an X or Y coordinate within dist) are detected.  All copies except one are removed for each duplicate.

The allduplicates flag removes all copies of duplicates, rather than leaving a single copy.  But like optical, it has no effect unless dedupe=t.

Note: If you set "dupedist" to anything greater than 0, "optical" gets enabled automatically.</pre></div><p><span>-------------------------------------</span><br /><strong>fuse.sh</strong><br /><br /><span>Fuse automatically reverse-complements read 2 and joins it to read 1 with a spacer of Ns, whose length is set with "pad". This can, for example, create a full-length amplicon sequence that can be used for alignments.</span><br /><br /></p><div><div>Code:</div><pre dir="ltr">fuse.sh in1=r1.fq in2=r2.fq pad=130 out=fused.fq fusepairs</pre></div>]]></description>
	<dc:creator>Surabhi Chaudhary</dc:creator>
</item>

</channel>
</rss>