BOL: Related items

A5-miseq

Jit — Thu, 18 Aug 2016 04:05:23 -0500

_A5-miseq_ is a pipeline for assembling DNA sequence data generated on the Illumina sequencing platform. This README will take you through the steps necessary for running _A5-miseq_.

Points to note:

There are many situations where A5-miseq is not the right tool for the job. To produce accurate results, A5-miseq requires Illumina data with certain characteristics. It will likely not work well with Illumina reads shorter than about 80 nt, or with data in which base qualities are low before position 60 in all or most reads. A5-miseq assumes it is assembling homozygous haploid genomes, so use a different assembler for metagenomes and for heterozygous diploid or polyploid organisms. Likewise, use a different assembler if a tool like FastQC reports that your data quality is dubious. You have been warned! Datasets consisting solely of unpaired reads are not currently supported.
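Before running A5-miseq, a quick pre-check along the lines of the warning above can flag problem datasets. The sketch below is a hypothetical helper, not part of A5-miseq: it assumes Phred+33 FASTQ, uses the 80 nt and 60 nt figures from the note, and treats "low quality" as a mean Phred score below Q20 over the first 60 bases, which is an assumed cut-off.

```python
import statistics

def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(c) - offset for c in qual]

def check_reads(fastq_text: str, min_len: int = 80, qual_window: int = 60,
                min_q: float = 20.0) -> dict:
    """Report the fraction of reads that are short or low quality early in the read.

    min_len and qual_window follow the A5-miseq note; the Q20 threshold on the
    mean quality of the first qual_window bases is a hypothetical choice.
    """
    lines = fastq_text.strip().splitlines()
    total = short = low_quality = 0
    # A FASTQ record is four lines: @header, sequence, '+', qualities
    for i in range(0, len(lines) - 3, 4):
        seq, qual = lines[i + 1], lines[i + 3]
        total += 1
        if len(seq) < min_len:
            short += 1
        head = phred_scores(qual[:qual_window])
        if head and statistics.mean(head) < min_q:
            low_quality += 1
    if total == 0:
        raise ValueError("no FASTQ records found")
    return {"reads": total,
            "short_frac": short / total,
            "low_qual_frac": low_quality / total}
```

If either fraction is large for a run, the note above suggests considering a different assembler.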

Address of the bookmark: https://sourceforge.net/projects/ngopt/

Kaiju

Jit — Mon, 27 Jun 2016 11:23:04 -0500

Kaiju is a program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences.

By default, Kaiju uses either the available complete genomes from NCBI RefSeq or the microbial subset of the non-redundant protein database nr used by NCBI BLAST, optionally also including fungi and microbial eukaryotes.

Kaiju translates reads into amino acid sequences, which are then searched in the database using a modified backward search on a memory-efficient implementation of the Burrows-Wheeler transform, which finds maximum exact matches (MEMs), optionally allowing mismatches in the protein alignment. The search can process up to millions of reads per minute using, for example, only 10 GB RAM with a protein database comprising 4821 microbial genomes. Kaiju can also be used for querying any other protein database without taxonomic classification, using either protein or nucleotide queries.
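To illustrate the two ingredients described above (translating a read into protein and searching a Burrows-Wheeler index by backward search), here is a toy sketch. It is not Kaiju's code: the FM-index is built naively (full suffix sort, dense occurrence tables) and is nowhere near memory-efficient, mismatches are not allowed, and the function names are hypothetical. `longest_exact_match` returns the longest substring of a protein fragment that occurs anywhere in the indexed sequence.

```python
# Standard genetic code; codons enumerated with bases ordered T, C, A, G
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def six_frame(read: str) -> list[str]:
    """Translate a DNA read in all six frames ('*' = stop, 'X' = unknown codon)."""
    read = read.upper()
    frames = []
    for strand in (read, read.translate(COMPLEMENT)[::-1]):
        for offset in range(3):
            frames.append("".join(CODON_TABLE.get(strand[i:i + 3], "X")
                                  for i in range(offset, len(strand) - 2, 3)))
    return frames

def build_fm_index(proteins: str):
    """Naive FM-index: suffix array, C table and occurrence counts over the BWT."""
    text = proteins + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)   # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    C, running = {}, 0
    for ch in alphabet:
        C[ch] = running                       # count of characters smaller than ch
        running += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for pos, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][pos + 1] = occ[ch][pos] + (b == ch)
    return sa, C, occ

def extend(fm, lo, hi, ch):
    """One backward-search step: narrow the suffix-array interval by prepending ch."""
    _, C, occ = fm
    if ch not in C:
        return 0, 0
    return C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]

def backward_search(fm, pattern: str) -> list[int]:
    """Sorted start positions of exact occurrences of pattern."""
    sa = fm[0]
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        lo, hi = extend(fm, lo, hi, ch)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

def longest_exact_match(fm, fragment: str) -> str:
    """Longest substring of fragment occurring in the index (extend left from each end)."""
    best = ""
    for end in range(len(fragment), 0, -1):
        lo, hi, start = 0, len(fm[0]), end
        while start > 0:
            nlo, nhi = extend(fm, lo, hi, fragment[start - 1])
            if nlo >= nhi:
                break
            lo, hi, start = nlo, nhi, start - 1
        if end - start > len(best):
            best = fragment[start:end]
    return best
```

A fuller sketch would, for example, split each translated frame at stop codons and query each fragment, keeping the best-scoring database hit per read.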

Kaiju is described in Menzel, P. et al. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7:11257 (open access).

Address of the bookmark: http://kaiju.binf.ku.dk/

Scarpa

Poonam Mahapatra — Wed, 13 Jul 2016 07:59:25 -0500

Scarpa is a stand-alone scaffolding tool for NGS data. It can be used together with virtually any genome assembler and any NGS read mapper that supports SAM format. Other features include support for multiple libraries and an option to estimate insert size distributions from data. Scarpa is available free of charge for academic and commercial use under the GNU General Public License (GPL).

See the user manual or the paper for more information about Scarpa. Supplementary material is also available.
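Scarpa can estimate insert size distributions from the data. As a rough illustration of what that involves (not Scarpa's own estimator, which may trim outliers or fit a different model), the sketch below reads the TLEN column of SAM records and summarises it. It assumes SAM text input and counts only properly paired, primary, first-in-pair records so each pair contributes once.

```python
import statistics

def insert_size_stats(sam_lines) -> tuple[float, float]:
    """Mean and standard deviation of insert sizes from SAM alignment lines.

    Uses TLEN (9th column) of properly paired, primary, first-in-pair records.
    No outlier trimming is done here.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):              # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        proper_pair = flag & 0x2
        first_in_pair = flag & 0x40
        secondary_or_supp = flag & (0x100 | 0x800)
        if proper_pair and first_in_pair and not secondary_or_supp and tlen != 0:
            sizes.append(abs(tlen))           # TLEN is negative for the rightmost mate
    if len(sizes) < 2:
        raise ValueError("need at least two properly paired reads")
    return statistics.mean(sizes), statistics.stdev(sizes)
```

For example, three pairs with TLEN values of 250, -270 and 230 give a mean of 250 and a sample standard deviation of 20.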

Address of the bookmark: http://compbio.cs.toronto.edu/hapsembler/scarpa.html

CrossMap

Abhimanyu Singh — Mon, 05 Sep 2016 04:07:38 -0500

CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between different assemblies, such as Human hg18 (NCBI36) ↔ hg19 (GRCh37) and Mouse mm9 (MGSCv37) ↔ mm10 (GRCm38).
It supports most commonly used file formats, including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF and VCF.
CrossMap is designed to lift over genome coordinates between assemblies. It is not a program for aligning sequences to a reference genome.
We do not recommend using CrossMap to convert genome coordinates between species.
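Liftover between assemblies is driven by chain files, which list ungapped aligned blocks between the old (target) and new (query) assembly. The sketch below shows how a single 0-based position can be mapped through UCSC-format chains. It is a simplification, not CrossMap's implementation: it handles only chains whose query strand is '+', and returns `None` for positions that fall in gaps or outside any chain.

```python
def parse_chains(text: str) -> list[dict]:
    """Parse UCSC chain format: a 'chain ...' header followed by 'size dt dq' lines."""
    chains, current = [], None
    for line in text.splitlines():
        f = line.split()
        if not f:
            continue
        if f[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            current = {"tName": f[2], "tStart": int(f[5]),
                       "qName": f[7], "qStrand": f[9], "qStart": int(f[10]),
                       "blocks": []}
            chains.append(current)
        else:
            size = int(f[0])
            dt, dq = (int(f[1]), int(f[2])) if len(f) == 3 else (0, 0)
            current["blocks"].append((size, dt, dq))
    return chains

def lift_position(chains: list[dict], chrom: str, pos: int):
    """Map a 0-based position on the old assembly; None if it is not in an aligned block."""
    for ch in chains:
        if ch["tName"] != chrom or ch["qStrand"] != "+":
            continue                      # this sketch ignores '-' strand chains
        t, q = ch["tStart"], ch["qStart"]
        for size, dt, dq in ch["blocks"]:
            if t <= pos < t + size:
                return ch["qName"], q + (pos - t)
            t += size + dt                # skip the block and the gap on each side
            q += size + dq
    return None
```

With a chain whose first block maps old positions 100-199 to 150-249, followed by a 50 bp gap on the old side and a 60 bp gap on the new side, position 120 lifts to 170 and position 210 (inside the gap) cannot be lifted.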

Address of the bookmark: http://crossmap.sourceforge.net/

LUMPY

Shruti Paniwala — Thu, 25 Aug 2016 08:05:02 -0500

A probabilistic framework for structural variant discovery.

Ryan M Layer, Colby Chiang, Aaron R Quinlan, and Ira M Hall. 2014. "LUMPY: a Probabilistic Framework for Structural Variant Discovery." Genome Biology 15 (6): R84. doi:10.1186/gb-2014-15-6-r84.
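As a toy illustration of what a probabilistic view of breakpoints can look like, the sketch below represents each piece of evidence as a probability distribution over candidate breakpoint positions and combines overlapping evidence by multiplying and renormalising. Treating evidence as independent and combining it by a product is an assumption made for illustration; it is not necessarily how LUMPY merges signals, and the function name is hypothetical.

```python
def combine_breakpoint_evidence(evidence):
    """Combine evidence given as (start, probs), where probs[i] is P(breakpoint at start + i).

    Returns (start, combined_probs) over the region covered by every piece of
    evidence, or None if the pieces do not overlap or the product is all zero.
    """
    lo = max(start for start, _ in evidence)
    hi = min(start + len(probs) for start, probs in evidence)
    if lo >= hi:
        return None
    combined = []
    for pos in range(lo, hi):
        product = 1.0
        for start, probs in evidence:
            product *= probs[pos - start]     # independent-evidence assumption
        combined.append(product)
    total = sum(combined)
    if total == 0:
        return None
    return lo, [p / total for p in combined]
```

For example, a uniform distribution over positions 100-103 combined with a uniform distribution over 102-103 yields equal probability on 102 and 103.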

More at https://github.com/arq5x/lumpy-sv

Address of the bookmark: https://github.com/arq5x/lumpy-sv

Ka, Ks and Ka/Ks calculations

Poonam Mahapatra — Mon, 29 Aug 2016 11:44:11 -0500

gKaKs is a codon-based, genome-level Ka/Ks computation pipeline built on programs from four widely used packages: BLAT, BLASTALL (including bl2seq, formatdb and fastacmd), PAML (including codeml and yn00) and KaKs_Calculator (including 10 substitution rate estimation methods). gKaKs can automatically detect and eliminate frameshift mutations and premature stop codons to compute the substitution rates (Ka, Ks and Ka/Ks) between a well-annotated genome and a non-annotated genome, or even a poorly assembled scaffold dataset. It is especially useful for newly sequenced genomes that have not been well annotated.
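To make the Ka/Ks quantities concrete, here is a compact sketch of one classic estimator, the Nei-Gojobori (1986) method with a Jukes-Cantor correction, for two aligned in-frame coding sequences. It is an independent illustration, not code from gKaKs, PAML or KaKs_Calculator, and the fallback used when every mutational pathway passes through a stop codon is an assumption.

```python
import itertools
import math

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def synonymous_sites(codon: str) -> float:
    """Each single-base change that keeps the amino acid contributes 1/3 of a site."""
    aa, sites = CODON_TABLE[codon], 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    sites += 1 / 3
    return sites

def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous/nonsynonymous differences averaged over pathways avoiding stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":       # pathway passes through a stop codon
                valid = False
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total, non_total, n_paths = syn_total + syn, non_total + non, n_paths + 1
    if n_paths == 0:
        return 0.0, float(len(positions))     # assumption: count all as nonsynonymous
    return syn_total / n_paths, non_total / n_paths

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance; math.log raises ValueError when p >= 0.75 (saturation)."""
    return -0.75 * math.log(1 - 4 * p / 3)

def ka_ks(seq1: str, seq2: str) -> tuple[float, float, float]:
    """Return (Ka, Ks, Ka/Ks); the ratio is NaN when Ks is zero."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue                          # gaps or ambiguous bases
        if "*" in (CODON_TABLE[c1], CODON_TABLE[c2]):
            continue                          # stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if S == 0 or N == 0:
        raise ValueError("no usable synonymous or nonsynonymous sites")
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    return ka, ks, (ka / ks if ks > 0 else float("nan"))
```

A single GCT to GCC change (alanine to alanine) produces Ka = 0 and Ks > 0; a GCT to ACT change (alanine to threonine) produces Ks = 0 and an undefined ratio.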

Further resources for Ka/Ks calculation:

https://github.com/fumba/kaks-calculator

http://longlab.uchicago.edu/?q=gKaKs

http://www.ncbi.nlm.nih.gov/pubmed/23314322

Address of the bookmark: http://longlab.uchicago.edu/?q=gKaKs

Redundans

Jit — Thu, 01 Sep 2016 08:28:11 -0500

The Redundans pipeline assists in the assembly of heterozygous genomes.
The program takes as input assembled contigs and paired-end and/or mate-pair sequencing libraries, and returns a scaffolded homozygous genome assembly that should be less fragmented and smaller in total size than the input contigs. In addition, Redundans automatically closes the gaps resulting from genome assembly or scaffolding.

The pipeline consists of three steps/modules:

redundancy reduction: detection and selective removal of redundant contigs from an initial de novo assembly
scaffolding: joining of genome fragments using paired-end and/or mate-pair reads
gap closing
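The redundancy-reduction step can be pictured with the sketch below. Redundans itself works from sequence alignments; this version swaps in a simpler k-mer containment test, so treat it as an illustration of the idea rather than the pipeline's method. Contigs are visited from longest to shortest, and a contig is dropped when most of its k-mers already occur, on either strand, in a contig that was kept. The `min_contained` threshold and default `k` are assumed values.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def kmer_set(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def reduce_redundancy(contigs: dict[str, str], k: int = 21,
                      min_contained: float = 0.8) -> list[str]:
    """Return names of contigs kept after greedy k-mer-containment filtering."""
    kept, kept_kmers = [], set()
    for name in sorted(contigs, key=lambda n: len(contigs[n]), reverse=True):
        seq = contigs[name].upper()
        kmers = kmer_set(seq, k)
        contained = len(kmers & kept_kmers) / len(kmers) if kmers else 0.0
        if contained >= min_contained:
            continue                          # mostly covered by a longer contig
        kept.append(name)
        # Store both strands so reverse-complemented duplicates are also caught
        kept_kmers |= kmers | kmer_set(reverse_complement(seq), k)
    return kept
```

A contig that is a substring, or the reverse complement of a substring, of a longer contig is removed, while an unrelated contig is kept.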

Redundans is:

fast and lightweight, with multi-core support and optimised memory use, so it can run even on a laptop for small-to-medium-sized genomes
flexible toward many sequencing technologies (Illumina, 454 or Sanger) and library types (paired-end, mate pairs, fosmids)
modular: every step can be omitted or replaced by other tools

Address of the bookmark: https://github.com/Gabaldonlab/redundans

Assembly tutorial PPT

Jit — Wed, 07 Sep 2016 03:12:53 -0500

Saved the Cornell University assembly workshop slides (lecture 1, PDF).

Reference:

http://cbsu.tc.cornell.edu/lab/doc/assembly_workshop_20150420_lecture1.pdf

OPERA : Optimal Paired-End Read Assembler

Jit — Fri, 09 Sep 2016 05:28:58 -0500

OPERA (Optimal Paired-End Read Assembler) is a sequence assembly program (http://en.wikipedia.org/wiki/Sequence_assembly). It uses information from paired-end/mate-pair/long reads to order and orient the intermediate contigs/scaffolds assembled in a genome assembly project, a process known as scaffolding. OPERA is based on an exact algorithm that is guaranteed to minimize the discordance of scaffolds with the information provided by the paired-end/mate-pair/long reads (for further details see Gao et al., 2011).

Note that since the original publication, we have made significant changes to OPERA (v1.0 onwards), including refinements to its basic algorithm (to reduce local errors, improve efficiency, etc.) and new features that are important for scaffolding large genomes (multi-library support, better repeat handling, etc.), in addition to other scalability and usability improvements (BAM and gzip support, a smaller memory footprint). We therefore encourage you to download and use our latest version: OPERA-LG. In our benchmarks, it has significantly improved corrected N50 and reduced the number of scaffolding errors. Furthermore, our latest release contains the wrapper script OPERA-long-read, which enables scaffolding with long reads from third-generation sequencing technologies (PacBio or Oxford Nanopore). The manuscript describing the new features and algorithms is available at Genome Biology. We look forward to getting your feedback to improve it further.
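The scaffolding objective described for OPERA (order and orient contigs so that as few read-pair links as possible are discordant) can be made concrete with a brute-force search over every order and orientation. This is not OPERA's algorithm, which is designed to scale; exhaustive search is exact but costs n! × 2^n and is only feasible for a handful of contigs. Each link is assumed to say "contig a in orientation oa is followed by contig b in orientation ob, separated by roughly `gap` bases", and gaps between adjacent contigs are assumed to be zero.

```python
from itertools import permutations, product

def flip(scaffold):
    """The same scaffold read from the opposite strand."""
    return [(name, "-" if o == "+" else "+") for name, o in reversed(scaffold)]

def is_concordant(scaffold, lengths, link, tol):
    """Check a link (a, oa, b, ob, gap) against the scaffold on either strand."""
    a, oa, b, ob, gap = link
    for sc in (scaffold, flip(scaffold)):
        index = {name: i for i, (name, _) in enumerate(sc)}
        orient = dict(sc)
        if index[a] < index[b] and orient[a] == oa and orient[b] == ob:
            # Distance implied by the contigs placed between a and b
            between = sum(lengths[name] for name, _ in sc[index[a] + 1:index[b]])
            if abs(between - gap) <= tol:
                return True
    return False

def best_scaffold(lengths, links, tol=100):
    """Return (scaffold, number of discordant links) minimising discordance."""
    best, best_bad = None, None
    names = sorted(lengths)
    for order in permutations(names):
        for orients in product("+-", repeat=len(names)):
            scaffold = list(zip(order, orients))
            bad = sum(not is_concordant(scaffold, lengths, link, tol) for link in links)
            if best_bad is None or bad < best_bad:
                best, best_bad = scaffold, bad
    return best, best_bad
```

With three contigs and links consistent with the layout c1(+) c2(+) c3(-), the search returns that layout with zero discordant links.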

Address of the bookmark: https://sourceforge.net/p/operasf/wiki/The%20OPERA%20wiki/

BBMap help

Shruti Paniwala — Mon, 10 Oct 2016 06:29:03 -0500

BBMap: a solution for everything


Most BBMap suite programs share a set of common options, and the input/output format is chosen automatically from the file extension.
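Extension-based format selection works roughly as sketched below. The extension table is a hypothetical subset chosen for illustration, not BBMap's full list, and the function is not part of BBMap; it simply strips a compression suffix and looks up what remains.

```python
import os

# Hypothetical subset of extensions; BBMap recognises more than these
FORMATS = {".fq": "fastq", ".fastq": "fastq",
           ".fa": "fasta", ".fasta": "fasta", ".fna": "fasta",
           ".sam": "sam", ".bam": "bam"}
COMPRESSION = (".gz", ".bz2")

def guess_format(path: str) -> tuple[str, bool]:
    """Return (format, is_compressed) inferred from the file name."""
    name = path.lower()
    compressed = False
    for ext in COMPRESSION:
        if name.endswith(ext):
            name, compressed = name[:-len(ext)], True
            break
    _, ext = os.path.splitext(name)
    return FORMATS.get(ext, "unknown"), compressed
```

For instance, `reads_R1.fq.gz` is treated as compressed FASTQ and `ref.FASTA` as uncompressed FASTA.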

Using BBMap

Mapping Nanopore reads

bbmap.sh has a length cap of 6 kbp. Reads longer than this are broken into 6 kbp pieces and mapped independently.
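The effect of that cap on a long Nanopore read can be pictured as consecutive, non-overlapping pieces of at most 6 kbp. The sketch assumes 6 kbp means 6000 bases, and the `_partN` naming scheme is hypothetical; BBMap's internal splitting and naming may differ.

```python
def split_long_read(name: str, seq: str, max_len: int = 6000) -> list[tuple[str, str]]:
    """Split a read into consecutive pieces of at most max_len bases."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [(f"{name}_part{i + 1}", seq[start:start + max_len])
            for i, start in enumerate(range(0, len(seq), max_len))]
```

A 14,000-base read therefore yields pieces of 6000, 6000 and 2000 bases, each of which would be mapped on its own.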

More at https://www.biostarhandbook.com/tools/bbmap/bbmap-help.html

Address of the bookmark: https://www.biostarhandbook.com/tools/bbmap/bbmap-help.html