BOL: Related items

NCBI Prokaryotic Genome Annotation Pipeline

Jit — Tue, 16 May 2017 08:56:03 -0500

The NCBI Prokaryotic Genome Annotation Pipeline is designed to annotate bacterial and archaeal genomes (chromosomes and plasmids).

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.

NCBI has developed an automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods. The first version of the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP; see the PubMed article), developed in 2005, has been replaced with an upgraded version that is capable of processing a larger data volume. You can find a more detailed description of the new version of the pipeline in the NCBI Handbook chapter. NCBI's annotation pipeline depends on several internal databases and is not currently available for download or use outside of the NCBI environment.

https://www.ncbi.nlm.nih.gov/genome/annotation_prok/

Address of the bookmark: https://www.ncbi.nlm.nih.gov/genome/annotation_prok/

Bacterial genome assembly !!

Jit — Fri, 05 May 2017 06:11:22 -0500

This tutorial will serve as an example of how to use free and open-source genome assembly and secondary scaffolding tools to generate high quality assemblies of bacterial sequence data. The bacterial sample used in this tutorial will be referred to simply as “Species” since it is live data. This data is paired-end data, meaning that there are forward and reverse reads, which we will designate as Sample_R1.fastq and Sample_R2.fastq, respectively.

https://github.com/jennomics/WorkflowPaper/blob/master/Genome%20Assembly%20and%20Annotation.md

Address of the bookmark: http://bioinformatics.uconn.edu/bacterial-genome-assembly-tutorial/

Download assemblies from NCBI

Bulbul — Mon, 15 May 2017 06:02:32 -0500

A new “Download assemblies” button is now available in the Assembly database. This makes it easy to download data for multiple genomes without having to write scripts.

For example, you can run a search in Assembly and use check boxes (see left side of screenshot below) to refine the set of genome assemblies of interest. Then, just open the “Download assemblies” menu, choose the source database (GenBank or RefSeq), choose the file type, and start the download. An archive file will be saved to your computer that can be expanded into a folder containing your selected genome data files.

More at https://ncbiinsights.ncbi.nlm.nih.gov/2017/05/08/genome-data-download-made-easy/

SGA: String Graph Assembler

Jit — Thu, 08 Dec 2016 05:08:59 -0600

SGA is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.

More at https://github.com/jts/sga

SGA dependencies:
-google sparse hash library (http://code.google.com/p/google-sparsehash/)
-the bamtools library (https://github.com/pezmaster31/bamtools)
-zlib (http://www.zlib.net/)
-(optional but suggested) the jemalloc memory allocator (http://www.canonware.com/jemalloc/download.html)

Address of the bookmark: https://github.com/jts/sga

MeGAMerge: A tool to merge assembled contigs, long reads from metagenomic sequencing runs

Jit — Mon, 19 Dec 2016 09:42:15 -0600

Description

MeGAMerge is a Perl-based wrapper/tool that accepts any number of sequence files containing assembled contigs of any length in multi-FASTA format and produces an improved contig set using overlap-layout-consensus (OLC) assembly. All overlap parameters (minimum overlap length, identity, etc.) are user-declarable at runtime. It is written to run on Linux.

Requirements:

You will need to have the following tools installed and in $PATH, or added to $binpath in the tool:

Newbler (specifically runAssembly)
Minimus2 (part of AMOS, also requires MUMmer)

Address of the bookmark: https://github.com/LANL-Bioinformatics/MeGAMerge

bedtools

Jit — Fri, 24 Feb 2017 04:50:44 -0600

Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
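
To make "set theory on the genome" concrete, here is a minimal Python sketch of what merging and intersecting intervals means. It is only an illustration of the idea, not bedtools code: the function names and the example coordinates are made up, and intervals are assumed to be (chrom, start, end) tuples with 0-based, half-open coordinates as in BED.

```python
from typing import List, Tuple

# Hypothetical in-memory interval type: (chrom, start, end), 0-based half-open.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping or directly adjacent intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the shared portion of every overlapping pair (one from a, one from b)."""
    shared: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                shared.append((chrom_a, start, end))
    return shared


features = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 400, 500), ("chr2", 10, 20)]
merged = merge_intervals(features)
overlap = intersect_intervals(merged, [("chr1", 250, 450)])
print(merged)
print(overlap)
```

The pairwise loop in the intersect function is quadratic and is only suitable for small examples; real tools work on sorted or indexed files.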

bedtools is developed in the Quinlan laboratory at the University of Utah and benefits from fantastic contributions made by scientists worldwide.

Address of the bookmark: http://bedtools.readthedocs.io/en/latest/index.html

splitbam: splits a BAM by chromosomes

Jit — Tue, 28 Feb 2017 09:01:28 -0600

splitbam splits a BAM by chromosomes.

Using the reference sequence dictionary (*.dict), it also creates empty BAM files for chromosomes that have no SAM record. A pair of 'mock' SAM records can also be added to those empty BAMs to prevent some tools (such as samtools) from crashing.

Usage

java -jar splitbam.jar -p OUT/__CHROM__/__CHROM__.bam -R ref.fasta (bam|sam|stdin)

Options

-h help; This screen.
-R (indexed reference file) REQUIRED.
-u (unmapped chromosome name): default:Unmapped
-e | --empty : generate EMPTY bams for chromosome having no read mapped
-m | --mock : if option '-e', add a mock pair of sam records to the empty bam
-p (output file/bam pattern) REQUIRED. MUST contain __CHROM__ and end with .bam
-s assume input is sorted.
-x | --index create index.
-t | --tmp (dir) tmp file directory
-G (file) chrom-group file (see below)
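
As a rough illustration of the -p option, the sketch below shows how an output pattern might expand into one path per chromosome. This is not splitbam's own code: the function name and the chromosome names are hypothetical, and the only rules taken from the option list above are that the pattern must contain __CHROM__, must end with .bam, and that unmapped reads use the default name Unmapped.

```python
from typing import Dict, List


def expand_output_pattern(pattern: str, chrom: str) -> str:
    """Substitute a chromosome name into an output pattern (illustrative only)."""
    if "__CHROM__" not in pattern:
        raise ValueError("pattern must contain __CHROM__")
    if not pattern.endswith(".bam"):
        raise ValueError("pattern must end with .bam")
    return pattern.replace("__CHROM__", chrom)


def plan_outputs(pattern: str, chroms: List[str], unmapped: str = "Unmapped") -> Dict[str, str]:
    """Map each chromosome name, plus the unmapped bucket, to its output path."""
    return {name: expand_output_pattern(pattern, name) for name in chroms + [unmapped]}


# Example chromosome names are assumptions, not taken from any real reference.
outputs = plan_outputs("OUT/__CHROM__/__CHROM__.bam", ["chr1", "chr2"])
for name, path in outputs.items():
    print(name, path)
```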

Address of the bookmark: https://code.google.com/archive/p/jvarkit/wikis/SplitBam.wiki

MaxBin: software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm.

Jit — Mon, 06 Mar 2017 04:03:38 -0600

MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users can understand the underlying bins (genomes) of the microbes in their metagenomes by simply providing assembled metagenomic sequences and the reads coverage information or sequencing reads. For users' convenience MaxBin will report genome-related statistics, including estimated completeness, GC content and genome size in the binning summary page.
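
For readers unfamiliar with the summary statistics mentioned above, here is a small Python sketch of how GC content and total size can be computed for the contigs in one bin. It is a generic calculation, not MaxBin's implementation; the function name and the toy sequences are invented, and the GC fraction is assumed to be taken over unambiguous A/C/G/T bases only.

```python
from typing import Iterable, Tuple


def bin_size_and_gc(contigs: Iterable[str]) -> Tuple[int, float]:
    """Return (total length, GC fraction over A/C/G/T bases) for a set of contigs."""
    total_length = 0
    gc = 0
    acgt = 0
    for seq in contigs:
        seq = seq.upper()
        total_length += len(seq)
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    gc_fraction = gc / acgt if acgt else 0.0
    return total_length, gc_fraction


# Toy sequences standing in for the contigs assigned to one bin.
size, gc_fraction = bin_size_and_gc(["ATGC", "GGCC"])
print(size, gc_fraction)
```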

Users can use MEGAN or similar software on MaxBin bins to find the taxonomy of each bin after the binning process is finished.

https://academic.oup.com/bioinformatics/article/32/4/605/1744462/MaxBin-2-0-an-automated-binning-algorithm-to

The most recent version of MaxBin is 2.2, which supports the analysis of coassemblies of multiple samples. It is available at the JBEI downloads site as well as the MaxBin and MaxBin 2.0 SourceForge sites.

Address of the bookmark: http://downloads.jbei.org/data/microbial_communities/MaxBin/MaxBin.html

GroopM: Metagenomic binning toolset

Jit — Tue, 07 Mar 2017 08:59:45 -0600

GroopM is a metagenomic binning toolset. It leverages spatio-temporal
dynamics (differential coverage) to accurately (and almost automatically)
extract population genomes from multi-sample metagenomic datasets.

GroopM is largely parameter-free. Use: groopm -h for more info.

For installation and usage instructions see: http://ecogenomics.github.io/GroopM/

Address of the bookmark: https://github.com/ecogenomics/GroopM

MyPro: A seamless pipeline for automated prokaryotic genome assembly and annotation

Neel — Thu, 15 Dec 2016 05:47:35 -0600

MyPro is an improved genomics software pipeline for prokaryotic genomes. MyPro is user-friendly and requires minimal programming skills, so high-quality prokaryotic genome assembly and annotation can be obtained with ease. It performed better than de novo assemblers and contig integration software, producing more contiguous assemblies with higher N50 values and fewer contigs.
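
Since N50 is the contiguity measure mentioned here, a short Python sketch of the usual definition may help: it is the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly size. The function name and the contig lengths are made up for illustration and are not MyPro output.

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Length of the contig at which the cumulative sum reaches half the total."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Two hypothetical assemblies of the same 1500 bp total size.
fragmented = [100, 200, 300, 400, 500]
contiguous = [900, 600]
print(n50(fragmented), n50(contiguous))
```

With the same total size, the assembly with fewer, longer contigs has the higher N50, which is why the two figures are usually reported together.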

More at https://sourceforge.net/projects/sb2nhri/files/MyPro/

Address of the bookmark: http://www.sciencedirect.com/science/article/pii/S0167701215001207