BOL: Related items

OrganellarGenomeDRAW

Jit — Tue, 16 Aug 2016 08:13:13 -0500

OrganellarGenomeDRAW is dedicated to convert genetic information stored in GenBank entries to graphical maps. The input text file has to be in GenBank flat file format, whereas the output format can be chosen among several formats. The application is especially optimized and adapted for the creation of high-quality, detailed circular maps of organellar genomes like the plastid genome (plastome) or the mitochondrial genome (chondriome). Nevertheless, you can upload any GenBank entry. The workflow is devided into three steps.

More at http://ogdraw.mpimp-golm.mpg.de/cgi-bin/ogdraw.pl

Address of the bookmark: http://ogdraw.mpimp-golm.mpg.de/index.shtml

String graph based genome assembly software and tools !

Rahul Nayak — Tue, 19 Dec 2017 17:17:38 -0600

In graph theory, a string graph is an intersection graph of curves in the plane; each curve is called a "string". String graphs were first proposed by E. W. Myers in a 2005 publication. In recent Genome Research paper describing an innovative approach for assembling large genomes from NGS data caught our attention for several reasons. i) it give different "string graph" prospective of long lasting genome assembly problem ii) the paper is coauthored by Jared Simpson, the developer of ABySS assembler and Richard Durbin. iii) Simpson-Durbin algorithm is that it does not rely on de Bruijn graphs, and instead employs a different graph construction approach called ‘string graph’.

Following are the genome assembly tools based on string graph:

1.SGA (String Graph Assembler) https://github.com/jts/sga

Assembles large genomes from high coverage short read data. SGA is designed as a modular set of programs, which are used to form an assembly pipeline. SGA implements a set of assembly algorithms based on the FM-index. As the FM-index is a compressed data structure, the algorithms are very memory efficient. The SGA assembly has three distinct phases. The first phase corrects base calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired end and/or mate pair data to build scaffolds from the contigs. The output of this software is a PDF report that allows the properties of the genome and data quality to be visually explored. By providing more information to the user at the start of an assembly project, this software will help increase awareness of the factors that make a given assembly easy or difficult, assist in the selection of software and parameters and help to troubleshoot an assembly if it runs into problems.

2. SAGE: String-overlap Assembly of GEnomes https://github.com/lucian-ilie/SAGE2

SAGE, for de novo genome assembly. As opposed to most assemblers, which are de Bruijn graph based, SAGE uses the string-overlap graph. SAGE builds upon great existing work on string-overlap graph and maximum likelihood assembly, bringing an important number of new ideas, such as the efficient computation of the transitive reduction of the string overlap graph, the use of (generalized) edge multiplicity statistics for more accurate estimation of read copy counts, and the improved use of mate pairs and min-cost flow for supporting edge merging. The assemblies produced by SAGE for several short and medium-size genomes compared favourably with those of existing leading assemblers.

3. FSG: Fast String Graph

The new integrated assembler has been assessed on a standard benchmark, showing that fast string graph (FSG) is significantly faster than SGA while maintaining a moderate use of main memory, and showing practical advantages in running FSG on multiple threads. Moreover, we have studied the effect of coverage rates on the running times.

4. BASE https://github.com/dhlbh/BASE

It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have high probability to appear uniquely in the genome. Such seeds form the basis for BASE to build extension trees and then to use reverse validation to remove the branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of reads sharing the seeds. Such consensus sequences are then extended to contigs. BASE is a practically efficient tool for constructing contig, with significant improvement in quality for long NGS reads. It is relatively easy to extend BASE to include scaffolding.

5. Fermi https://github.com/lh3/fermi/

Fermi is a de novo assembler with a particular focus on assembling Illumina short sequence reads from a mammal-sized genome. In addition to the role of a typical assembler, fermi also aims to preserve heterozygotes which are often collapsed by other assemblers. Its ultimate goal is to find a minimal set of unitigs to represent all the information in raw reads.

If you want to learn about String Graph assembler, please read the following papers -

i) The Fragment Assembly String Graph - E. W. Myers

This paper describes the String Graph concept.

ii) Efficient construction of an assembly string graph using the FM-index - Jared T. Simpson and Richard Durbin

This earlier paper from Simpson and Durbin

iii) Efficient de novo assembly of large genomes using compressed data structures - Jared T. Simpson and Richard Durbin

TEannot

Jit — Thu, 18 Aug 2016 10:02:03 -0500

We advise to run first the TEdenovo pipeline but it is not compulsory. We suppose you begin by running the TEannot pipeline on the example provided in the directory "db/" rather than directly on your own genomic sequences. Thus, from now on, the project name is "DmelChr4".

Address of the bookmark: https://urgi.versailles.inra.fr/Tools/REPET/TEannot-tuto

Genome assembly stats plotting

Jit — Wed, 28 Feb 2018 03:45:39 -0600

A de novo genome assembly can be summarised b

y a number of metrics, including:

Overall assembly length
Number of scaffolds/contigs
Length of longest scaffold/contig
Scaffold/contig N50 and N90Assembly base composition, in particular percentage GC and percentage Ns
CEGMA completeness
Scaffold/contig length/count distribution

assembly-stats supports two widely used presentations of these values, tabular and cumulative length plots, and introduces an additional circular plot that summarises most commonly used assembly metrics in a single visualisation. Each of these presentations is generated using javascript from a common (JSON) data structure, allowing toggling between alternative views, and each can be applied to a single or multiple assemblies to allow direct comparison of alternate assemblies.

Tabular presentation allows direct comparison of exact values between assemblies, the limitations of this approach lie in the necessary omission of distributions and the challenge of interpreting ratios of values that may vary by several orders of magnitude.

Address of the bookmark: https://github.com/rjchallis/assembly-stats

GeneMANIA

Jit — Mon, 22 Aug 2016 09:55:16 -0500

Faster, more accurate algorithms function prediction "GeneMANIA (Multiple Association Network Integration Algorithm)" have however been developed in recent years and are publicly available on the web, indicating the future direction of function prediction.

Address of the bookmark: http://genemania.org/

GeneValidator - Identify problems with predicted genes

Poonam Mahapatra — Fri, 26 Aug 2016 06:00:03 -0500

GeneValidator helps in identifing problems with gene predictions and provide useful information extracted from analysing orthologs in BLAST databases. The results produced can be used by biocurators and researchers who need accurate gene predictions.

If you would like to use GeneValidator on a few sequences, see our online GeneValidator Web App -http://genevalidator.sbcs.qmul.ac.uk.

If you use GeneValidator in your work, please cite us as follows:

Dragan M‡, Moghul MI‡, Priyam A, Bustos C & Wurm Y. 2016. GeneValidator: identify problems with protein-coding gene predictions. Bioinformatics, doi: 10.1093/bioinformatics/btw015.

Address of the bookmark: https://github.com/wurmlab/genevalidator

assemblytics: delta file to analyze alignments of an assembly to another assembly or a reference genome

Jit — Thu, 14 Jun 2018 07:31:00 -0500

Download and install MUMmer Align your assembly to a reference genome using nucmer (from MUMmer package) $ nucmer -maxmatch -l 100 -c 500 REFERENCE.fa ASSEMBLY.fa -prefix OUT Consult the MUMmer manual if you encounter problems Optional: Gzip the delta file to speed up upload (usually 2-4X faster) $ gzip OUT.delta Then use the OUT.delta.gz file for upload. Upload the .delta or delta.gz file (view example) to Assemblytics Important: Use only contigs rather than scaffolds from the assembly. This will prevent false positives when the number of Ns in the scaffolded sequence does not match perfectly to the distance in the reference. The unique sequence length required represents an anchor for determining if a sequence is unique enough to safely call variants from, which is an alternative to the mapping quality filter for read alignment. http://assemblytics.com/

Address of the bookmark: http://assemblytics.com/

R-chie

Jit — Thu, 01 Sep 2016 11:47:24 -0500

R-chie allows you to make arc diagrams of RNA secondary structures, allowing for easy comparison and overlap of two structures, rank and display basepairs in colour and to also visualize corresponding multiple sequence alignments and co-variation information.
R4RNA is the R package powering R-chie, available for download and local use for more customized figures and scripting.

http://www.e-rna.org/r-chie/plot.cgi?eg=single

Address of the bookmark: http://www.e-rna.org/r-chie/plot.cgi?eg=single

HaploMerger2: rebuilding both haploid sub-assemblies from high-heterozygosity diploid genome assembly

BioStar — Thu, 27 Sep 2018 07:08:47 -0500

HM2 can process any diploid assemblies, but it is especially suitable for diploid assemblies with high heterozygosity (≥3%), which can be difficult for other tools. This pipeline also implements flexible and sensitive assembly error detection, a hierarchical scaffolding procedure and a reliable gap-closing method for haploid sub-assemblies.

Source code, executables and the testing dataset are freely available at https://github.com/mapleforest/HaploMerger2/releases/.

Address of the bookmark: https://github.com/mapleforest/HaploMerger2/releases/

Assembly tutorial PPT

Jit — Wed, 07 Sep 2016 03:12:53 -0500

Saved Cornell University assembly workshop PPT.

Reference:

http://cbsu.tc.cornell.edu/lab/doc/assembly_workshop_20150420_lecture1.pdf