BOL: Related items

MGERT: Mobile Genetic Elements Retrieving Tool

Neel — Sat, 18 May 2019 08:58:01 -0500

MGERT is a computational pipeline for easy retrieving of MGE's coding sequences of a particular family from genome assemblies. MGERT utilizes several established bioinformatic tools combined into single pipeline which hides different technical quirks from an inexperienced user.

Address of the bookmark: https://github.com/andrewgull/MGERT

DeepVariant : an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

Jit — Sat, 25 Jan 2020 13:28:09 -0600

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF) designed for painless integration with the TensorFlow machine learning framework.

https://ai.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html

https://www.biorxiv.org/content/10.1101/092890v6

Address of the bookmark: https://github.com/google/deepvariant

geNomad: Identification of mobile genetic elements

LEGE — Sun, 31 Aug 2025 06:40:17 -0500

geNomad is a tool that identifies virus and plasmid genomes from nucleotide sequences. It provides state-of-the-art classification performance and can be used to quickly find mobile genetic elements from genomes, metagenomes, or metatranscriptomes.

Address of the bookmark: https://portal.nersc.gov/genomad

Musket - a multistage k-mer spectrum based corrector

BioStar — Tue, 04 Feb 2020 23:24:57 -0600

Musket is a well-established leading next-generation sequencing read error correction algorithm targetting Illumina sequencing. This corrector employs the k-mer spectrum approach and introduces three correction techniques in a multistage workflow. Our performance evaluation results, in terms of correction quality and de novo genome assembly measures, reveal that Musket is consistently one of the top performing substitution-error-based correctors.

Address of the bookmark: http://musket.sourceforge.net/homepage.htm

Tallymer: method to compute K-mer frequencies and its application to annotate large repetitive plant genomes

Jit — Thu, 15 Feb 2018 10:21:02 -0600

Tallymer is based on enhanced suffix arrays. This gives a much larger flexibility concerning the choice of the k-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set whole genome shotgun sequences from maize (B73) (total size 10⁹ bp).
Tallymer was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available.

A manual can be found here.

Address of the bookmark: https://www.zbh.uni-hamburg.de/forschung/arbeitsgruppe-genominformatik/software/tallymer.html

pyScaf

Bulbul — Mon, 19 Dec 2016 14:20:33 -0600

pyScaf orders contigs from genome assemblies utilising several types of information:

paired-end (PE) and/or mate-pair libraries (NGS-based mode)
long reads (NGS-based mode)
synteny to the genome of some related species (reference-based mode)

Scaffolding

In reference-based mode, pyScaf uses synteny to the genome of closely related species in order to order contigs and estimate distances between adjacent contigs.

Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:

matches not satisfying cut-offs (--identity and --overlap)
suboptimal matches (only best match of each query to reference is kept)
and removing overlapping matches on reference.

In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on C. parapsilosis (13 Mb; CANPA) and A. thaliana (119 Mb; ARATH) chromosomes, reconstructing correctly all chromosomes always for CANPA and nearly always for ARATH (Figures in dropbox, CANPA table, ARATH table).
Runs took ~0.5 min for CANPA on 4 CPUs and ~2 min for ARATH on 16 CPUs.

Important remarks:

Reduce your assembly before (fasta2homozygous.py) as any redundancy will likely break the synteny.
pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no de novo assembler / scaffolder is perfect...), which breaks synteny.
pyScaf works very well if divergence between reference genome and assembled contigs is below 20% at nucleotide level.
pyScaf deals with large rearrangements ie. deletions, insertion, inversions, translocations. Note however, this is experimental implementation!
Consider closing gaps after scaffolding.

Address of the bookmark: https://github.com/lpryszcz/pyScaf

Circoletto: visualizing sequence similarity with Circos

Jit — Fri, 09 Feb 2018 10:23:40 -0600

Circoletto, an online visualization tool based on Circos, which provides a fast, aesthetically pleasing and informative overview of sequence similarity search results.

Online version and downloadable software package for offline use (source code in PERL) freely available at http://bat.ina.certh.gr/tools/circoletto/

Contact:ndarz@certh.gr

Address of the bookmark: http://tools.bat.infspire.org/circoletto/