BOL: Related items

LUMPY

Shruti Paniwala — Thu, 25 Aug 2016 08:05:02 -0500

A probabilistic framework for structural variant discovery.

Ryan M Layer, Colby Chiang, Aaron R Quinlan, and Ira M Hall. 2014. "LUMPY: a Probabilistic Framework for Structural Variant Discovery." Genome Biology 15 (6): R84. doi:10.1186/gb-2014-15-6-r84.

More at https://github.com/arq5x/lumpy-sv

Address of the bookmark: https://github.com/arq5x/lumpy-sv

Ka, Ks and Ka/Ks calculations

Poonam Mahapatra — Mon, 29 Aug 2016 11:44:11 -0500

gKaKs is a codon-based genome-level Ka/Ks computation pipeline developed and based on programs from four widely used packages: BLAT, BLASTALL (including bl2seq, formatdb and fastacmd), PAML (including codeml and yn00) and KaKs_Calculator (including 10 substitution rate estimation methods). gKaKs can automatically detect and eliminate frameshift mutations and premature stop codons to compute the substitution rates (Ka, Ks and Ka/Ks) between a well-annotated genome and a non-annotated genome or even a poorly assembled scaffold dataset. It is especially useful for newly sequenced genomes that have not been well annotated.
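
The two checks mentioned above, frameshifts and premature stop codons, can be illustrated with a minimal Python sketch. This is not gKaKs code: the function names are hypothetical, the standard genetic code is assumed, and a length that is not a multiple of three is used only as a crude stand-in for frameshift detection, which the pipeline presumably does from alignments.

```python
# Stop codons of the standard genetic code (an assumption; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq):
    """Return (has_frameshift, premature_stop_positions) for a coding sequence.

    has_frameshift is a crude proxy: True when the length is not a multiple of 3.
    premature_stop_positions lists 0-based nucleotide offsets of in-frame stop
    codons that are not the final complete codon.
    """
    seq = seq.upper()
    has_frameshift = len(seq) % 3 != 0
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    premature = [i * 3 for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    return has_frameshift, premature


def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    return None if ks == 0 else ka / ks


print(check_cds("ATGTAAAAATAA"))
print(ka_ks_ratio(0.1, 0.5))
```

A sequence flagged by either check would be excluded or repaired before Ka and Ks are estimated; how gKaKs actually handles such cases is described in the linked paper.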

Look for KaKs calculation:

https://github.com/fumba/kaks-calculator

http://longlab.uchicago.edu/?q=gKaKs

http://www.ncbi.nlm.nih.gov/pubmed/23314322

Address of the bookmark: http://longlab.uchicago.edu/?q=gKaKs

R-chie

Jit — Thu, 01 Sep 2016 11:47:24 -0500

R-chie lets you make arc diagrams of RNA secondary structures, making it easy to compare and overlay two structures, rank and display basepairs in colour, and visualize corresponding multiple sequence alignments and co-variation information.
R4RNA is the R package powering R-chie, available for download and local use for more customized figures and scripting.

http://www.e-rna.org/r-chie/plot.cgi?eg=single

Address of the bookmark: http://www.e-rna.org/r-chie/plot.cgi?eg=single

Structural variants PPT

Jit — Wed, 07 Sep 2016 03:16:09 -0500

1000 Genomes data tutorial at ASHG

Structural variants presentation by

Jan Korbel

European Molecular Biology Laboratory (EMBL) Heidelberg Genome Biology Research Unit

Reference:

https://www.genome.gov/pages/research/der/1000genomesprojecttutorials/structuralvariants-jankorbel.pdf

CGView - Circular Genome Viewer

Jit — Mon, 19 Sep 2016 07:52:26 -0500

GView is a Java package used to display and navigate bacterial genomes. GView is useful for producing high-quality genome maps for use in publications and websites, or as a visualization tool in a sequence annotation pipeline. Users can interact with the genome using a powerful pan-and-zoom interface, or GView can write static images of a genome to a file. GView can draw a genome using either circular or linear layouts. For examples of some of the images GView can produce, see the Image Gallery. GView is a re-write of CGView, a circular genome viewer written by Paul Stothard. The goal of GView is to provide greater user interaction, and more flexibility in how the genome map is rendered. To aid with easily configuring the display of a genome, a style editor has been included to provide an intuitive, user-friendly graphical user interface for customizing genome maps. Styling attributes such as colours or fonts for the various map elements can be adjusted in real time. Customized styles can be saved for later use or for application to other genome maps using GView's custom file format.

Address of the bookmark: http://wishart.biology.ualberta.ca/cgview/

Murasaki

Anjana — Fri, 30 Sep 2016 10:22:30 -0500

Murasaki is an anchor alignment program that is:

extremely fast (17 CPU hours for whole Human x Mouse genome (with 40 nodes: 35 wall minutes), or 8 mammals in 21 CPU hours (42 wall minutes))
scalable (arbitrarily parallelizable across multiple nodes using MPI)
memory efficient (even a single node with 16GB of RAM can handle over 1Gbp of sequence)
unlimited by pattern length or selection
repeat tolerant

Address of the bookmark: http://murasaki.dna.bio.keio.ac.jp/wiki/index.php?Murasaki

VirMet

Jit — Mon, 10 Oct 2016 08:27:19 -0500

Watch out: only a few files are counted in coverage statistics.

Full documentation on Read the Docs.

A set of tools for viral metagenomics.

virmet is called with a command subcommand syntax: virmet fetch --viral n, for example, downloads the viral database. The subcommands available so far are:

fetch: download genomes
update: update viral/bacterial database
index: index genomes
wolfpack: analyze a MiSeq run
covplot: plot coverage for a specific organism

A short help is obtained with virmet subcommand -h.
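
The command subcommand syntax described above can be sketched in Python with argparse sub-parsers. This is an illustration only, not VirMet's source: the program name virmet-sketch is hypothetical, and apart from --viral (which appears in the text above) no options are modelled.

```python
import argparse

# Subcommand names and descriptions copied from the list above.
SUBCOMMANDS = {
    "fetch": "download genomes",
    "update": "update viral/bacterial database",
    "index": "index genomes",
    "wolfpack": "analyze a MiSeq run",
    "covplot": "plot coverage for a specific organism",
}


def build_parser():
    """Build a parser with one sub-parser per subcommand."""
    parser = argparse.ArgumentParser(prog="virmet-sketch")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    for name, help_text in SUBCOMMANDS.items():
        sub = subparsers.add_parser(name, help=help_text)
        if name == "fetch":
            # Only the flag itself is taken from the text; its accepted values
            # are not restricted here because the text shows just "n".
            sub.add_argument("--viral", help="viral database type, e.g. n")
    return parser


# Equivalent to running: virmet-sketch fetch --viral n
args = build_parser().parse_args(["fetch", "--viral", "n"])
print(args.subcommand, args.viral)
```

With this layout, each sub-parser gets its own -h help automatically, which matches the virmet subcommand -h behaviour mentioned above.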

More at https://github.com/ozagordi/VirMet

Address of the bookmark: https://github.com/ozagordi/VirMet

GenomeScope: open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content from unprocessed short reads

Jit — Fri, 21 Oct 2016 05:46:43 -0500

Summary: GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels, and error rates. Availability and Implementation: http://qb.cshl.edu/genomescope/, https://github.com/schatzlab/genomescope.git
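
To give a feel for how genome size can be read off short-read data, here is a minimal Python sketch of the simple peak-based estimate: total k-mers divided by the coverage at the main peak of the k-mer histogram. This is a deliberate simplification and not GenomeScope's method, which fits a statistical model to the whole k-mer profile. The function name, the input format (a dict of k-mer multiplicity to number of distinct k-mers) and the min_coverage error cut-off are all assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_coverage=5):
    """Naive genome size estimate from a k-mer count histogram.

    histogram maps k-mer multiplicity -> number of distinct k-mers seen that
    many times. Multiplicities below min_coverage are treated as sequencing
    errors and ignored (the threshold is an arbitrary illustrative choice).
    """
    kept = {cov: n for cov, n in histogram.items() if cov >= min_coverage}
    if not kept:
        raise ValueError("no k-mers at or above min_coverage")
    # Coverage at the main peak: the multiplicity with the most distinct k-mers.
    peak = max(kept, key=kept.get)
    total_kmers = sum(cov * n for cov, n in kept.items())
    return total_kmers / peak


# Toy histogram: an error tail at low coverage and a main peak around 20x.
toy = {1: 1000, 2: 200, 19: 50, 20: 100, 21: 50, 40: 10}
print(estimate_genome_size(toy))
```

The estimate ignores heterozygosity and repeats, which is exactly what a model-based tool is meant to handle.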

Address of the bookmark: http://qb.cshl.edu/genomescope/

pyScaf

Bulbul — Mon, 19 Dec 2016 14:20:33 -0600

pyScaf orders contigs from genome assemblies utilising several types of information:

paired-end (PE) and/or mate-pair libraries (NGS-based mode)
long reads (NGS-based mode)
synteny to the genome of some related species (reference-based mode)

Scaffolding

In reference-based mode, pyScaf uses synteny to the genome of closely related species in order to order contigs and estimate distances between adjacent contigs.

Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:

matches not satisfying cut-offs (--identity and --overlap)
suboptimal matches (only best match of each query to reference is kept)
and removing overlapping matches on reference.
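
The three filtering rules above can be sketched in Python as follows. This is not pyScaf's code: the Match record, the threshold defaults and the greedy rule for resolving overlaps on the reference (keep the higher-scoring match) are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    query: str        # contig name
    ref: str          # reference chromosome name
    ref_start: int    # 0-based start on the reference
    ref_end: int      # exclusive end on the reference
    identity: float   # fraction of identical aligned positions
    overlap: float    # fraction of the contig covered by the alignment
    score: float      # alignment score used to rank matches


def filter_matches(matches: List[Match], min_identity=0.8, min_overlap=0.66):
    """Apply cut-offs, keep the best match per contig, drop reference overlaps."""
    # 1. Ignore matches not satisfying the identity and overlap cut-offs.
    passing = [m for m in matches
               if m.identity >= min_identity and m.overlap >= min_overlap]
    # 2. Keep only the best-scoring match of each query.
    best = {}
    for m in passing:
        if m.query not in best or m.score > best[m.query].score:
            best[m.query] = m
    # 3. Remove matches overlapping on the reference (greedy, best score first).
    kept = []
    for m in sorted(best.values(), key=lambda m: m.score, reverse=True):
        if all(m.ref != k.ref or m.ref_end <= k.ref_start or m.ref_start >= k.ref_end
               for k in kept):
            kept.append(m)
    # Return contigs in reference order, which is the scaffold order.
    return sorted(kept, key=lambda m: (m.ref, m.ref_start))


example = [
    Match("c1", "chr1", 0, 100, 0.95, 0.9, 90),
    Match("c1", "chr2", 0, 100, 0.90, 0.9, 80),     # suboptimal match of c1
    Match("c2", "chr1", 50, 150, 0.95, 0.9, 70),    # overlaps c1 on chr1
    Match("c3", "chr1", 200, 300, 0.50, 0.9, 60),   # fails the identity cut-off
    Match("c4", "chr1", 100, 180, 0.90, 0.8, 50),   # adjacent to c1, no overlap
]
print([m.query for m in filter_matches(example)])
```

Gaps between consecutive kept matches on the same chromosome (ref_start of one minus ref_end of the previous) would then give the estimated distances between adjacent contigs.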

In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on C. parapsilosis (13 Mb; CANPA) and A. thaliana (119 Mb; ARATH) chromosomes, correctly reconstructing all chromosomes in every run for CANPA and in nearly every run for ARATH (Figures in dropbox, CANPA table, ARATH table).
Runs took ~0.5 min for CANPA on 4 CPUs and ~2 min for ARATH on 16 CPUs.

Important remarks:

Reduce your assembly beforehand (fasta2homozygous.py), as any redundancy will likely break the synteny.
pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no de novo assembler / scaffolder is perfect...), which breaks synteny.
pyScaf works very well if divergence between reference genome and assembled contigs is below 20% at nucleotide level.
pyScaf deals with large rearrangements, i.e. deletions, insertions, inversions and translocations. Note, however, that this is an experimental implementation!
Consider closing gaps after scaffolding.

Address of the bookmark: https://github.com/lpryszcz/pyScaf

Mercator

Jit — Mon, 06 Feb 2017 04:20:36 -0600

Our basic strategy in building homology maps is to use exons that are orthologous in multiple genomes as map "anchors." Given K genomes, the steps in the map construction are as follows:

For each genome, obtain a set of exon annotations. These annotations can be a combination of both exon predictions (e.g. Genscan) and annotations that have been experimentally verified (e.g. RefSeq). Ideally, we would like to have these annotations be as sensitive as possible. Specificity is not a concern, as incorrect annotations are not likely to have significant alignments with other gene annotations.
Compare all exons against all exons in other genomes and record significant alignments between exons. Currently, we use BLAT to do this all-vs-all comparison with alignments being performed in protein space.
Construct a graph with each vertex corresponding to an exon and edges between vertices whose corresponding exons have significant alignments.
Identify cliques in this graph. These cliques are potential anchors to be used in the map.
Starting with the largest cliques (those that have exons in all or most of the genomes), join neighboring (adjacent in genomic coordinates, in each genome) cliques to form runs. Smaller cliques that are inconsistent with runs formed by larger cliques are filtered out. After the smallest cliques have been considered, cliques that are not part of a run are discarded.
The extents of each run in each genome are output as orthologous segments. The cliques from each run are used to output the exact genomic coordinates of anchors within each orthologous segment. These anchors can be used by genomic alignment programs (such as MAVID) to do a detailed alignment of each orthologous segment.
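
The graph and clique steps above can be sketched in Python. This is not Mercator's implementation: exons are represented as hypothetical (genome, exon_id) tuples, the significant alignments are given as a ready-made list of pairs rather than computed with BLAT, and cliques are enumerated with the textbook Bron-Kerbosch algorithm. Joining cliques into runs is not shown.

```python
from collections import defaultdict


def build_graph(hits):
    """Build an undirected graph: one vertex per exon, one edge per significant hit."""
    graph = defaultdict(set)
    for a, b in hits:
        if a[0] == b[0]:
            continue  # exons are only compared against exons in other genomes
        graph[a].add(b)
        graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques (Bron-Kerbosch, no pivoting), largest first."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(graph), set())
    return sorted(cliques, key=len, reverse=True)


# Toy input: exon e1 aligns across three genomes, exon e2 across two.
hits = [
    (("human", "e1"), ("mouse", "e1")),
    (("human", "e1"), ("rat", "e1")),
    (("mouse", "e1"), ("rat", "e1")),
    (("human", "e2"), ("mouse", "e2")),
]
for clique in maximal_cliques(build_graph(hits)):
    print(sorted(clique))
```

Sorting by size mirrors the strategy of starting with the largest cliques, those with exons in all or most of the genomes, when forming runs.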

https://www.biostat.wisc.edu/~cdewey/mercator/

Address of the bookmark: https://www.biostat.wisc.edu/~cdewey/mercator/