BOL: Related items

Nemo – A stochastic, individual-based, genetically explicit simulation platform

Jit — Sat, 01 Oct 2016 14:45:02 -0500

A recombination map has been added for all multi-locus traits. The map positions (chromosomal) for neutral markers (e.g. SNPs) and loci under selection (QTLs, deleterious mutations, DMIs) can now be specified explicitly, or set at random. The map can hold an unlimited number of loci of different types jointly, at any recombination scale (cM or lower). The effects of linkage can thus be finely explored.
A new trait codes for (Bateson-)Dobzhansky-Muller incompatibility loci. Multiple haploid or diploid pairs of incompatible loci can be spread throughout the genome and affect individual fitness.
Multi-type selection: Individual fitness can be jointly determined by different types of loci under selection, such as QTLs coding for quantitative traits under spatially variable selection, universally deleterious mutations, and Dobzhansky-Muller incompatibility loci.
An unlimited number of quantitative traits under different forms of selection can be modelled, based on universally pleiotropic loci with several bi- or multi-allelic models.
Spatial and temporal variation of selection on quantitative traits is possible, modelling shifts of environmental conditions over time.
The dispersal matrix describing the movement of individuals among sub-populations can be replaced by a connectivity matrix and a reduced dispersal matrix describing migration only among the connected sub-populations. This offers a substantial gain in computing time and system memory when simulating very large grids.
Input parameters' arguments may be specified in separate files. This is particularly convenient when specifying large matrices.
Many adjustments have been made for refined control of the input of parameters and data output. See updates in the manual.
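
The connectivity matrix plus reduced dispersal matrix described above is, in essence, a sparse representation of the full dispersal matrix. The following is a minimal Python sketch of that idea, not Nemo's input format or code: the function names, the 0-based patch indices and the list-of-lists layout are all assumptions made for illustration.

```python
def to_reduced(dispersal, tol=0.0):
    """Split a dense dispersal matrix into a connectivity list and reduced rates.

    connectivity[i] lists the patches connected to patch i (hypothetical layout);
    reduced[i] holds the matching dispersal rates, in the same order.
    """
    connectivity = []
    reduced = []
    for row in dispersal:
        targets = [j for j, rate in enumerate(row) if rate > tol]
        connectivity.append(targets)
        reduced.append([row[j] for j in targets])
    return connectivity, reduced


def dispersal_rate(connectivity, reduced, src, dst):
    """Look up the rate from src to dst; unconnected pairs have rate 0."""
    try:
        k = connectivity[src].index(dst)
    except ValueError:
        return 0.0
    return reduced[src][k]


# Four patches on a line; migrants move only between neighbouring patches.
dense = [
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.90, 0.05, 0.00],
    [0.00, 0.05, 0.90, 0.05],
    [0.00, 0.00, 0.10, 0.90],
]
connectivity, reduced = to_reduced(dense)
print(connectivity)
print(dispersal_rate(connectivity, reduced, 0, 3))
```

On a large grid where each patch is connected to only a handful of others, the two reduced structures grow linearly with the number of patches, whereas the full matrix grows quadratically.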

Address of the bookmark: http://nemo2.sourceforge.net/index.html

VirMet

Jit — Mon, 10 Oct 2016 08:27:19 -0500

Watch out: only a few files are counted in coverage statistics.

Full documentation on Read the Docs.

A set of tools for viral metagenomics.

virmet is called with a command subcommand syntax: for example, virmet fetch --viral n downloads the viral database. Other subcommands available so far are:

fetch      download genomes
update     update viral/bacterial database
index      index genomes
wolfpack   analyze a MiSeq run
covplot    plot coverage for a specific organism

Short help for any subcommand is available with virmet subcommand -h.

More at https://github.com/ozagordi/VirMet

Address of the bookmark: https://github.com/ozagordi/VirMet

GenomeScope: open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content from unprocessed short reads

Jit — Fri, 21 Oct 2016 05:46:43 -0500

Summary: GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels, and error rates. Availability and Implementation: http://qb.cshl.edu/genomescope/, https://github.com/schatzlab/genomescope.git
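
GenomeScope works from the k-mer count profile of the reads and fits a statistical model to it. The Python sketch below does not reproduce that model. It only illustrates the simpler underlying intuition, under stated assumptions: count k-mers, build the multiplicity histogram, and divide the total number of k-mers by the modal depth. The function names and the min_depth cut-off are hypothetical, and reverse complements are not collapsed.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def naive_genome_size(histogram, min_depth=2):
    """Total k-mers divided by the modal depth; depths below min_depth are
    treated as sequencing errors and ignored (an assumption of this sketch)."""
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak = max(usable, key=usable.get)
    total = sum(d * n for d, n in usable.items())
    return total / peak


# Toy input: the same 10 bp sequence read three times, without errors.
toy_reads = ["ACGTTGCAAG"] * 3
hist = kmer_histogram(toy_reads, k=4)
print(hist, naive_genome_size(hist))
```

The estimate counts k-mer positions, so for a genome of length G it approaches G - k + 1. Heterozygosity and repeats add further peaks to the histogram, which is why a fitted model is needed on real data.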

Address of the bookmark: http://qb.cshl.edu/genomescope/

RECORD

Bulbul — Fri, 25 Nov 2016 08:23:36 -0600

Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation, called RECORD. We evaluate RECORD on both simulated and real data. We have made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.

More at https://sourceforge.net/projects/record-genome-assembler/files/

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pubmed/26558255

e-RGA: enhanced Reference Guided Assembly of Complex Genomes

Jit — Mon, 19 Dec 2016 05:56:14 -0600

Next Generation Sequencing has totally changed genomics: we are able to produce huge amounts of data at an incredibly low cost compared to Sanger sequencing. Despite this, some old problems have become even more difficult, with de novo assembly at the top of the list. Despite efforts to design tools able to assemble, de novo, an organism sequenced with short reads, the results are still far from those achievable with long reads. In this paper, we propose a novel method that aims to improve de novo assembly in the presence of a closely related reference. The idea is to combine de novo and reference-guided assembly in order to obtain enhanced results.

Address of the bookmark: http://journal.embnet.org/index.php/embnetjournal/article/view/208

Understanding Greedy Algorithms

Jit — Mon, 12 Dec 2016 04:37:40 -0600

Learning greedy algorithms, for biologists.

https://www.topcoder.com/community/data-science/data-science-tutorials/greedy-is-good/

These pages are also useful on the same topic:

http://learninglover.com/examples.php?id=59

http://www.cs.rpi.edu/~magdon/ps/conference/super_biokdd.pdf

https://ocw.mit.edu/courses/biology/7-91j-foundations-of-computational-and-systems-biology-spring-2014/lecture-slides/MIT7_91JS14_Lecture6.pdf

http://schatzlab.cshl.edu/teaching/AssemblyClass/01.%20Assembly%20Intro.pdf

http://lsl.sinica.edu.tw/Services/Class/files/20150612449.pdf

http://www.cs.jhu.edu/~langmea/resources/lecture_notes/assembly_scs.pdf

https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-43.pdf
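
Several of the links above deal with greedy approaches to sequence assembly. As a concrete illustration, here is a minimal Python sketch of the classic greedy shortest-common-superstring heuristic: repeatedly merge the two reads with the longest suffix-prefix overlap. The function names are chosen for this example and are not taken from any of the linked materials. Being greedy, the method does not guarantee the shortest possible superstring.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_scs(reads, min_len=1):
    """Greedily merge the best-overlapping pair until one string remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            # Nothing overlaps any more: concatenate what is left.
            return "".join(reads)
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""


reads = ["ATTAGAC", "GACCTGC", "TGCCGGA"]
print(greedy_scs(reads))
```

The locally best merge at each step is never revisited, which is what makes the algorithm greedy, and also why repeats in real genomes can lead it to a wrong assembly.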

Address of the bookmark: https://www.topcoder.com/community/data-science/data-science-tutorials/greedy-is-good/

MyPro: A seamless pipeline for automated prokaryotic genome assembly and annotation

Neel — Thu, 15 Dec 2016 05:47:35 -0600

MyPro is an improved genomics software pipeline for prokaryotic genomes. MyPro is user-friendly and requires minimal programming skills. High-quality prokaryotic genome assembly and annotation can be obtained with ease. It performed better than de novo assemblers and contig integration software, producing more contiguous assemblies with higher N50 values and fewer contigs.
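
For readers unfamiliar with the N50 statistic mentioned here: it is the length of the contig at which, going from the longest contig downwards, the running total first reaches half of the total assembly length. A minimal Python sketch follows. The function name and the toy contig lengths are made up for illustration and have nothing to do with MyPro's own code or results.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


fragmented = [80, 70, 50, 40, 30, 20]  # six contigs, 290 bp in total
contiguous = [150, 90, 50]             # same total length in three contigs
print(n50(fragmented), n50(contiguous))
```

Both toy assemblies have the same total length, but the one with fewer, longer contigs has the higher N50, which is why the two measures are usually reported together.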

More at https://sourceforge.net/projects/sb2nhri/files/MyPro/

Address of the bookmark: http://www.sciencedirect.com/science/article/pii/S0167701215001207

PEAR

Jit — Mon, 19 Dec 2016 09:28:30 -0600

PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory.

PEAR evaluates all possible paired-end read overlaps without requiring the target fragment size as input. In addition, it implements a statistical test for minimizing false-positive results. Together with a highly optimized implementation, it can merge millions of paired-end reads within a couple of minutes on a standard desktop computer.
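
To make the idea of overlap-based merging concrete, here is a deliberately simplified Python sketch. It is not PEAR's algorithm: there is no quality-aware scoring and no statistical test, and fragments shorter than the read length are not handled. It simply tries every overlap length between the forward read and the reverse-complemented reverse read, longest first, and accepts the first one within a mismatch budget. The function names and both thresholds are assumptions of this example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G, T, N."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair on the longest acceptable overlap, or return None."""
    rev = reverse_complement(reverse)
    for olen in range(min(len(forward), len(rev)), min_overlap - 1, -1):
        head = forward[-olen:]
        tail = rev[:olen]
        mismatches = sum(x != y for x, y in zip(head, tail))
        if mismatches <= max_mismatch_frac * olen:
            # Bases in the overlap are taken from the forward read.
            return forward + rev[olen:]
    return None


# A 30 bp fragment sequenced as two 20 bp reads that overlap by 10 bp.
fragment = "ACGTTGCAAGGCTTACCGATAGGCTCAATG"
fwd = fragment[:20]
rvs = reverse_complement(fragment[-20:])
print(merge_pair(fwd, rvs, min_overlap=5, max_mismatch_frac=0.0))
```

Because no fragment size is supplied, the search has to consider every candidate overlap, which mirrors the "all possible overlaps" point made above.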

Address of the bookmark: http://sco.h-its.org/exelixis/web/software/pear/doc.html

pyScaf

Bulbul — Mon, 19 Dec 2016 14:20:33 -0600

pyScaf orders contigs from genome assemblies utilising several types of information:

paired-end (PE) and/or mate-pair libraries (NGS-based mode)
long reads (NGS-based mode)
synteny to the genome of some related species (reference-based mode)

Scaffolding

In reference-based mode, pyScaf uses synteny to the genome of a closely related species to order contigs and estimate the distances between adjacent contigs.

Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:

matches not satisfying cut-offs (--identity and --overlap)
suboptimal matches (only best match of each query to reference is kept)
and removing overlapping matches on reference.
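
The three filtering steps listed above can be sketched in Python as follows. This is an illustration only, not pyScaf's code: the Match record, its field names and the rule that the higher-scoring match wins when two matches overlap on the reference are all assumptions, and the thresholds are passed in explicitly rather than taken from pyScaf's defaults.

```python
from collections import namedtuple

# Hypothetical alignment record; reference coordinates are half-open.
Match = namedtuple("Match", "query ref ref_start ref_end identity overlap score")


def filter_matches(matches, min_identity, min_overlap):
    # 1. Drop matches that do not satisfy the identity and overlap cut-offs.
    passing = [m for m in matches
               if m.identity >= min_identity and m.overlap >= min_overlap]

    # 2. Keep only the best-scoring match of each query contig.
    best = {}
    for m in passing:
        if m.query not in best or m.score > best[m.query].score:
            best[m.query] = m

    # 3. Remove matches overlapping an already accepted match on the reference
    #    (assumption: matches are accepted in order of decreasing score).
    kept = []
    for m in sorted(best.values(), key=lambda m: m.score, reverse=True):
        if all(m.ref != k.ref or m.ref_end <= k.ref_start
               or m.ref_start >= k.ref_end for k in kept):
            kept.append(m)

    # Report contigs in reference order, ready for scaffolding.
    return sorted(kept, key=lambda m: (m.ref, m.ref_start))


matches = [
    Match("c1", "chr1", 0, 1000, 0.98, 0.95, 950),
    Match("c1", "chr2", 0, 900, 0.91, 0.80, 700),     # suboptimal match of c1
    Match("c2", "chr1", 800, 1500, 0.95, 0.90, 600),  # overlaps c1 on chr1
    Match("c3", "chr1", 1500, 2500, 0.97, 0.99, 980),
    Match("c4", "chr1", 3000, 3500, 0.50, 0.90, 200), # fails identity cut-off
]
for m in filter_matches(matches, min_identity=0.9, min_overlap=0.5):
    print(m.query, m.ref, m.ref_start, m.ref_end)
```

Once the surviving matches are sorted along each reference chromosome, the gaps between consecutive matches give a first estimate of the distances between adjacent contigs.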

In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on C. parapsilosis (13 Mb; CANPA) and A. thaliana (119 Mb; ARATH) chromosomes, always reconstructing all chromosomes correctly for CANPA and nearly always for ARATH (Figures in dropbox, CANPA table, ARATH table).
Runs took ~0.5 min for CANPA on 4 CPUs and ~2 min for ARATH on 16 CPUs.

Important remarks:

Reduce your assembly beforehand (fasta2homozygous.py), as any redundancy will likely break the synteny.
pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no de novo assembler / scaffolder is perfect...), which breaks synteny.
pyScaf works very well if divergence between reference genome and assembled contigs is below 20% at nucleotide level.
pyScaf deals with large rearrangements, i.e. deletions, insertions, inversions and translocations. Note, however, that this is an experimental implementation!
Consider closing gaps after scaffolding.

Address of the bookmark: https://github.com/lpryszcz/pyScaf

MetaBAT: An Efficient Tool for Accurately Reconstructing Single Genomes from Complex Microbial Communities

Jit — Mon, 06 Mar 2017 03:44:34 -0600

Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Here we developed an automated metagenome binning software, called MetaBAT, which integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency. Tested on both synthetic and real metagenome datasets, MetaBAT outperforms alternative methods in both accuracy and computational efficiency. Applying MetaBAT to an assembly from 1,704 human gut samples formed 1,634 genome bins (>200kb) in 3 hours, where 621 genome bins are >50% complete with <5% contamination from other species. Further analysis shows that the quality of these genome bins approaches manually curated genomes.
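
Tetranucleotide frequency (TNF), one of the two signals mentioned above, is straightforward to compute. The Python sketch below builds a 256-dimensional TNF vector for a contig and compares two contigs with a plain Euclidean distance. This is only an illustration: MetaBAT uses its own empirically derived probabilistic distances and combines them with abundance, neither of which is reproduced here. The function names are hypothetical and reverse complements are not collapsed.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tnf(seq):
    """Return the tetranucleotide frequency vector of a sequence."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips 4-mers containing N or other symbols
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def euclidean(u, v):
    """Plain Euclidean distance, a stand-in for MetaBAT's probabilistic distance."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


contig_a = "ACGTACGTACGTACGT"
contig_b = "GGGGCCCCGGGGCCCC"
print(euclidean(tnf(contig_a), tnf(contig_a)))
print(euclidean(tnf(contig_a), tnf(contig_b)))
```

Contigs from the same genome tend to have similar TNF vectors, so small distances suggest that two contigs belong in the same bin. Short contigs give noisy frequency estimates, which is one reason binning tools set a minimum contig length.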

Address of the bookmark: https://bitbucket.org/berkeleylab/metabat