BOL: Related items

Nemo – A stochastic, individual-based, genetically explicit simulation platform

Jit — Sat, 01 Oct 2016 14:45:02 -0500

A recombination map has been added for all multi-locus traits. The map positions (chromosomal) for neutral markers (e.g. SNPs) and loci under selection (QTLs, deleterious mutations, DMIs) can now be specified explicitly, or set at random. The map can hold an unlimited number of loci of different types jointly, at any recombination scale (cM or lower). The effects of linkage can thus be finely explored.
A new trait codes for (Bateson-)Dobzhansky-Muller incompatibility loci. Multiple haploid or diploid pairs of incompatible loci can be spread throughout the genome and affect individual fitness.
Multi-type selection: Individual fitness can be jointly determined by different types of loci under selection, such as QTLs coding for quantitative traits under spatially variable selection, universally deleterious mutations, and Dobzhansky-Muller incompatibility loci.
An unlimited number of quantitative traits under different forms of selection can be modelled, based on universally pleiotropic loci with several bi- or multi-allelic models.
Spatial and temporal variation of selection on quantitative traits is possible, modelling shifts of environmental conditions over time.
The dispersal matrix describing the movement of individuals among sub-populations can be replaced by a connectivity matrix and a reduced dispersal matrix describing migration only among the connected sub-populations. This offers a substantial gain in computing time and system memory when simulating very large grids.
Input parameters' arguments may be specified in separate files. This is particularly convenient when specifying large matrices.
Many adjustments have been made for refined control of the input of parameters and data output. See updates in the manual.
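
The connectivity-matrix idea in the list above can be illustrated with a short Python sketch. It is not Nemo's input syntax: the function names `reduce_dispersal` and `expected_arrivals` and the four-patch example are assumptions made purely for illustration. The point is that only the non-zero dispersal rates need to be stored and visited.

```python
# Hypothetical sketch (not Nemo's input format): split a full dispersal
# matrix into a connectivity list plus a reduced list of dispersal rates.

def reduce_dispersal(full):
    """Return (connectivity, reduced) for a square dispersal matrix.

    connectivity[i] lists the patches reachable from patch i;
    reduced[i] holds the matching dispersal rates, in the same order.
    """
    connectivity, reduced = [], []
    for row in full:
        targets = [j for j, rate in enumerate(row) if rate > 0.0]
        connectivity.append(targets)
        reduced.append([row[j] for j in targets])
    return connectivity, reduced


def expected_arrivals(sizes, connectivity, reduced):
    """Expected number of individuals arriving in each patch."""
    arrivals = [0.0] * len(sizes)
    for i, n in enumerate(sizes):
        # Only connected patches are visited, not the whole row.
        for j, rate in zip(connectivity[i], reduced[i]):
            arrivals[j] += n * rate
    return arrivals


# Four patches on a line; dispersal only between nearest neighbours.
full = [
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.90, 0.05, 0.00],
    [0.00, 0.05, 0.90, 0.05],
    [0.00, 0.00, 0.10, 0.90],
]
connectivity, reduced = reduce_dispersal(full)
arrivals = expected_arrivals([100, 100, 100, 100], connectivity, reduced)
print(connectivity)
print(arrivals)
```

For a grid of N patches where each patch connects to only a handful of neighbours, the two reduced structures hold on the order of N entries rather than N squared, which is where the saving in time and memory would come from.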

Address of the bookmark: http://nemo2.sourceforge.net/index.html

Structural variation: the hidden genomic treasure

Jit — Sat, 10 Dec 2016 16:19:09 -0600

Genome re-sequencing projects have revealed substantial amounts of genetic variation between individuals extending beyond single nucleotide polymorphisms (SNPs) and short indels. Structural Variations (SVs) and Copy Number Variations (CNVs) are a major source of genomic variation. However, compared to SNPs, accurate detection, genotyping and understanding of CNVs is lagging behind due to much greater analytical challenges related to SV/CNV detection and analysis. In our lab we analyse SVs/CNVs using high-throughput sequencing and different analytical approaches. The most‐studied structural variants are copy number variations (CNVs) which can be generated by several different mechanisms including non‐allelic homologous recombination, non‐homologous end‐joining and deoxyribonucleic acid (DNA) replication‐related fork stalling and template switching. CNVs are closely related to segmental duplications (SDs): SDs can stimulate the formation of CNVs and themselves started out as CNVs, but became fixed in a species. Structural variation can be neutral but has also influenced our phenotypic evolution, for example our susceptibility to disease and our ability to digest certain types of food. Our understanding of the extent of structural variation is increasing rapidly, but it will be much more difficult to understand its phenotypic consequences.

Structural variants (SVs) such as deletions, insertions, duplications, inversions and translocations litter genomes and are often associated with gene expression changes and severe phenotypes (e.g. genetic diseases in humans). Recent studies on the functional aspects of different types of SVs have unveiled several cases of adaptive evolution. For example, inversions have been associated with ecological adaptations and may facilitate speciation. Due to their prevalent nature, SVs arguably have a large impact on genome evolution and should not be neglected when studying the genetics of adaptation and speciation. SVs were classically defined as chromosomal rearrangements larger than 1 kb, but due to the higher resolution of new detection methods, smaller variants (between 50 and 1000 base pairs) can now be accurately assessed. Besides various methods of detection in next-generation sequencing data (paired-end mapping, split reads, and depth of coverage), array-based approaches have proven to be particularly useful for detecting copy number variations (CNVs). These technologies have enabled researchers to catalog a wide spectrum of SVs in many organisms and infer the effects of selection shaping their evolutionary trajectories.
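
To make the paired-end mapping approach mentioned above more concrete, here is a minimal Python sketch of how a single read pair might be classified by its mapping signature. It assumes a standard forward/reverse ("FR") short-read library, takes the two positions as the outer coordinates of the fragment, and uses illustrative values for the expected insert size; the function name `classify_pair` is hypothetical. Real callers cluster many supporting pairs before reporting a variant.

```python
# Hypothetical sketch of paired-end mapping signatures for an FR library.
# Insert-size parameters are illustrative, not taken from any tool.

def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  mean_insert=400, sd_insert=40, n_sd=3):
    """Return the SV signature suggested by one mapped read pair."""
    if chrom1 != chrom2:
        return "translocation"          # mates on different chromosomes
    if strand1 == strand2:
        return "inversion"              # both mates on the same strand
    # Order the mates by position so we can inspect the leftmost one.
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(pos1, strand1), (pos2, strand2)])
    if left_strand == "-":
        return "tandem_duplication"     # everted pair: mates point outwards
    span = right_pos - left_pos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"               # mates map too far apart
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"              # mates map too close together
    return "concordant"


examples = [
    ("chr1", 1000, "+", "chr1", 1400, "-"),
    ("chr1", 1000, "+", "chr1", 6000, "-"),
    ("chr1", 1000, "+", "chr1", 1100, "-"),
    ("chr1", 1000, "+", "chr1", 3000, "+"),
    ("chr1", 1000, "-", "chr1", 3000, "+"),
    ("chr1", 1000, "+", "chr2", 1400, "-"),
]
for pair in examples:
    print(pair, classify_pair(*pair))
```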

Structural variation sequencing signatures (Source: Nature Reviews Genetics)

Related tools, databases and publications are listed below. If you know of any interesting papers, please let us know in the comment section:

Key concepts

Structural variation includes balanced variants such as inversions and translocations, and unbalanced ones such as duplications and deletions (copy number variations or CNVs).

Structural variants can arise by several mechanisms, including nonallelic homologous recombination (NAHR), nonhomologous end‐joining (NHEJ) and DNA replication‐based fork stalling and template switching (FoSTeS).

CNV is closely linked to segmental duplication, but is not exactly the same. Segmental duplications can stimulate CNV formation by NAHR, and themselves arise from CNVs that have become fixed.

Segmental duplications did not appear uniformly during the evolution of the Great Ape species, but rather during a burst of activity around the time of the divergence of gorilla from the human/chimpanzee ancestor.

Duplicated genes play a critical role in the evolution of a genome as they act as ‘spare parts’ that can evolve to perform new or more specialized functions.

Effects of structural variation on gene expression can be identified but only a few examples of the consequences for species biology have been documented.

Tools

CNVnator, a tool for CNV discovery and genotyping from depth of read mapping (2011a, 2011b).

AGE, a tool that implements an algorithm for optimal alignment of sequences with SVs (2011).

BreakSeq, a pipeline for annotation, classification and analysis of SVs at single nucleotide resolution (2010).

PEMer, a computational and simulation framework for discovering SVs by paired-end read mapping (2009, 2007).

GASV https://code.google.com/archive/p/gasv/

PAIROSCOPE http://pairoscope.sourceforge.net/

SVDetect http://svdetect.sourceforge.net/Site/Home.html

BreakPtr, discovery of unbalanced structural variants (copy-number variants) with tiling microarrays

R Package https://www.bioconductor.org/help/course-materials/2010/EMBL2010/Practical-4-StructuralVariants.pdf

BreakSeq, structural variant genotyping using split reads

CopySeq, genotyping of unbalanced structural variants (copy-number variants) using read-depth

DELLY2, integrated structural variant discovery, genotyping and visualization in deep sequencing data

PEMer, structural variant discovery in 454 sequencing data by paired-end mapping

TIGER, transduction inference in germline genomes using short read data

MANTA https://github.com/Illumina/manta

SV-Bay https://github.com/InstitutCurie/SV-Bay

BreakDancer http://breakdancer.sourceforge.net/

Variation Hunter http://compbio.cs.sfu.ca/software-variation-hunter

Lumpy https://github.com/arq5x/lumpy-sv

ForestSV http://sebatlab.ucsd.edu/index.php/software-data

PBSuite for long reads https://sourceforge.net/projects/pb-jelly/
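
Several of the tools listed above (for example CNVnator and CopySeq) work from read depth. The Python sketch below shows the basic idea only and is not the algorithm of any listed tool: it assumes reads have already been counted in fixed-size bins, that the median bin represents the normal diploid state, and that rounding is an acceptable way to assign integer copy numbers. The function name `call_cnv_segments` and the bin size are hypothetical. Real tools add steps such as GC-content correction and statistical segmentation.

```python
# Hypothetical read-depth sketch: estimate copy number per bin from read
# counts and merge adjacent bins that share a non-diploid state.
from statistics import median


def call_cnv_segments(bin_counts, bin_size=1000, ploidy=2):
    """Return a list of (start, end, copy_number) for non-diploid runs."""
    if not bin_counts:
        return []
    baseline = median(bin_counts)          # assumed to reflect 2 copies
    if baseline <= 0:
        raise ValueError("median read count must be positive")
    copies = [round(ploidy * c / baseline) for c in bin_counts]

    segments = []
    run_start = 0
    for i in range(1, len(copies) + 1):
        # Close the current run at the end of the data or on a state change.
        if i == len(copies) or copies[i] != copies[run_start]:
            if copies[run_start] != ploidy:
                segments.append(
                    (run_start * bin_size, i * bin_size, copies[run_start]))
            run_start = i
    return segments


counts = [100, 98, 102, 51, 49, 50, 101, 99, 150, 152, 100, 100]
segments = call_cnv_segments(counts)
print(segments)
```

In this toy example the three bins at roughly half the median depth are reported as a one-copy segment and the two bins at 1.5 times the median as a three-copy segment.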

Visualization

The SV visualization tool: http://genomesavant.com/savant/

InGAP-SV (http://ingap.sourceforge.net/), a useful tool for both detection and visualisation of several kinds of structural variation (large insertions, translocations, deletions, inversions, ...)

Tools table: http://www.nature.com/nbt/journal/v29/n8/fig_tab/nbt.1904_T2.html

Variation Viewer https://www.ncbi.nlm.nih.gov/variation/view/

Papers

http://www.nature.com/nmeth/journal/v9/n2/full/nmeth.1858.html

http://journal.frontiersin.org/researchtopic/1412/structural-variations-in-genomes-ecological-and-evolutionary-implications

http://www.mi.fu-berlin.de/wiki/pub/ABI/GenomicsLecture10Materials/structural-variation.pdf

http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1479-3

https://www.ncbi.nlm.nih.gov/dbvar/content/overview/

http://www.nature.com/subjects/structural-variation

https://eichlerlab.gs.washington.edu/news/NatMeth_Feb2012.pdf

https://www.ncbi.nlm.nih.gov/pubmed/19477992 ***

https://www.ncbi.nlm.nih.gov/pubmed/22452995

http://biorxiv.org/content/early/2016/09/06/073833

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4479793/

http://www.nature.com/articles/srep18501

http://www.genetics.org/content/202/1/351

http://www.cs.cmu.edu/~sssykim/teaching/s13/slides/Lecture_SVI.pdf

https://www.omicsonline.org/open-access/structural-variation-detection-from-next-generation-sequencing-2469-9853-S1-007.php?aid=69055

http://schatzlab.cshl.edu/presentations/2016/2016.01.12.PAG.Structural%20Variations.pdf

sockeye

Jit — Fri, 17 Feb 2017 08:51:16 -0600

Sockeye uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization.

Address of the bookmark: http://www.bcgsc.ca/platform/bioinfo/software/sockeye/releases/1.3

Jvarkit : Java utilities for Bioinformatics

Jit — Fri, 08 Jun 2018 09:31:55 -0500

Jvarkit is a collection of Java utilities for bioinformatics.

Address of the bookmark: http://lindenb.github.io/jvarkit/

J-Circos

Shruti Paniwala — Fri, 17 Feb 2017 09:06:54 -0600

J-Circos is an interactive visualization tool that can plot Circos figures, dynamically add data to a figure, and provide information on specific data points through mouse-hover display and zoom in/out functions. J-Circos is written in Java, so it can be used on most operating systems (Windows, MacOS, Linux). Users can input data into J-Circos using flat data formats, as well as from the GUI. J-Circos will enable biologists to better study more complex chromosomal interactions and fusion transcripts that are otherwise difficult to visualize from next-generation sequencing data.

Address of the bookmark: http://www.australianprostatecentre.org/research/software/jcircos

Pacbio Long Reads Compatible Software and Tools

Archana Malhotra — Wed, 15 Mar 2017 14:19:01 -0500

The following software packages are known to be compatible with PacBio® data, in addition to PacBio's own SMRT® Analysis suite. All packages are believed to be open source or freely available for non-commercial use. See the individual project sites for up-to-date license information. A separate page lists commercial software.

Know of any other open source software for PacBio data? Email us.


Address of the bookmark: https://github.com/PacificBiosciences/DevNet/wiki/Compatible-Software

A 3D Map of the Human Genome

Fri, 12 Dec 2014 22:27:55 -0600

Suhas Rao and Miriam Huntley (of the Aiden Lab) describe a 3D map of the human genome at kilobase resolution, revealing the principles of chromatin looping. Guest Origami Folding: Sarah Nyquist. Suhas S.P. Rao*, Miriam H. Huntley*, Neva C. Durand, Elena K. Stamenova, Ivan D. Bochkov, James T. Robinson, Adrian L. Sanborn, Ido Machol, Arina D. Omer, Eric S. Lander, Erez Lieberman Aiden. (2014). A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell.

GeneBreak: a tool to systematically identify genes recurrently affected by the genomic location of chromosomal CNA-associated breaks by a genome-wide approach

Jit — Sat, 01 Oct 2016 15:15:29 -0500

Development of cancer is driven by somatic alterations, including numerical and structural chromosomal aberrations. Currently, several computational methods are available and are widely applied to detect numerical copy number aberrations (CNAs) of chromosomal segments in tumor genomes. However, there is a lack of computational methods that systematically detect structural chromosomal aberrations by virtue of the genomic location of CNA-associated chromosomal breaks and identify genes that appear non-randomly affected by chromosomal breakpoints across (large) series of tumor samples. ‘GeneBreak’ was developed to systematically identify genes recurrently affected by the genomic location of chromosomal CNA-associated breaks by a genome-wide approach, which can be applied to DNA copy number data obtained by array-Comparative Genomic Hybridization (CGH) or by (low-pass) whole genome sequencing (WGS). First, ‘GeneBreak’ collects the genomic locations of chromosomal CNA-associated breaks that were previously pinpointed by the segmentation algorithm that was applied to obtain CNA profiles. Next, a tailored annotation approach for breakpoint-to-gene mapping is implemented. Finally, dedicated cohort-based statistics are incorporated, with correction for covariates that influence the probability of being a breakpoint gene. In addition, multiple testing correction is integrated to reveal recurrent breakpoint events. This easy-to-use algorithm, ‘GeneBreak’, is implemented in R (www.cran.r-project.org) and is available from Bioconductor (www.bioconductor.org/packages/release/bioc/html/GeneBreak.html).
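
The first two steps described above (collecting breakpoints from segmented profiles, then mapping them to genes) can be sketched in Python as follows. This is not the GeneBreak R API: the function names, the tuple layouts and the toy cohort are assumptions for illustration, and the cohort-based statistics and multiple testing correction are deliberately left out, so the output is only a raw recurrence count per gene.

```python
# Hypothetical sketch of breakpoint collection and breakpoint-to-gene
# mapping; not the GeneBreak implementation.
from collections import defaultdict


def breakpoints_from_segments(segments):
    """Breakpoints are boundaries where the copy number call changes.

    segments: list of (chrom, start, end, call), sorted by chrom and start.
    """
    breaks = []
    for prev, cur in zip(segments, segments[1:]):
        if prev[0] == cur[0] and prev[3] != cur[3]:
            breaks.append((cur[0], cur[1]))
    return breaks


def genes_hit(breaks, genes):
    """Names of genes (name, chrom, start, end) containing a breakpoint."""
    hit = set()
    for chrom, pos in breaks:
        for name, g_chrom, g_start, g_end in genes:
            if chrom == g_chrom and g_start <= pos <= g_end:
                hit.add(name)
    return hit


def recurrence(cohort, genes):
    """Number of samples in which each gene carries a breakpoint."""
    counts = defaultdict(int)
    for segments in cohort.values():
        for name in genes_hit(breakpoints_from_segments(segments), genes):
            counts[name] += 1
    return dict(counts)


genes = [("GENE_A", "chr1", 1000, 5000), ("GENE_B", "chr1", 8000, 9000)]
cohort = {
    "tumor1": [("chr1", 0, 3000, 0), ("chr1", 3000, 10000, 1)],
    "tumor2": [("chr1", 0, 4000, 0), ("chr1", 4000, 8500, -1),
               ("chr1", 8500, 10000, 0)],
    "tumor3": [("chr1", 0, 10000, 0)],
}
counts = recurrence(cohort, genes)
print(counts)
```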

Address of the bookmark: http://www.bioconductor.org/packages/release/bioc/html/GeneBreak.html

LAST

Bulbul — Mon, 19 Dec 2016 14:07:53 -0600

LAST can:

Handle big sequence data, e.g.:
- Compare two vertebrate genomes
- Align billions of DNA reads to a genome
Indicate the reliability of each aligned column.
Use sequence quality data properly.
Compare DNA to proteins, with frameshifts.
Compare PSSMs to sequences.
Calculate the likelihood of chance similarities between random sequences.
Do split and spliced alignment.
Train alignment parameters for unusual kinds of sequence (e.g. nanopore).

Address of the bookmark: http://last.cbrc.jp/

pyScaf

Bulbul — Mon, 19 Dec 2016 14:20:33 -0600

pyScaf orders contigs from genome assemblies utilising several types of information:

paired-end (PE) and/or mate-pair libraries (NGS-based mode)
long reads (NGS-based mode)
synteny to the genome of some related species (reference-based mode)

Scaffolding

In reference-based mode, pyScaf uses synteny to the genome of a closely related species to order contigs and estimate distances between adjacent contigs.

Contigs are aligned globally (end-to-end) onto reference chromosomes, ignoring:

matches not satisfying cut-offs (--identity and --overlap)
suboptimal matches (only best match of each query to reference is kept)
and removing overlapping matches on reference.
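
The filtering steps above can be sketched in Python as follows. This is not pyScaf's code: the hit fields, the thresholds (illustrative values, not pyScaf defaults) and the rule for resolving overlaps on the reference (greedily keeping the higher-scoring match) are all assumptions made for illustration. The helper names `hit` and `order_contigs` are hypothetical.

```python
# Hypothetical sketch of reference-based contig ordering: apply cut-offs,
# keep the best match per contig, drop overlapping matches, estimate gaps.

def hit(contig, chrom, start, end, identity, overlap, score):
    """Build one contig-to-reference match record."""
    return dict(contig=contig, chrom=chrom, start=start, end=end,
                identity=identity, overlap=overlap, score=score)


def order_contigs(hits, min_identity=0.8, min_overlap=0.5):
    """Return [(chrom, contig, gap_to_next)] in reference order."""
    # 1. Ignore matches that fail the identity/overlap cut-offs.
    passed = [h for h in hits
              if h["identity"] >= min_identity and h["overlap"] >= min_overlap]
    # 2. Keep only the best-scoring match of each contig.
    best = {}
    for h in passed:
        current = best.get(h["contig"])
        if current is None or h["score"] > current["score"]:
            best[h["contig"]] = h
    # 3. Greedily drop matches overlapping a better match on the reference.
    kept = []
    for h in sorted(best.values(), key=lambda x: x["score"], reverse=True):
        clash = any(k["chrom"] == h["chrom"]
                    and h["start"] < k["end"] and k["start"] < h["end"]
                    for k in kept)
        if not clash:
            kept.append(h)
    # 4. Order along the reference and estimate gaps to the next contig.
    kept.sort(key=lambda x: (x["chrom"], x["start"]))
    layout = []
    for i, h in enumerate(kept):
        nxt = kept[i + 1] if i + 1 < len(kept) else None
        same_chrom = nxt is not None and nxt["chrom"] == h["chrom"]
        gap = nxt["start"] - h["end"] if same_chrom else None
        layout.append((h["chrom"], h["contig"], gap))
    return layout


hits = [
    hit("c1", "chr1", 0, 1000, 0.95, 0.9, 900),
    hit("c2", "chr1", 1500, 2500, 0.90, 0.8, 800),
    hit("c2", "chr2", 0, 1000, 0.85, 0.7, 400),     # suboptimal match
    hit("c3", "chr1", 900, 1400, 0.92, 0.9, 300),   # overlaps c1
    hit("c4", "chr1", 3000, 4000, 0.50, 0.9, 700),  # fails identity
]
layout = order_contigs(hits)
print(layout)
```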

In preliminary tests, pyScaf performed superbly on simulated heterozygous genomes based on C. parapsilosis (13 Mb; CANPA) and A. thaliana (119 Mb; ARATH) chromosomes, correctly reconstructing all chromosomes in every run for CANPA and in nearly every run for ARATH (Figures in dropbox, CANPA table, ARATH table).
Runs took ~0.5 min for CANPA on 4 CPUs and ~2 min for ARATH on 16 CPUs.

Important remarks:

Reduce your assembly beforehand (fasta2homozygous.py), as any redundancy will likely break the synteny.
pyScaf works better with contigs than scaffolds, as scaffolds are often affected by mis-assemblies (no de novo assembler / scaffolder is perfect...), which breaks synteny.
pyScaf works very well if divergence between reference genome and assembled contigs is below 20% at nucleotide level.
pyScaf deals with large rearrangements, i.e. deletions, insertions, inversions and translocations. Note, however, that this is an experimental implementation!
Consider closing gaps after scaffolding.

Address of the bookmark: https://github.com/lpryszcz/pyScaf