Genome re-sequencing projects have revealed substantial genetic variation between individuals that extends beyond single nucleotide polymorphisms (SNPs) and short indels. Structural variations (SVs) and copy number variations (CNVs) are a major source of genomic variation. However, compared to SNPs, accurate detection, genotyping and understanding of CNVs lag behind because SV/CNV detection and analysis pose much greater analytical challenges. In our lab we analyse SVs/CNVs using high-throughput sequencing and different analytical approaches. The most‐studied structural variants are CNVs, which can be generated by several different mechanisms including non‐allelic homologous recombination, non‐homologous end‐joining and deoxyribonucleic acid (DNA) replication‐related fork stalling and template switching. CNVs are closely related to segmental duplications (SDs): SDs can stimulate the formation of CNVs, and SDs themselves started out as CNVs that became fixed in a species. Structural variation can be neutral but has also influenced our phenotypic evolution, for example our susceptibility to disease and our ability to digest certain types of food. Our understanding of the extent of structural variation is increasing rapidly, but understanding its phenotypic consequences will be much more difficult.
Structural variants (SVs) such as deletions, insertions, duplications, inversions and translocations litter genomes and are often associated with gene expression changes and severe phenotypes (e.g. genetic diseases in humans). Recent studies on the functional aspects of different types of SVs have unveiled several cases of adaptive evolution. For example, inversions have been associated with ecological adaptations and may facilitate speciation. Because they are so prevalent, SVs arguably have a large impact on genome evolution and should not be neglected when studying the genetics of adaptation and speciation. SVs were classically defined as chromosomal rearrangements larger than 1 kb, but thanks to the higher resolution of new detection methods, smaller variants (between 50 and 1000 base pairs) can now be accurately assessed. Besides the various methods of detection in next-generation sequencing data (paired-end mapping, split reads, and depth of coverage), array-based approaches have proven to be particularly useful for detecting copy number variations (CNVs). These technologies have enabled researchers to catalog a wide spectrum of SVs in many organisms and infer the effects of selection shaping their evolutionary trajectories.
Structural variation sequencing signatures (Source: Nature Reviews Genetics)
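To make the paired-end mapping signatures more concrete, here is a minimal Python sketch of how a single read pair might be labelled by the signature it shows. It is only an illustration, not the method of any tool listed on this page. The ReadPair class, the classify_pair function, the default insert-size mean and standard deviation, and the three-standard-deviation cutoff are all hypothetical choices, and the sketch assumes a standard forward/reverse (inward-facing) short-read library.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """One mapped read pair (hypothetical container for this sketch).

    pos1/pos2 are taken to be the outer (5') mapping coordinate of each
    mate, so their distance approximates the fragment size.
    """
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str  # '+' or '-'


def classify_pair(pair, mean_insert=400, sd_insert=40, n_sd=3):
    """Label a read pair with the SV signature it suggests.

    Assumes an inward-facing library: a concordant pair has the '+' mate
    upstream of the '-' mate, at roughly mean_insert distance.
    The default numbers are illustrative assumptions only.
    """
    # Mates on different chromosomes suggest an interchromosomal event.
    if pair.chrom1 != pair.chrom2:
        return "translocation"

    # Order the mates by coordinate so the logic does not depend on
    # which mate is listed first.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )

    # Both mates on the same strand: one end lies inside an inversion.
    if left_strand == right_strand:
        return "inversion"

    # Outward-facing mates ('-' upstream of '+'): tandem duplication.
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"

    # Correct orientation: compare the mapped span to the library size.
    span = right_pos - left_pos
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"    # mates map too far apart on the reference
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"   # mates map too close together
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 6000, "-")
    print(classify_pair(example))
```

A real caller would cluster many discordant pairs that support the same breakpoint before reporting anything; a single pair is weak evidence on its own.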
Related tools, databases and publications are listed below. If you know of any interesting papers, please let us know in the comments section:
Key concepts
Structural variation includes balanced variants such as inversions and translocations, and unbalanced ones such as duplications and deletions (copy number variations or CNVs).
Structural variants can arise by several mechanisms, including nonallelic homologous recombination (NAHR), nonhomologous end‐joining (NHEJ) and DNA replication‐based fork stalling and template switching (FoSTeS).
CNV is closely linked to segmental duplication, but is not exactly the same. Segmental duplications can stimulate CNV formation by NAHR, and themselves arise from CNVs that have become fixed.
Segmental duplications did not appear uniformly during the evolution of the Great Ape species, but rather during a burst of activity around the time of the divergence of gorilla from the human/chimpanzee ancestor.
Duplicated genes play a critical role in the evolution of a genome as they act as ‘spare parts’ that can evolve to perform new or more specialized functions.
Effects of structural variation on gene expression can be identified but only a few examples of the consequences for species biology have been documented.
Tools
CNVnator, a tool for CNV discovery and genotyping from depth of read mapping (2011a, 2011b)
AGE, a tool that implements an algorithm for optimal alignment of sequences with SVs (2011)
BreakSeq, a pipeline for annotation, classification and analysis of SVs at single nucleotide resolution (2010)
PEMer, a computational and simulation framework for discovering SVs by paired-end read mapping (2009, 2007)
GASV https://code.google.com/archive/p/gasv/
PAIROSCOPE http://pairoscope.sourceforge.net/
SVDetect http://svdetect.sourceforge.net/Site/Home.html
BreakPtr, discovery of unbalanced structural variants (copy-number variants) with tiling microarrays
R Package https://www.bioconductor.org/help/course-materials/2010/EMBL2010/Practical-4-StructuralVariants.pdf
BreakSeq, structural variant genotyping using split reads
CopySeq, genotyping of unbalanced structural variants (copy-number variants) using read-depth
DELLY2, integrated structural variant discovery, genotyping and visualization in deep sequencing data
PEMer, structural variant discovery in 454 sequencing data by paired-end mapping
TIGER, transduction inference in germline genomes using short read data
MANTA https://github.com/Illumina/manta
SV-Bay https://github.com/InstitutCurie/SV-Bay
BreakDancer http://breakdancer.sourceforge.net/
Variation Hunter http://compbio.cs.sfu.ca/software-variation-hunter
Lumpy https://github.com/arq5x/lumpy-sv
ForestSV http://sebatlab.ucsd.edu/index.php/software-data
PBSuite for long reads https://sourceforge.net/projects/pb-jelly/
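Several of the tools above (for example CNVnator and CopySeq) work from depth of coverage. As a rough illustration of the general read-depth idea only, and not of the algorithm used by any of these tools, the Python sketch below bins per-base depth, scales each bin against the median bin, and merges adjacent bins that share a non-reference copy number. The function names, the diploid default and the use of the median as the two-copy baseline are assumptions made for this example; real callers also correct for GC content and mappability.

```python
from statistics import median


def bin_depths(per_base_depth, bin_size):
    """Average per-base depth in consecutive windows of bin_size bases."""
    bins = []
    for start in range(0, len(per_base_depth), bin_size):
        window = per_base_depth[start:start + bin_size]
        bins.append(sum(window) / len(window))
    return bins


def estimate_copy_number(binned_depth, ploidy=2):
    """Estimate integer copy number per bin.

    Assumption: the median bin depth corresponds to the normal state
    (ploidy copies), so copy number is ploidy * depth / median.
    """
    baseline = median(binned_depth)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalise")
    return [round(ploidy * depth / baseline) for depth in binned_depth]


def call_cnv_segments(copy_numbers, ploidy=2):
    """Merge adjacent bins with the same non-reference copy number.

    Returns (start_bin, end_bin_exclusive, copy_number) tuples.
    """
    segments = []
    start = 0
    for i in range(1, len(copy_numbers) + 1):
        # Close the current run at the end of the list or on a change.
        if i == len(copy_numbers) or copy_numbers[i] != copy_numbers[start]:
            if copy_numbers[start] != ploidy:
                segments.append((start, i, copy_numbers[start]))
            start = i
    return segments


if __name__ == "__main__":
    # Toy binned depths: a one-copy loss followed by a one-copy gain.
    depths = [30, 31, 29, 30, 15, 14, 16, 30, 30, 45, 46, 30]
    copy_numbers = estimate_copy_number(depths)
    print(copy_numbers)
    print(call_cnv_segments(copy_numbers))
```

Rounding to the nearest integer keeps the example short; with noisy data a segmentation step on the raw ratios would usually come before any rounding.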
Visualization
Savant, an SV visualization tool: http://genomesavant.com/savant/
InGAP-SV (http://ingap.sourceforge.net/), a nice tool for both detection and visualisation of several kinds of structural variation (large insertions, translocations, deletions, inversions, ...)
Tools table: http://www.nature.com/nbt/journal/v29/n8/fig_tab/nbt.1904_T2.html
Variation Viewer https://www.ncbi.nlm.nih.gov/variation/view/
Papers
http://www.nature.com/nmeth/journal/v9/n2/full/nmeth.1858.html
http://www.mi.fu-berlin.de/wiki/pub/ABI/GenomicsLecture10Materials/structural-variation.pdf
http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1479-3
https://www.ncbi.nlm.nih.gov/dbvar/content/overview/
http://www.nature.com/subjects/structural-variation
https://eichlerlab.gs.washington.edu/news/NatMeth_Feb2012.pdf
https://www.ncbi.nlm.nih.gov/pubmed/19477992 ***
https://www.ncbi.nlm.nih.gov/pubmed/22452995
http://biorxiv.org/content/early/2016/09/06/073833
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4479793/
http://www.nature.com/articles/srep18501
http://www.genetics.org/content/202/1/351
http://www.cs.cmu.edu/~sssykim/teaching/s13/slides/Lecture_SVI.pdf
http://schatzlab.cshl.edu/presentations/2016/2016.01.12.PAG.Structural%20Variations.pdf
Comments
I like this tool GRIDSS: the genomic rearrangement identification software suite. A high-speed next-gen sequencing structural variation caller. GRIDSS calls variants based on alignment-guided positional de Bruijn graph break-end assembly, split read, and read pair evidence.
https://github.com/PapenfussLab/gridss
This paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4479793/ covers many other tools and compares their relative merits.
Hydra-SV: the pipeline described at the link below is meant to convey a general approach that the authors have found to be effective for discovering SV breakpoints with Hydra. However, it should be noted that this approach seeks to discover breakpoints arising from both "unique" and duplicated (e.g., segmental duplications, recent retrotransposons) genomic regions.
https://code.google.com/archive/p/hydra-sv/wikis/TypicalWorkflow.wiki
GitHub https://github.com/arq5x/Hydra
PhD thesis and useful material on structural variation http://paduaresearch.cab.unipd.it/4930/1/tesi_dottorato_versione_finale.pdf
This research paper is also useful http://www.sciencedirect.com/science/article/pii/S1046202316300184
Structural variant detection and association testing https://github.com/zeeev/wham