BOL: Related items

Structural variants PPT

Jit — Wed, 07 Sep 2016 03:16:09 -0500

1000 Genomes data tutorial at ASHG

Structural variants presentation by Jan Korbel, European Molecular Biology Laboratory (EMBL) Heidelberg, Genome Biology Research Unit

Reference:

https://www.genome.gov/pages/research/der/1000genomesprojecttutorials/structuralvariants-jankorbel.pdf

FERMI

Jit — Fri, 09 Sep 2016 05:37:13 -0500

Fermi is a de novo assembler with a particular focus on assembling Illumina short sequence reads from a mammal-sized genome. In addition to the role of a typical assembler, fermi also aims to preserve heterozygotes, which are often collapsed by other assemblers. Its ultimate goal is to find a minimal set of unitigs to represent all the information in raw reads.

Fermi follows the overlap-layout-consensus paradigm and uses the FM-DNA-index (FMD-index) as the key data structure. It is inspired by the string graph assembler (Simpson and Durbin, 2010 and 2012) and has a similar workflow.
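
For readers new to FM-indexes, the core operation is backward search: a pattern is matched one character at a time from its end, narrowing an interval of sorted suffixes. The sketch below is a toy Python illustration of a plain FM-index over a single string (naive BWT construction, full occurrence tables), not fermi's code. As far as I understand, the FMD-index extends this idea by indexing reads together with their reverse complements so that matches can be extended in both directions; that part is not shown. The names `bwt` and `FMIndex` are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; toy inputs only)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    """Toy FM-index supporting occurrence counting by backward search."""

    def __init__(self, text: str):
        self.bwt = bwt(text)
        # C[c]: number of characters in the BWT that sort strictly before c
        self.C = {}
        total = 0
        for c in sorted(set(self.bwt)):
            self.C[c] = total
            total += self.bwt.count(c)
        # occ[c][i]: number of times c occurs in bwt[:i]
        self.occ = {c: [0] for c in self.C}
        for ch in self.bwt:
            for c, column in self.occ.items():
                column.append(column[-1] + (ch == c))

    def count(self, pattern: str) -> int:
        """Return how many times pattern occurs in the indexed text."""
        lo, hi = 0, len(self.bwt)  # half-open interval of matching sorted suffixes
        for c in reversed(pattern):  # extend the match one character to the left
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo
```

For example, FMIndex("ACGTACGT").count("ACG") returns 2. Real indexes sample the occurrence table and build the BWT without materialising every rotation; the quadratic construction here only works on toy inputs.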

Compared with typical de novo assemblers, fermi tends to produce contigs with a slightly longer N50. However, its major weakness is a high misassembly rate. Although fermi provides a tool that uses paired-end reads to fix misassemblies and achieve accuracy comparable to other assemblers, this is not a favorable solution.

Fermi is designed to be used on a multi-core Linux machine with large shared memory. The easiest way to run fermi is with the run-fermi.pl script, which generates a Makefile; the actual assembly is then done by invoking make. Interrupted assembly processes can be resumed. Here is an example:

run-fermi.pl -dAPe ./fermi -p NA12878 -t16 -f18 reads*.fq.gz > NA12878.mak
make -f NA12878.mak -j16

Address of the bookmark: https://github.com/lh3/fermi

GenomeScope: open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content from unprocessed short reads

Jit — Fri, 21 Oct 2016 05:46:43 -0500

Summary: GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate, and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels, and error rates. Availability and Implementation: http://qb.cshl.edu/genomescope/, https://github.com/schatzlab/genomescope.git
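
GenomeScope works from a k-mer count histogram (how many distinct k-mers occur once, twice, and so on) rather than from alignments, and fits a statistical model to the whole histogram to separate sequencing errors, heterozygosity and repeats. The much simpler peak-based estimate sketched below only illustrates where a genome-size number comes from: total solid k-mers divided by the depth of the main peak. The function names and the `min_depth` error cutoff are assumptions for illustration, not GenomeScope parameters.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_histogram(reads, k: int) -> dict:
    """Map multiplicity -> number of distinct canonical k-mers with that multiplicity."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: dict, min_depth: int = 3) -> float:
    """Peak-based estimate: solid k-mer total divided by the depth of the tallest peak."""
    # k-mers seen fewer than min_depth times are treated as sequencing errors (assumption)
    solid = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak = max(solid, key=solid.get)
    total = sum(depth * n for depth, n in solid.items())
    return total / peak
```

In heterozygous genomes the half-coverage heterozygous peak can be taller than the homozygous one, which would throw this estimate off by a factor of two; handling that case is exactly what GenomeScope's model is for.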

Address of the bookmark: http://qb.cshl.edu/genomescope/

Statistics Using R with Biological Examples

Neel — Thu, 03 Nov 2016 04:55:41 -0500

This book is a manifestation of my desire to teach researchers in biology a bit more about statistics than an ordinary introductory course covers and to introduce the utilization of R as a tool for analyzing their data. My goal is to reach those with little or no training in higher level statistics so that they can do more of their own data analysis, communicate more with statisticians, and appreciate the great potential statistics has to offer as a tool to answer biological questions.

This is necessary in light of the increasing use of higher level statistics in biomedical research. I hope it accomplishes this mission and encourage its free distribution and use as a course text or supplement.

K Seefeld, May 2007

RECORD

Bulbul — Fri, 25 Nov 2016 08:23:36 -0600

Background. Next-generation sequencing technologies are now producing multiple times the genome size in total reads from a single experiment. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation called RECORD. We evaluate RECORD on both simulated and real data. We made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.
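
The abstract does not define pseudoreads. Assuming they are simply reads cut from the reference sequence, a minimal way to generate them is to tile the reference with overlapping fixed-length windows, so the assembler sees full coverage of the reference alongside the experimental reads. The helper name and the `read_len` and `step` defaults are hypothetical; this is not RECORD's actual implementation.

```python
def make_pseudoreads(reference: str, read_len: int = 100, step: int = 50) -> list:
    """Tile a reference into overlapping fixed-length pseudoreads (hypothetical helper)."""
    # step <= read_len keeps consecutive pseudoreads overlapping by read_len - step bases
    if read_len <= 0 or not 0 < step <= read_len:
        raise ValueError("need read_len > 0 and 0 < step <= read_len")
    if len(reference) <= read_len:
        return [reference]
    starts = list(range(0, len(reference) - read_len + 1, step))
    # add one final window if the regular tiling stops short of the reference end
    if starts[-1] + read_len < len(reference):
        starts.append(len(reference) - read_len)
    return [reference[s:s + read_len] for s in starts]
```

The overlap matters because an overlap- or k-mer-based assembler can only chain pseudoreads that share sequence.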

More at https://sourceforge.net/projects/record-genome-assembler/files/

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pubmed/26558255

HGA

Jit — Tue, 29 Nov 2016 07:25:53 -0600

HGA tool version 1.0. This tool helps to apply the Hierarchical Genome Assembly (HGA) method. The tool will:
1. Partition a given read dataset into a given number of partitions.
2. Assemble each partition using a pre-specified assembler (Velvet or SPAdes in this version) and a given k-mer size.
3. Merge all the assemblies of the partitions.
4. Combine all the assemblies of the partitions (using Velvet with a k-mer value of 31).
5. Finally, re-assemble the whole dataset with the merged contigs or the combined contigs, using a given k-mer size.
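
The description does not say how HGA partitions reads or merges the partition assemblies. The sketch below assumes a round-robin split for the partitioning step and, for the merging step, a naive merge that pools contigs and drops any contig already contained (on either strand) in one kept earlier. Both functions are hypothetical illustrations; the real tool delegates the assembly itself to Velvet or SPAdes.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def partition_reads(reads, n_partitions: int) -> list:
    """Partitioning (assumed scheme): deal reads round-robin into n_partitions lists."""
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    parts = [[] for _ in range(n_partitions)]
    for i, read in enumerate(reads):
        parts[i % n_partitions].append(read)
    return parts


def merge_contigs(assemblies) -> list:
    """Merging (assumed scheme): pool contigs, dropping any contained on either strand in one kept earlier."""
    # longest first, ties broken alphabetically so the result is deterministic
    pooled = sorted({c for asm in assemblies for c in asm}, key=lambda c: (-len(c), c))
    kept = []
    for contig in pooled:
        rc = revcomp(contig)
        if not any(contig in k or rc in k for k in kept):
            kept.append(contig)
    return kept
```

A real merge would also need to join contigs that overlap without one containing the other; the containment check here only removes redundancy.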

Address of the bookmark: https://github.com/aalokaily/Hierarchical-Genome-Assembly-HGA

Scripts

Jit — Wed, 30 Nov 2016 10:35:15 -0600

Useful scripts for NGS analysis, from the AUGUSTUS website.

Address of the bookmark: http://augustus.gobics.de/binaries/scripts/

e-RGA: enhanced Reference Guided Assembly of Complex Genomes

Jit — Mon, 19 Dec 2016 05:56:14 -0600

Next Generation Sequencing has totally changed genomics: we are able to produce huge amounts of data at an incredibly low cost compared to Sanger sequencing. Despite this, some old problems have become even more difficult, de novo assembly being on top of this list. Despite efforts to design tools able to assemble, de novo, an organism sequenced with short reads, the results are still far from those achievable with long reads. In this paper, we propose a novel method that aims to improve de novo assembly in the presence of a closely related reference. The idea is to combine de novo and reference-guided assembly in order to obtain enhanced results.

Address of the bookmark: http://journal.embnet.org/index.php/embnetjournal/article/view/208

fqtools

Jit — Thu, 08 Dec 2016 09:31:12 -0600

fqtools is a software suite for fast processing of FASTQ files. Various file manipulations are supported; the repository documentation lists the available subcommands with a brief description of their purpose. Most of the individual subcommands take either a single file or a pair of files as input. If no input file is specified, fqtools will attempt to read data from stdin; in this case, it is advisable to specify the format of the data provided. For subcommands that generate FASTQ data, either a single file or a pair of files will be generated. If no -o argument is provided, single files will be written to stdout.
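
To illustrate the input handling described above (a single file, a pair of files, or stdin), here is a minimal Python reader for four-line FASTQ records. It is not fqtools code and does not mirror its subcommands; `parse_fastq`, `parse_pairs` and `count_reads` are hypothetical names, and the sketch assumes unwrapped, uncompressed FASTQ (no multi-line sequences, no .fq.gz).

```python
import sys
from itertools import zip_longest
from typing import Iterator, NamedTuple, Optional, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records, raising ValueError on malformed input."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records or at the end
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header.strip()!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header.strip()!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def parse_pairs(handle1: TextIO, handle2: TextIO) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Yield (read1, read2) tuples from two FASTQ streams that must stay in step."""
    for r1, r2 in zip_longest(parse_fastq(handle1), parse_fastq(handle2)):
        if r1 is None or r2 is None:
            raise ValueError("paired inputs contain different numbers of records")
        yield r1, r2


def count_reads(path: Optional[str] = None) -> int:
    """Count records in a FASTQ file, or on stdin when no path is given."""
    if path is None:
        return sum(1 for _ in parse_fastq(sys.stdin))
    with open(path) as handle:
        return sum(1 for _ in parse_fastq(handle))
```

Falling back to stdin when no path is given mirrors the behaviour described for fqtools, which is what makes such tools easy to chain in shell pipelines.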

Address of the bookmark: https://github.com/alastair-droop/fqtools

Structural variation: the hidden genomic treasure

Jit — Sat, 10 Dec 2016 16:19:09 -0600

Genome re-sequencing projects have revealed substantial amounts of genetic variation between individuals extending beyond single nucleotide polymorphisms (SNPs) and short indels. Structural variations (SVs) and copy number variations (CNVs) are a major source of genomic variation. However, compared to SNPs, accurate detection, genotyping and understanding of CNVs are lagging behind due to much greater analytical challenges related to SV/CNV detection and analysis. In our lab we analyse SVs/CNVs using high-throughput sequencing and different analytical approaches. The most‐studied structural variants are copy number variations (CNVs), which can be generated by several different mechanisms, including non‐allelic homologous recombination, non‐homologous end‐joining and deoxyribonucleic acid (DNA) replication‐related fork stalling and template switching. CNVs are closely related to segmental duplications (SDs): SDs can stimulate the formation of CNVs and themselves started out as CNVs, but became fixed in a species. Structural variation can be neutral but has also influenced our phenotypic evolution, for example our susceptibility to disease and our ability to digest certain types of food. Our understanding of the extent of structural variation is increasing rapidly, but it will be much more difficult to understand its phenotypic consequences.

Structural variants (SVs) such as deletions, insertions, duplications, inversions and translocations litter genomes and are often associated with gene expression changes and severe phenotypes (e.g. genetic diseases in humans). Recent studies on the functional aspects of different types of SVs have unveiled several cases of adaptive evolution. For example, inversions have been associated with ecological adaptations and may facilitate speciation. Due to their prevalence, SVs arguably have a large impact on genome evolution and should not be neglected when studying the genetics of adaptation and speciation. SVs were classically defined as chromosomal rearrangements larger than 1 kb, but thanks to the higher resolution of new detection methods, smaller variants (between 50 and 1000 base pairs) can now be accurately assessed. Besides various methods of detection in next-generation sequencing data (paired-end mapping, split reads, and depth of coverage), array-based approaches have proven to be particularly useful for detecting copy number variations (CNVs). These technologies have enabled researchers to catalog a wide spectrum of SVs in many organisms and infer the effects of selection shaping their evolutionary trajectories.

Structural variation sequencing signatures (Source: Nat Rev Genetics)
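
Paired-end mapping detects SVs from read pairs whose mapped distance or orientation disagrees with what the library should produce. Assuming a standard forward-reverse (FR) library with a known insert-size mean and standard deviation, a single pair can be labelled as below. This is a simplified sketch with hypothetical names: tools such as BreakDancer, PEMer or DELLY cluster many discordant pairs and filter on mapping quality before calling anything, and the distance between mapping start positions is used here only as a rough stand-in for insert size.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int        # leftmost mapping position of mate 1
    reverse1: bool   # True if mate 1 maps to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(p: ReadPair, mean_insert: float, sd_insert: float, n_sd: float = 3.0) -> str:
    """Label a read pair with the SV signature it suggests, assuming an FR library."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal (translocation)"
    if p.reverse1 == p.reverse2:
        return "same-strand (inversion)"
    # order the strands so that `left` belongs to the mate with the smaller coordinate
    if p.pos1 <= p.pos2:
        left_rev, right_rev = p.reverse1, p.reverse2
    else:
        left_rev, right_rev = p.reverse2, p.reverse1
    if left_rev and not right_rev:
        return "everted (tandem duplication)"  # RF instead of the expected FR
    span = abs(p.pos2 - p.pos1)
    if span > mean_insert + n_sd * sd_insert:
        return "long insert (deletion)"
    if span < mean_insert - n_sd * sd_insert:
        return "short insert (insertion)"
    return "concordant"
```

The `n_sd` cutoff of three standard deviations is an arbitrary choice for the sketch; it trades sensitivity to small deletions and insertions against false calls from the tails of the insert-size distribution.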

Related tools, databases and publications are listed below. If you know any interesting papers, please let us know in the comments section:

Key concepts

Structural variation includes balanced variants such as inversions and translocations, and unbalanced ones such as duplications and deletions (copy number variations or CNVs).

Structural variants can arise by several mechanisms, including nonallelic homologous recombination (NAHR), nonhomologous end‐joining (NHEJ) and DNA replication‐based fork stalling and template switching (FoSTeS).

CNV is closely linked to segmental duplication, but is not exactly the same. Segmental duplications can stimulate CNV formation by NAHR, and themselves arise from CNVs that have become fixed.

Segmental duplications did not appear uniformly during the evolution of the Great Ape species, but rather during a burst of activity around the time of the divergence of gorilla from the human/chimpanzee ancestor.

Duplicated genes play a critical role in the evolution of a genome as they act as ‘spare parts’ that can evolve to perform new or more specialized functions.

Effects of structural variation on gene expression can be identified but only a few examples of the consequences for species biology have been documented.

Tools

CNVnator, a tool for CNV discovery and genotyping from depth of read mapping (2011a, 2011b)

AGE, a tool that implements an algorithm for optimal alignment of sequences with SVs (2011)

BreakSeq, a pipeline for annotation, classification and analysis of SVs at single nucleotide resolution (2010)

PEMer, a computational and simulation framework for discovering SVs by paired-end read mapping (2009, 2007)

GASV https://code.google.com/archive/p/gasv/

PAIROSCOPE http://pairoscope.sourceforge.net/

SVDetect http://svdetect.sourceforge.net/Site/Home.html

BreakPtr, discovery of unbalanced structural variants (copy-number variants) with tiling microarrays

R/Bioconductor practical on structural variants (EMBL 2010 course materials): https://www.bioconductor.org/help/course-materials/2010/EMBL2010/Practical-4-StructuralVariants.pdf

BreakSeq, structural variant genotyping using split reads

CopySeq, genotyping of unbalanced structural variants (copy-number variants) using read-depth

DELLY2, integrated structural variant discovery, genotyping and visualization in deep sequencing data

PEMer, structural variant discovery in 454 sequencing data by paired-end mapping

TIGER, transduction inference in germline genomes using short read data

MANTA https://github.com/Illumina/manta

SV-Bay https://github.com/InstitutCurie/SV-Bay

BreakDancer http://breakdancer.sourceforge.net/

Variation Hunter http://compbio.cs.sfu.ca/software-variation-hunter

Lumpy https://github.com/arq5x/lumpy-sv

ForestSV http://sebatlab.ucsd.edu/index.php/software-data

PBSuite for long reads https://sourceforge.net/projects/pb-jelly/
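
Read-depth tools such as CNVnator and CopySeq infer copy number from how many reads map to each window. CNVnator itself segments the depth signal with a mean-shift approach and corrects for GC content; the sketch below swaps in something much simpler: fixed bins, a ratio to the median bin depth, and thresholds. The function names and the 0.6 and 1.4 cut-offs are assumptions, chosen because in a diploid genome a one-copy loss halves the depth (ratio about 0.5) and a one-copy gain raises it by half (ratio about 1.5).

```python
from statistics import median


def bin_depth(read_starts, chrom_len: int, bin_size: int) -> list:
    """Count read start positions falling into fixed-size, non-overlapping bins."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    n_bins = (chrom_len + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_len:  # ignore positions outside the chromosome
            counts[pos // bin_size] += 1
    return counts


def call_cnv_bins(counts, loss_ratio: float = 0.6, gain_ratio: float = 1.4) -> list:
    """Merge consecutive bins whose depth ratio to the median crosses a threshold."""
    med = median(counts)
    if med == 0:
        raise ValueError("median bin depth is zero; use larger bins")
    segments = []  # (first_bin, last_bin, "loss" or "gain")
    for i, c in enumerate(counts):
        ratio = c / med
        call = "loss" if ratio < loss_ratio else "gain" if ratio > gain_ratio else None
        if call is None:
            continue
        if segments and segments[-1][2] == call and segments[-1][1] == i - 1:
            segments[-1] = (segments[-1][0], i, call)  # extend the current segment
        else:
            segments.append((i, i, call))
    return segments
```

Bin size is the main design trade-off: small bins resolve smaller CNVs but make the per-bin counts noisier, which is why real tools add segmentation and statistical testing on top.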

Visualization

Savant, a genome browser that can be used for SV visualization: http://genomesavant.com/savant/

InGAP-SV (http://ingap.sourceforge.net/), a useful tool for both detection and visualisation of several kinds of structural variation (large insertions, translocations, deletions, inversions, ...)

Tools table: http://www.nature.com/nbt/journal/v29/n8/fig_tab/nbt.1904_T2.html

Variation Viewer https://www.ncbi.nlm.nih.gov/variation/view/

Papers

http://www.nature.com/nmeth/journal/v9/n2/full/nmeth.1858.html

http://journal.frontiersin.org/researchtopic/1412/structural-variations-in-genomes-ecological-and-evolutionary-implications

http://www.mi.fu-berlin.de/wiki/pub/ABI/GenomicsLecture10Materials/structural-variation.pdf

http://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1479-3

https://www.ncbi.nlm.nih.gov/dbvar/content/overview/

http://www.nature.com/subjects/structural-variation

https://eichlerlab.gs.washington.edu/news/NatMeth_Feb2012.pdf

https://www.ncbi.nlm.nih.gov/pubmed/19477992 ***

https://www.ncbi.nlm.nih.gov/pubmed/22452995

http://biorxiv.org/content/early/2016/09/06/073833

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4479793/

http://www.nature.com/articles/srep18501

http://www.genetics.org/content/202/1/351

http://www.cs.cmu.edu/~sssykim/teaching/s13/slides/Lecture_SVI.pdf

https://www.omicsonline.org/open-access/structural-variation-detection-from-next-generation-sequencing-2469-9853-S1-007.php?aid=69055

http://schatzlab.cshl.edu/presentations/2016/2016.01.12.PAG.Structural%20Variations.pdf