BOL: Related items

COSMIC

Jit — Sat, 01 Oct 2016 15:04:10 -0500

Accurately describing and annotating structural variants can be complex. One reason is that variants are reported at different resolutions, from traditional cytogenetic coordinates down to exact base-pair positions. Furthermore, multiple rearrangements in a single area of the genome can make cataloguing them and interpreting their effects challenging.

The Rearrangement Overview page describes the one or more breakpoints that make up a structural variant. A breakpoint is defined as a region or point where the sample sequence differs from the reference sequence. Minimal interpretation is applied to these data. One variant event can consist of one or multiple breakpoints. The Syntax (shown above the table) gives a detailed description of the variant and its location (e.g. chr11:g.36585230_76606619del, a deletion of roughly 40 Mb on chromosome 11). Syntax is based on HGVS mutation nomenclature recommendations [http://www.hgvs.org/rec.html].
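As a rough illustration of how the example syntax above can be read, the sketch below parses only the simple range-deletion form and computes the deleted length. The function name and the regular expression are illustrative assumptions, not part of COSMIC or of any HGVS library, and other variant types are deliberately not handled.

```python
import re

# Pattern for the simple genomic deletion form shown above, e.g.
# "chr11:g.36585230_76606619del". Other HGVS variant types are not handled.
DELETION_RE = re.compile(r"^(?P<chrom>chr\w+):g\.(?P<start>\d+)_(?P<end>\d+)del$")


def parse_deletion(syntax):
    """Return (chromosome, start, end, length) for a simple range deletion."""
    match = DELETION_RE.match(syntax)
    if match is None:
        raise ValueError(f"not a simple range deletion: {syntax!r}")
    start = int(match.group("start"))
    end = int(match.group("end"))
    if end < start:
        raise ValueError("end position precedes start position")
    # Positions are 1-based and inclusive, so add 1 to the difference.
    return match.group("chrom"), start, end, end - start + 1


chrom, start, end, length = parse_deletion("chr11:g.36585230_76606619del")
print(f"{chrom}: {length:,} bp deleted (about {length / 1e6:.0f} Mb)")
```

Because both endpoints are included in the deleted range, the length is end minus start plus one, which for this example comes to just over 40 million base pairs.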

http://cancer.sanger.ac.uk/cosmic/help/rearrangement/overview

Address of the bookmark: http://cancer.sanger.ac.uk/cosmic/help/rearrangement/overview

VirMet

Jit — Mon, 10 Oct 2016 08:27:19 -0500

Watch out: only a few files are counted in coverage statistics.

Full documentation on Read the Docs.

A set of tools for viral metagenomics.

virmet is called with a command-subcommand syntax: for example, virmet fetch --viral n downloads the viral database. The subcommands available so far are:

fetch      download genomes
update     update viral/bacterial database
index      index genomes
wolfpack   analyze a MiSeq run
covplot    plot coverage for a specific organism

Short help for any subcommand is available with virmet subcommand -h.
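For readers unfamiliar with the command-subcommand pattern, here is a minimal sketch of how such an interface can be laid out with Python's argparse. It is not VirMet's actual implementation: only the subcommand names and the --viral option mentioned above come from the description, and everything else (function name, help texts) is an assumption made for illustration.

```python
import argparse

# Subcommand names and descriptions copied from the list above.
SUBCOMMANDS = {
    "fetch": "download genomes",
    "update": "update viral/bacterial database",
    "index": "index genomes",
    "wolfpack": "analyze a MiSeq run",
    "covplot": "plot coverage for a specific organism",
}


def build_parser():
    """Build a parser that mimics the command/subcommand layout (a sketch)."""
    parser = argparse.ArgumentParser(prog="virmet")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    for name, description in SUBCOMMANDS.items():
        sub = subparsers.add_parser(name, help=description)
        if name == "fetch":
            # Only the option used in the example above is modelled here;
            # the help text is an assumption.
            sub.add_argument("--viral", help="which viral database to download")
    return parser


args = build_parser().parse_args(["fetch", "--viral", "n"])
print(args.subcommand, args.viral)
```

With this layout, argparse generates a -h help message for each subcommand automatically, which matches the behaviour described above.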

More at https://github.com/ozagordi/VirMet

Address of the bookmark: https://github.com/ozagordi/VirMet

Ribbon !!

Jit — Fri, 21 Oct 2016 04:54:30 -0500

Visualization has played an extremely important role in the current genomic revolution, helping researchers inspect and understand variants, expression patterns, evolutionary changes, and a number of other relationships. However, most of the information in read-to-reference or genome-to-genome alignments is lost for structural variations in the one-dimensional views of most genome browsers, which show only reference coordinates. Structural variations captured by long reads or assembled contigs often need more context to be understood, including alignments and other genomic information from multiple chromosomes. We have addressed this problem by creating Ribbon (genomeribbon.com), an interactive online visualization tool that displays alignments along both reference and query sequences, along with any associated variant calls in the sample. This way Ribbon shows patterns in alignments of many reads across multiple chromosomes, while allowing detailed inspection of individual reads (Supplementary Note 1).

For example, here we show a gene fusion in the SK-BR-3 breast cancer cell line linking the genes CYTH1 and EIF3H. While it has been found in the transcriptome previously, genome sequencing did not identify a direct chromosomal fusion between these two genes. After SMRT sequencing, Ribbon shows that there are indeed long reads that span from one gene to the other, going through not one but two variants, showing for the first time the genomic link between these two genes (Figure 1a). More gene fusions of this cancer cell line are investigated in Supplementary Note 2. Figure 1b shows another complex event in this sample made simple in Ribbon: the translocation of a 4.4 kb sequence deleted from chr19 and inserted into chr16. Thus, Ribbon enables understanding of complex variants, and it may also help in the detection of sequencing and sample preparation issues, testing of aligners and variant callers, and rapid curation of structural variant candidates (Supplementary Note 3).

In addition to SAM and BAM files with long, short, or paired-end reads, Ribbon can also load coordinate files from whole-genome aligners such as MUMmer. Therefore, Ribbon can be used to test assembly algorithms or inspect the similarity between species. Supplementary Note 4 shows a comparison of gorilla and human genomes using Ribbon, highlighting major structural differences. In conclusion, Ribbon is a powerful interactive web tool for viewing complex genomic alignments.

Source code at https://github.com/MariaNattestad/ribbon

Address of the bookmark: http://genomeribbon.com/

HybPiper

Jit — Fri, 04 Nov 2016 05:02:10 -0500

HybPiper was designed for targeted sequence capture, in which DNA sequencing libraries are enriched for gene regions of interest, especially for phylogenetics. HybPiper is a suite of Python scripts that wrap and connect bioinformatics tools in order to extract target sequences from high-throughput DNA sequencing reads.

Targeted bait capture is a technique for sequencing many loci simultaneously based on bait sequences. The HybPiper pipeline starts with high-throughput sequencing reads (for example from Illumina MiSeq) and assigns them to target genes using BLASTx or BWA. The reads are distributed to separate directories, one per target gene, where they are assembled separately using SPAdes. The main output is a FASTA file of the (in-frame) CDS portion of the sample for each target region, and a separate file with the translated protein sequence.
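The read-distribution step described above can be pictured with a small sketch. This is not HybPiper's code: the function name, the input format (read name and target gene pairs) and the output file names are all hypothetical, chosen only to show the idea of sorting reads into one directory per target gene before assembly.

```python
from collections import defaultdict
from pathlib import Path
import tempfile


def distribute_reads(reads, hits, outdir):
    """Write each read into a per-gene FASTA file under outdir.

    reads: dict mapping read name -> sequence
    hits:  iterable of (read name, target gene) pairs, as a read mapper
           or BLAST-like search might report them (hypothetical input)
    """
    by_gene = defaultdict(list)
    for read_name, gene in hits:
        by_gene[gene].append(read_name)

    written = {}
    for gene, names in by_gene.items():
        gene_dir = Path(outdir) / gene
        gene_dir.mkdir(parents=True, exist_ok=True)
        fasta = gene_dir / f"{gene}_reads.fasta"  # hypothetical file name
        with fasta.open("w") as handle:
            for name in names:
                handle.write(f">{name}\n{reads[name]}\n")
        written[gene] = fasta
    return written


# Toy data: three reads assigned to two target genes.
reads = {"r1": "ATGGCC", "r2": "GGCCTA", "r3": "TTGACA"}
hits = [("r1", "geneA"), ("r2", "geneA"), ("r3", "geneB")]

with tempfile.TemporaryDirectory() as tmp:
    files = distribute_reads(reads, hits, tmp)
    for gene, path in sorted(files.items()):
        print(gene, path.read_text().count(">"), "reads")
```

Keeping each gene's reads in its own directory means every assembly is small and independent of the others.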

HybPiper also includes post-processing scripts, run after the main pipeline, to extract the intronic regions flanking each exon, investigate putative paralogs, and calculate sequencing depth. For more information, please see our wiki.

HybPiper is run separately for each sample (single or paired-end sequence reads). When HybPiper generates sequence files from the reads, it does so in a standardized directory hierarchy. Many of the post-processing scripts rely on this directory hierarchy, so do not modify it after running the initial pipeline. It is a good idea to run the pipeline for each sample from the same directory. You will end up with one directory per run of HybPiper, and some of the later scripts take advantage of this predictable directory structure.

Address of the bookmark: https://github.com/mossmatters/HybPiper

R Graphics Cookbook by Winston Chang

Abhimanyu Singh — Fri, 04 Nov 2016 12:50:30 -0500

A very nice book by Winston Chang for R enthusiasts. The R code presented in these pages is the R code actually used to produce the figures in the book. There will be differences compared to the code chunks shown in the text of the book, but in most cases the differences will be that these pages contain additional code to lay out multiple plots on a single "page".

The code presented for each figure is self-contained, i.e., all code required to produce the figure is included. This means that there is sometimes considerable overlap of code between several figures. In some cases, it may be necessary to install an add-on package from CRAN to get the code to run.

More books at http://www.e-reading.club/bookreader.php/137370/C486x_APPb.pdf

Method in Comparative genomics !!

Jit — Wed, 09 Nov 2016 16:29:24 -0600

We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change.

We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes.

We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on the genome-wide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs.
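The reading-frame idea mentioned above can be illustrated with a much simplified sketch. This is not the test from the paper: the function name, the scoring rule and the toy alignments are assumptions. It only shows why a gap whose length is a multiple of three keeps downstream bases in frame while any other gap shifts them.

```python
def reading_frame_conservation(ref_aln, other_aln):
    """Fraction of reference bases aligned in the most common reading frame.

    Both arguments are rows of a pairwise alignment with '-' for gaps.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("alignment rows must have equal length")

    in_frame = [0, 0, 0]   # aligned bases seen at each relative frame shift
    ref_pos = other_pos = 0
    for ref_base, other_base in zip(ref_aln, other_aln):
        if ref_base != "-" and other_base != "-":
            in_frame[(ref_pos - other_pos) % 3] += 1
        if ref_base != "-":
            ref_pos += 1
        if other_base != "-":
            other_pos += 1
    if ref_pos == 0:
        return 0.0
    # Use the most common frame shift so a constant offset is not penalized.
    return max(in_frame) / ref_pos


reference = "ATGAAACCCGGG"
print(reading_frame_conservation(reference, "ATG---CCCGGG"))  # 3-base gap
print(reading_frame_conservation(reference, "ATGAAA-CCGGG"))  # 1-base gap
```

In the toy data, the three-base gap leaves every remaining aligned base in the original frame, whereas the single-base gap puts everything downstream of it into a different frame and lowers the score.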

Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast, and will be invaluable in the study of complex genomes such as the human genome.

Address of the bookmark: http://web.mit.edu/manoli/www/publications/Kellis_JCB_04.pdf

EXCAVATOR2tool

Bulbul — Wed, 30 Nov 2016 04:09:19 -0600

EXCAVATOR2 is a collection of bash, R and Fortran scripts and code that analyses Whole Exome Sequencing (WES) data to identify CNVs. EXCAVATOR2 enhances the identification of all genomic CNVs, both those overlapping and those not overlapping targeted exons, by integrating the analysis of in-target and off-target reads. Specifically, it improves the precision of calling CNVs that overlap targeted exons from WES data and extends the spectrum of detectable CNVs to off-target events.

EXCAVATOR2 can be effectively employed to identify CNVs in small as well as large-scale re-sequencing population and cancer studies. Lastly, all WES experiments can be re-analysed with this method, with the benefit of identifying novel CNVs in extra-exonic regions from the full-genome CN profile.

Address of the bookmark: https://sourceforge.net/projects/excavator2tool/

SGA: String Graph Assembler

Jit — Thu, 08 Dec 2016 05:08:59 -0600

SGA is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.

More at https://github.com/jts/sga

SGA dependencies:
- the Google sparse hash library (http://code.google.com/p/google-sparsehash/)
- the bamtools library (https://github.com/pezmaster31/bamtools)
- zlib (http://www.zlib.net/)
- (optional but suggested) the jemalloc memory allocator (http://www.canonware.com/jemalloc/download.html)

Address of the bookmark: https://github.com/jts/sga

Velvet tutorial

Poonam Mahapatra — Fri, 09 Dec 2016 04:19:07 -0600

The objective of this activity is to help you understand how to run Velvet in general, how to accurately estimate the insert size of a paired-end library using Bowtie, what Velvet's primary parameters are, and how to produce a de novo assembly from Illumina reads.
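One way to picture the insert-size estimate mentioned above: once paired-end reads have been aligned and written as SAM, the template length (TLEN, column 9) of properly paired reads can be summarized. The sketch below is an assumption about how this might be done by hand, not the tutorial's own procedure; the function names and toy records are invented for illustration.

```python
import statistics


def estimate_insert_size(sam_lines):
    """Return (median, standard deviation) of positive template lengths."""
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])             # column 9 of SAM: TLEN
        # Keep properly paired reads (flag bit 0x2); each pair contributes one
        # positive and one negative TLEN, so count only the positive one.
        if flag & 0x2 and tlen > 0:
            lengths.append(tlen)
    if len(lengths) < 2:
        raise ValueError("need at least two properly paired reads")
    return statistics.median(lengths), statistics.stdev(lengths)


def sam_record(name, flag, pos, mate_pos, tlen):
    """Build a minimal 11-column SAM line (toy data, not real alignments)."""
    return "\t".join([name, str(flag), "chr1", str(pos), "60", "50M",
                      "=", str(mate_pos), str(tlen), "*", "*"])


toy_sam = [
    "@HD\tVN:1.0",
    sam_record("r1", 99, 100, 300, 250), sam_record("r1", 147, 300, 100, -250),
    sam_record("r2", 99, 500, 750, 300), sam_record("r2", 147, 750, 500, -300),
    sam_record("r3", 99, 1000, 1300, 350), sam_record("r3", 147, 1300, 1000, -350),
]
median, sd = estimate_insert_size(toy_sam)
print(f"median insert size: {median}, standard deviation: {sd:.1f}")
```

The median is used here rather than the mean because a few wrongly aligned pairs with very large template lengths would otherwise skew the estimate.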

http://evomics.org/learning/assembly-and-alignment/velvet/

Address of the bookmark: http://evomics.org/learning/assembly-and-alignment/velvet/

Understanding Greedy Algorithms

Jit — Mon, 12 Dec 2016 04:37:40 -0600

An introduction to greedy algorithms for biologists.

https://www.topcoder.com/community/data-science/data-science-tutorials/greedy-is-good/

These pages are also useful on the same topic:

http://learninglover.com/examples.php?id=59

http://www.cs.rpi.edu/~magdon/ps/conference/super_biokdd.pdf

https://ocw.mit.edu/courses/biology/7-91j-foundations-of-computational-and-systems-biology-spring-2014/lecture-slides/MIT7_91JS14_Lecture6.pdf

http://schatzlab.cshl.edu/teaching/AssemblyClass/01.%20Assembly%20Intro.pdf

http://lsl.sinica.edu.tw/Services/Class/files/20150612449.pdf

http://www.cs.jhu.edu/~langmea/resources/lecture_notes/assembly_scs.pdf

https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-43.pdf
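
As a small illustration of the greedy idea in a setting familiar to biologists, the sketch below repeatedly merges the two strings with the largest suffix-prefix overlap, a textbook greedy approach to building a short common superstring of reads. The function names and toy reads are assumptions for illustration. Greedy merging is not guaranteed to find the shortest possible superstring, and this simple version does not treat reads fully contained inside other reads specially.

```python
def overlap(a, b, min_length=1):
    """Length of the longest suffix of a that equals a prefix of b."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_superstring(reads, min_length=1):
    """Merge the pair of strings with the largest overlap until none overlap."""
    reads = list(dict.fromkeys(reads))      # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break                            # no overlaps left to merge
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    # Any strings that never overlapped are simply concatenated.
    return "".join(reads)


print(greedy_superstring(["ACGGA", "GGATT", "ATTCG"]))
```

Each step makes the locally best choice (the largest overlap available) without looking ahead, which is what makes the method greedy and also why it can miss the globally shortest answer.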

Address of the bookmark: https://www.topcoder.com/community/data-science/data-science-tutorials/greedy-is-good/