BOL: Related items

karyoploteR: plot whole genomes with arbitrary data

Abhimanyu Singh — Fri, 02 Feb 2018 03:24:28 -0600

karyoploteR is an R package to create karyoplots, that is, representations of whole genomes with arbitrary data plotted on them. It is inspired by the R base graphics system and does not depend on other graphics packages. The aim of karyoploteR is to offer the user an easy way to plot data along the genome to get broad genome-wide view to facilitate the identification of genome wide relations and distributions.

Address of the bookmark: https://bernatgel.github.io/karyoploter_tutorial/

Blobology

Jit — Mon, 13 Jun 2016 10:18:33 -0500

Tools for making blobplots or Taxon-Annotated-GC-Coverage plots (TAGC plots) to visualise the contents of genome assembly data sets as a QC step

Blaxter Lab, Institute of Evolutionary Biology, University of Edinburgh

Goal: To create blobplots or Taxon-Annotated-GC-Coverage plots (TAGC plots) to visualise the contents of genome assembly data sets as a QC step.

This repository accompanies the paper:
Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Sujai Kumar, Martin Jones, Georgios Koutsovoulos, Michael Clarke, Mark Blaxter
(submitted 2013-10-01 to Frontiers in Bioinformatics and Computational Biology special issue : Quality assessment and control of high-throughput sequencing data).

It contains bash/perl/R scripts for running the analysis presented in the paper to create a preliminary assembly, and to create and collate GC content, read coverage and taxon annotation for the preliminary assembly, which can be visualised, such as Figure 2a from the paper showing TAGC plots/blobplots for Caenorhabditis sp. 5:

Address of the bookmark: https://github.com/blaxterlab/blobology

Linux command line exercises for NGS data processing

Jit — Wed, 22 Jun 2016 07:59:39 -0500

The purpose of this tutorial is to introduce students to the frequently used tools for NGS analysis as well as giving experience in writing one-liners. Copy the required files to your current directory, change directory (cd) to the linuxTutorial folder, and do all the processing inside:

[uzi@quince-srv2 ~/]$ cp -r /home/opt/MScBioinformatics/linuxTutorial .
[uzi@quince-srv2 ~/]$ cd linuxTutorial
[uzi@quince-srv2 ~/linuxTutorial]$

I have deliberately chosen Awk in the exercises as it is a language in itself and is used more often to manipulate NGS data as compared to the other command line tools such as grep, sed, perl etc. Furthermore, having a command on awk will make it easier to understand advanced tutorials such as Illumina Amplicons Processing Workflow.

In Linux, we use a shell that is a program that takes your commands from the keyboard and gives them to the operating system. Most Linux systems utilize Bourne Again SHell (bash), but there are several additional shell programs on a typical Linux system such as ksh, tcsh, and zsh. To see which shell you are using, type

[uzi@quince-srv2 ~/linuxTutorial]$ echo $SHELL

/bin/bash

Address of the bookmark: http://userweb.eng.gla.ac.uk/umer.ijaz/bioinformatics/linux.html

A5-miseq

Jit — Thu, 18 Aug 2016 04:05:23 -0500

_A5-miseq_ is a pipeline for assembling DNA sequence data generated on the Illumina sequencing platform. This README will take you through the steps necessary for running _A5-miseq_.

Point to note:

There are many situations where A5-miseq is not the right tool for the job. In order to produce accurate results, A5-miseq requires Illumina data with certain characteristics. A5-miseq will likely not work well with Illumina reads shorter than around 80nt, or reads where the base qualities are low in all or most reads before 60nt. A5-miseq assumes it is assembling homozygous haploid genomes. Use a different assembler for metagenomes and heterozygous diploid or polyploid organisms. Use a different assembler if a tool like FastQC reports your data quality is dubious. You have been warned! Datasets consisting solely of unpaired reads are not currently supported.

Address of the bookmark: https://sourceforge.net/projects/ngopt/

Kaiju

Jit — Mon, 27 Jun 2016 11:23:04 -0500

Kaiju is a program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences.

By default, Kaiju uses either the available complete genomes from NCBI RefSeq or the microbial subset of the non-redundant protein database nr used by NCBI BLAST, optionally also including fungi and microbial eukaryotes.

Kaiju translates reads into amino acid sequences, which are then searched in the database using a modified backward search on a memory-efficient implementation of the Burrows-Wheeler transform, which finds maximum exact matches (MEMs), optionally allowing mismatches in the protein alignment. The search can process up to millions of reads per minute using, for example, only 10 GB RAM with a protein database comprising 4821 microbial genomes. Kaiju can also be used for querying any other protein database without taxonomic classification, using either protein or nucleotide queries.

Kaiju is described in Menzel, P. et al. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7:11257 (open access).

Address of the bookmark: http://kaiju.binf.ku.dk/

Scarpa

Poonam Mahapatra — Wed, 13 Jul 2016 07:59:25 -0500

Scarpa is a stand-alone scaffolding tool for NGS data. It can be used together with virtually any genome assembler and any NGS read mapper that supports SAM format. Other features include support for multiple libraries and an option to estimate insert size distributions from data. Scarpa is available free of charge for academic and commercial use under the GNU General Public License (GPL).

See the user manual or the paper for more information about Scarpa. Click here for the supplementary material.

Address of the bookmark: http://compbio.cs.toronto.edu/hapsembler/scarpa.html

CrossMap

Abhimanyu Singh — Mon, 05 Sep 2016 04:07:38 -0500

CrossMap is a program for convenient conversion of genome coordinates (or annotation files) between different assemblies (such as Human hg18 (NCBI36) <> hg19 (GRCh37), Mouse mm9 (MGSCv37) <> mm10 (GRCm38)).
It supports most commonly used file formats including SAM/BAM, Wiggle/BigWig, BED, GFF/GTF, VCF.
CrossMap is designed to liftover genome coordinates between assemblies. It’s not a program for aligning sequences to reference genome.
We do not recommend using CrossMap to convert genome coordinates between species.

Address of the bookmark: http://crossmap.sourceforge.net/

vcfR

Archana Malhotra — Fri, 19 Aug 2016 07:38:24 -0500

Most variant calling pipelines result in files containing large quantities of variant information. The variant call format (vcf) is an increasingly popular format for this data. The format of these files and their content is discussed in the vignette ‘vcf data.’ These files are typically intended to be post-processed (i.e., filtered) as an attempt to remove false positives or otherwise problematic sites. The R package vcfR provides tools to facilitate this filtering as well as to visualize the effects of choices made during this process.

Address of the bookmark: https://cran.r-project.org/web/packages/vcfR/vignettes/visualization_1.html

GeneValidator - Identify problems with predicted genes

Poonam Mahapatra — Fri, 26 Aug 2016 06:00:03 -0500

GeneValidator helps in identifing problems with gene predictions and provide useful information extracted from analysing orthologs in BLAST databases. The results produced can be used by biocurators and researchers who need accurate gene predictions.

If you would like to use GeneValidator on a few sequences, see our online GeneValidator Web App -http://genevalidator.sbcs.qmul.ac.uk.

If you use GeneValidator in your work, please cite us as follows:

Dragan M‡, Moghul MI‡, Priyam A, Bustos C & Wurm Y. 2016. GeneValidator: identify problems with protein-coding gene predictions. Bioinformatics, doi: 10.1093/bioinformatics/btw015.

Address of the bookmark: https://github.com/wurmlab/genevalidator

Sushi: An R/Bioconductor package for visualizing genomic data

Jit — Wed, 31 Aug 2016 08:29:12 -0500

Sushi: An R/Bioconductor package for visualizing genomic data

Address of the bookmark: https://www.bioconductor.org/packages/devel/bioc/vignettes/Sushi/inst/doc/Sushi.pdf