BOL: Related items

Beagle

Jit — Thu, 27 Oct 2016 11:19:00 -0500

Beagle is a software package that performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection.

Beagle version 4.1 has a more accurate genotype phasing algorithm than earlier versions and a very fast and accurate genotype imputation algorithm. Version 4.1 also has several changes to the command-line arguments, which are described in the release notes. The "ped" argument has no effect in version 4.1. If your data contain nuclear families and you want to model parent-offspring relationships when phasing genotypes, please use version 4.0.

If you use Beagle 4.1 in a published analysis, please report the program version and cite the appropriate article.

The citation for Beagle's phasing algorithm is:

S R Browning and B L Browning (2007). Rapid and accurate haplotype phasing and missing data inference for whole genome association studies by use of localized haplotype clustering. Am J Hum Genet 81:1084-1097. doi:10.1086/521987

The citation for Beagle's genotype imputation algorithm is:

B L Browning and S R Browning (2016). Genotype imputation with millions of reference samples. Am J Hum Genet 98:116-126. doi:10.1016/j.ajhg.2015.11.020

The citation for Beagle's IBD detection algorithm is:

B L Browning and S R Browning (2013). Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459-471. doi:10.1534/genetics.113.150029

Address of the bookmark: http://faculty.washington.edu/browning/beagle/beagle.html

R Graphs !!

Jit — Fri, 04 Nov 2016 10:48:00 -0500

The blog is a collection of script examples with example data and output plots. R produces excellent-quality graphs for data analysis, scientific and business presentations, publications and other purposes. Self-help code and examples are provided. Enjoy nice graphs !!

Address of the bookmark: http://rgraphgallery.blogspot.be/

Scripts

Jit — Wed, 30 Nov 2016 10:35:15 -0600

Useful scripts for NGS analysis, distributed with the AUGUSTUS gene-prediction software.

Address of the bookmark: http://augustus.gobics.de/binaries/scripts/

YAHA

Jit — Fri, 20 Jan 2017 05:38:05 -0600

YAHA is a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all alignments or only one alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query, using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. The authors show that YAHA detects more breakpoints in less time than BWA-SW across all structural variant (SV) classes, and especially excels at complex SVs comprising multiple breakpoints.
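
The idea of choosing an optimal set of alignments through a directed acyclic graph can be illustrated with a small dynamic-programming sketch. This is not YAHA's actual scoring or implementation: it assumes each candidate alignment has a query interval and a single score, that chosen alignments may not overlap on the query, and that every junction between two alignments costs a fixed breakpoint penalty. The names `Alignment` and `best_alignment_set` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Alignment:
    """Hypothetical candidate alignment of part of a query to the reference."""
    name: str
    qstart: int  # 0-based start on the query (inclusive)
    qend: int    # end on the query (exclusive)
    score: int   # alignment score, e.g. matches minus mismatches


def best_alignment_set(alns: List[Alignment], breakpoint_penalty: int) -> Tuple[int, List[str]]:
    """Highest-scoring chain of non-overlapping alignments along the query.

    Nodes are alignments sorted by query start; an edge i -> j exists when
    alignment j starts at or after the end of alignment i, which makes the
    graph acyclic. Each edge (a breakpoint) costs `breakpoint_penalty`.
    Simplification: no overlap between chosen alignments is allowed.
    """
    if not alns:
        return 0, []
    order = sorted(alns, key=lambda a: (a.qstart, a.qend))
    n = len(order)
    best = [a.score for a in order]  # best chain score ending at node i
    prev = [-1] * n                  # back-pointers for path reconstruction
    for j in range(n):
        for i in range(j):
            if order[i].qend <= order[j].qstart:
                cand = best[i] - breakpoint_penalty + order[j].score
                if cand > best[j]:
                    best[j] = cand
                    prev[j] = i
    end = max(range(n), key=lambda k: best[k])
    path = []
    k = end
    while k != -1:
        path.append(order[k].name)
        k = prev[k]
    return best[end], path[::-1]
```

With a small penalty, two split alignments (a candidate breakpoint) beat one weaker full-length alignment; with a large penalty, the single alignment wins. The penalty is therefore what keeps the method from reporting spurious breakpoints.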

Availability: YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA.

Contact:

http://genome.wustl.edu/people/groups/detail/hall-lab/

Address of the bookmark: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3463118/

Fool's guide

Poonam Mahapatra — Sun, 02 Apr 2017 14:31:18 -0500

This website and its accompanying documents are intended as a tool to help researchers working with non-model organisms acquire and process transcriptomic high-throughput sequencing data without having to learn extensive bioinformatics skills. It covers all steps from tissue collection, sample preparation and computer setup through to addressing biological questions with gene expression and SNP data.

http://sfg.stanford.edu/denovo.html

http://sfg.stanford.edu/sequencing.html

http://sfg.stanford.edu/BLAST.html

Address of the bookmark: http://sfg.stanford.edu/guide.html

Finishing !!

Jit — Sat, 20 May 2017 15:50:20 -0500

The process of finishing a genome and moving it from a draft stage (the result of sequencing and initial assembly) to a complete genome is typically a time- and resource-intensive task. The advent of new sequencing technologies has brought its own set of opportunities and pitfalls to the finishing process. While genomes can now be sequenced to high redundancy in a cost-effective manner, assembling them is more challenging, and draft genomes are often fragmented into hundreds of contigs. Correspondingly, producing the complete genome can involve months of lab work and thousands of finishing experiments, and is usually done in large genome centers.

The work in our lab has focused on computational approaches to speed up the finishing process. Specifically, we have explored the use of optical mapping and mate-pair data to augment assemblies and direct finishing experiments. The tools developed in our lab have been used in several finishing projects, producing complete genomes (and near-complete ones) with surprisingly little computational and experimental effort (Nagarajan et al., in submission). The executables (as well as source code) for these tools are freely available here:

Scaffolding using Optical Restriction Mapping
Optical maps are global, ordered maps of restriction site locations in a genome. This information can be quite useful for scaffolding contigs from a shotgun assembly to guide the finishing process. A set of programs that exploit optical maps for assembly can be found here: SOMA v2.0 (63 MB tar.gz file). This version of SOMA contains several improvements to the programs in v1.0, as well as new scripts for working with multiple maps, contig graphs and scaffolds.
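
To compare a contig with an optical map, a tool first needs the contig's predicted restriction fragment sizes, obtained by an in-silico digest. The sketch below shows only that step and is not taken from SOMA. It assumes a palindromic recognition site (so scanning one strand suffices), places each cut at the first base of the site, and treats the contig ends as fragment boundaries. The function name `in_silico_digest` is hypothetical.

```python
import re
from typing import List


def in_silico_digest(seq: str, site: str) -> List[int]:
    """Return fragment lengths (bp) from cutting `seq` at every `site` occurrence.

    Assumptions: the site is palindromic, the cut is placed at the first base
    of the site, and both contig ends count as fragment boundaries.
    """
    seq = seq.upper()
    # Zero-width lookahead so overlapping occurrences are also found
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    # Consecutive boundaries delimit fragments; drop zero-length pieces
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

For example, digesting `"AAAAGGATCCTTTTTTGGATCCAA"` with the BamHI site `GGATCC` yields fragments of 4, 12 and 8 bp. The ordered list of sizes is what a scaffolder would then align against the optical map's fragment sizes.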
Augmenting assemblies with mate-pair data
Mate-pair information can be valuable for augmenting short-read assemblies and reconstructing the genome as larger scaffolds. AMOS-Hybrid is a pipeline written in the AMOS framework (open-source assembly tools) that merges arbitrary mated reads into an existing assembly, joining contigs and creating scaffolds where possible. Source code and executables for AMOS-Hybrid are available here: AMOS-Hybrid v1.0 (142 MB tar.gz file).
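
A common first step when using mate pairs for scaffolding is to count how many pairs link each pair of contigs and keep only well-supported links. The sketch below shows that bundling step in isolation; it is a generic illustration, not the AMOS-Hybrid algorithm. It assumes each mate pair has already been placed on contigs, and it ignores mate orientation and insert-size consistency, which a real scaffolder must check. The function name and input layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bundle_mate_links(placements: Dict[str, Tuple[str, str]],
                      min_links: int) -> List[Tuple[str, str, int]]:
    """Count contig-to-contig mate-pair links and keep those with >= min_links.

    placements: mate-pair id -> (contig of read 1, contig of read 2).
    Simplification: links are unordered; orientation and distance are ignored.
    """
    links: Counter = Counter()
    for c1, c2 in placements.values():
        if c1 == c2:
            continue  # both mates on one contig: no scaffolding information
        a, b = sorted((c1, c2))
        links[(a, b)] += 1
    kept = [(a, b, n) for (a, b), n in links.items() if n >= min_links]
    # Strongest links first, then by contig names for a stable order
    return sorted(kept, key=lambda t: (-t[2], t[0], t[1]))
```

Requiring several independent pairs per link (`min_links`) is a simple guard against chimeric pairs joining unrelated contigs.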
Assembly and sequence-composition guided finishing
Contigs from a shotgun assembly are typically linked together in a graph structure that can serve to guide finishing and, in some cases, close gaps in silico. Also, in many cases, the sequence composition of contigs can provide clues for filling gaps in scaffolds. A set of scripts to automate some of these tasks can be found here: Finishing Scripts v1.0 (63 MB tar.gz file).

Address of the bookmark: http://www.cbcb.umd.edu/finishing/

DIYA: a bacterial annotation pipeline for any genomics lab

Jit — Fri, 30 Jun 2017 08:48:26 -0500

DIY Genomics is an open-source bioinformatics consortium intended to bring a collection of tools and libraries into the hands of small-scale genomics labs for sequence assembly and annotation. Projects include DIYA, MGAP, CRISPR, and DIYGV.

http://gmod.org/wiki/Diya

Address of the bookmark: https://sourceforge.net/projects/diyg/

MIX: Combining multiple assemblies from NGS data

Rahul Nayak — Tue, 08 May 2018 04:58:05 -0500

MIX is a tool that combines two or more draft assemblies without relying on a reference genome; its goal is to reduce contig fragmentation and thus speed up genome finishing. The algorithm builds an extension graph in which vertices represent contig extremities and edges represent existing alignments between these extremities. These alignment edges are used for contig extension. The resulting output assembly corresponds to a path in the extension graph that maximizes the cumulative contig length.
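
Choosing a path that maximizes cumulative contig length is a heaviest-path problem. The sketch below is a simplified illustration, not MIX's implementation: it collapses each contig to a single node (rather than modelling its two extremities separately), assumes the resulting graph is acyclic, and scores a join as the new bases the next contig contributes after subtracting the aligned overlap. The function name and edge layout are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Tuple


def max_length_path(lengths: Dict[str, int],
                    edges: List[Tuple[str, str, int]]) -> Tuple[int, List[str]]:
    """Heaviest path through a contig graph, assuming the graph is acyclic.

    lengths: contig name -> length in bp (node weights).
    edges:   (left, right, overlap): the end of `left` aligns to the start of
             `right` over `overlap` bp, so joining adds len(right) - overlap bp.
    """
    preds: Dict[str, List[Tuple[str, int]]] = {c: [] for c in lengths}
    for left, right, overlap in edges:
        preds[right].append((left, overlap))
    # TopologicalSorter takes node -> predecessors; raises CycleError on cycles
    ts = TopologicalSorter({c: [p for p, _ in ps] for c, ps in preds.items()})
    best: Dict[str, int] = {}
    back: Dict[str, str] = {}
    for c in ts.static_order():  # every predecessor is visited before c
        best[c] = lengths[c]
        for p, overlap in preds[c]:
            cand = best[p] + lengths[c] - overlap
            if cand > best[c]:
                best[c] = cand
                back[c] = p
    end = max(best, key=best.get)
    path = [end]
    while path[-1] in back:
        path.append(back[path[-1]])
    return best[end], path[::-1]
```

For contigs A (1000 bp), B (800 bp) and C (500 bp), with A overlapping B by 100 bp and B overlapping C by 50 bp, the sketch returns the merged length 2150 and the path A, B, C.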

The MIX algorithm, approach and results were published in BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/14/S15/S16.

Address of the bookmark: https://github.com/cbib/MIX

Convert EnsEMBL GTF to Annotation table (Geneid, GeneSymbol, GeneWiseChrLocation, GeneClass, Strand)

EagleEye — Fri, 24 Jun 2016 18:08:49 -0500

Bash Script source:

https://gist.github.com/santhilalsubhash/367befcf5216be4b1fd9

Information:

This script converts an EnsEMBL GTF file (Ex: https://gist.githubusercontent.com/santhilalsubhash/1e7cca357e52a181dc25/raw/cfb803e07900a2baefbb6534f1299fd30cb57a29/sample.GTF) to annotation table format. It generates two files:
1) Transcript-wise chromosome locations with information about transcripts (Ex: https://gist.githubusercontent.com/santhilalsubhash/c7dec516e0338503a4b6/raw/de0af1a39f0005c4ce7321c5ae57fc8b4a14c7f4/sample.GTF_enst_annotation.txt)
2) Gene-wise chromosome locations with information about genes (Ex: https://gist.githubusercontent.com/santhilalsubhash/c92006c5080f0333bec2/raw/d16e0b2440d73b09b486d3c9751cdb248a73fa0b/sample.GTF_ensg_annotation.txt)

Note: You can download GTF files from http://www.ensembl.org/info/data/ftp/index.html
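
The original conversion is a Bash script in the gist above. As an illustration of what the gene-wise output involves, here is a minimal Python sketch; it is not a port of that script. It assumes Ensembl-style attributes `gene_id`, `gene_name` and `gene_biotype` in column 9, that "GeneClass" corresponds to `gene_biotype`, and that the location is written as `chrom:start-end`. The function names are hypothetical.

```python
import csv
import re
from typing import Dict, Iterable, List, TextIO

# Matches key "value" pairs in GTF column 9, e.g. gene_id "ENSG...";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute string into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def gene_table(gtf_lines: Iterable[str]) -> List[List[str]]:
    """Rows of (Geneid, GeneSymbol, GeneWiseChrLocation, GeneClass, Strand)
    built from the 'gene' records of an Ensembl-style GTF."""
    rows = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep gene records only (skip transcripts, exons, ...)
        chrom, start, end, strand = cols[0], cols[3], cols[4], cols[6]
        attrs = parse_attributes(cols[8])
        rows.append([
            attrs.get("gene_id", ""),
            attrs.get("gene_name", ""),
            f"{chrom}:{start}-{end}",
            attrs.get("gene_biotype", ""),  # assumed to be the "GeneClass"
            strand,
        ])
    return rows


def write_tsv(rows: List[List[str]], out: TextIO) -> None:
    """Write the table with a header row as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Geneid", "GeneSymbol", "GeneWiseChrLocation", "GeneClass", "Strand"])
    writer.writerows(rows)
```

A transcript-wise table would follow the same pattern, filtering on `transcript` records and reading `transcript_id` and related attributes instead.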

BASE: a practical de novo assembler for large genomes using long NGS reads

Rahul Nayak — Fri, 19 Oct 2018 07:25:21 -0500

BASE is a new de novo assembler. It enhances the classic seed-extension approach by indexing the reads efficiently to generate adaptive seeds that have a high probability of appearing uniquely in the genome. These seeds form the basis for BASE to build extension trees and then to use reverse validation to remove branches based on read coverage and paired-end information, resulting in high-quality consensus sequences of the reads sharing each seed. These consensus sequences are then extended into contigs.
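
The notion of an adaptive seed can be illustrated by lengthening a seed until it is rare enough in the read set to plausibly come from a single-copy region. This toy sketch is an assumption-laden illustration, not BASE's indexing scheme: it uses plain k-mer count tables instead of a compact index, ignores reverse complements and sequencing errors, and relies on the heuristic that a single-copy k-mer occurs roughly c times at c-fold coverage, so `max_count` might be set to a small multiple of c. The class and method names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


class AdaptiveSeeder:
    """Toy read index for choosing adaptive-length seeds (hypothetical names)."""

    def __init__(self, reads: List[str], k_min: int, k_max: int):
        self.reads = reads
        self.k_min = k_min
        self.k_max = k_max
        # One k-mer count table per seed length; a real assembler would use a
        # compact index (e.g. a suffix array or FM-index) instead.
        self.counts: Dict[int, Counter] = {}
        for k in range(k_min, k_max + 1):
            table: Counter = Counter()
            for r in reads:
                for i in range(len(r) - k + 1):
                    table[r[i:i + k]] += 1
            self.counts[k] = table

    def seed_at(self, read_idx: int, pos: int, max_count: int) -> Optional[str]:
        """Shortest seed starting at `pos` whose read-set count is <= max_count."""
        read = self.reads[read_idx]
        for k in range(self.k_min, self.k_max + 1):
            if pos + k > len(read):
                break  # the seed would run off the end of the read
            seed = read[pos:pos + k]
            if self.counts[k][seed] <= max_count:
                return seed  # rare enough to be treated as unique
        return None  # no seed up to k_max is rare enough
```

In a repetitive stretch the seed keeps growing until it reaches sequence that distinguishes one copy from another; if that never happens within `k_max`, the position yields no seed, which is the point of making seeds adaptive rather than fixed-length.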

Address of the bookmark: https://github.com/dhlbh/BASE