BOL: Related items

Progressive Cactus

Jit — Tue, 17 Jan 2017 03:40:06 -0600

v0.0 by Glenn Hickey (hickey@soe.ucsc.edu)

Progressive Cactus is a whole-genome alignment package.

Requirements

git
gcc 4.2 or newer
python 2.7
wget
64-bit processor and build environment
150GB+ of memory on at least one machine when aligning mammal-sized genomes; less memory is needed for smaller genomes.
Parasol or SGE for cluster support.
750 MB of disk space

Instructions

IMPORTANT NOTE: Progressive Cactus does not presently support installation into paths that contain spaces. Until this is resolved, you can use a softlink as a workaround: ln -s "path with spaces" "installation path without spaces"

In the parent directory of where you want Progressive Cactus installed:

git clone https://github.com/glennhickey/progressiveCactus.git
cd progressiveCactus
git pull
git submodule update --init
make

It is also convenient to add progressiveCactus/bin to your PATH environment variable. To run the tools included in the submodules/ directory structure (e.g. hal2maf), first source progressiveCactus/environment to load the installed environment.

If any errors occur during the build process, you are unlikely to be able to use the tool. Please submit a GitHub issue so we can help: doing so helps not only you but also others who wish to use the tool.

Note that all dependencies are also built and included in the submodules/ directory. This increases the size and build time but greatly simplifies installation and version management. The installation does not create or modify any files outside the progressiveCactus/ directory.

Address of the bookmark: https://github.com/glennhickey/progressiveCactus

Cgaln

Jit — Wed, 22 Feb 2017 05:14:15 -0600

Cgaln (Coarse-grained alignment) is a program designed to align a pair of whole genomic sequences, ranging from bacterial genomes to entire vertebrate chromosomes, on an ordinary desktop computer. Cgaln performs an alignment in two steps: first at the block level, then at the nucleotide level. The first, "coarse-grained" step can detect genomic rearrangements and narrows down the regions to be analyzed in the next step. The second step performs detailed alignment only within the limited regions found in the first. The output of Cgaln is 'glocal' in the sense that rearrangements are taken into account while each alignable region is extended as far as possible. Thus, Cgaln is not only fast and memory-efficient, but can also filter noisy output without missing the most important homologous segment pairs.
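
To illustrate the coarse-grained stage, the sketch below cuts two sequences into fixed-size blocks, scores every block pair by the number of distinct k-mers they share, and keeps the high-scoring pairs as candidate regions for nucleotide-level alignment. Cgaln's actual block-level scoring differs; the block size, k, and threshold here are hypothetical values chosen for a toy example.

```python
from typing import List, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_blocks(seq: str, block_size: int) -> List[str]:
    """Cut seq into consecutive non-overlapping blocks (the last may be shorter)."""
    return [seq[i:i + block_size] for i in range(0, len(seq), block_size)]


def coarse_block_hits(seq_a: str, seq_b: str, block_size: int = 8,
                      k: int = 4, min_shared: int = 3) -> List[Tuple[int, int, int]]:
    """Coarse stage: score every block pair by shared k-mers.

    Returns (block_index_a, block_index_b, shared_kmers) for pairs whose score
    reaches min_shared; only these regions would go on to base-level alignment.
    All parameter defaults are hypothetical, not Cgaln settings.
    """
    blocks_a = [kmer_set(b, k) for b in split_blocks(seq_a, block_size)]
    blocks_b = [kmer_set(b, k) for b in split_blocks(seq_b, block_size)]
    hits = []
    for i, ka in enumerate(blocks_a):
        for j, kb in enumerate(blocks_b):
            shared = len(ka & kb)
            if shared >= min_shared:
                hits.append((i, j, shared))
    return hits
```

With `coarse_block_hits("ACGTACGTTTGGCCAA", "TTGGCCAAACGTACGT")` the two halves are swapped between the sequences, and the result `[(0, 1, 4), (1, 0, 5)]` pairs block 0 with block 1 and vice versa, which is how a block-level map can expose a rearrangement before any base-level alignment is run.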

Address of the bookmark: http://www.iam.u-tokyo.ac.jp/chromosomeinformatics/rnakato/cgaln/

Mercator

Jit — Mon, 06 Feb 2017 04:20:36 -0600

Our basic strategy in building homology maps is to use exons that are orthologous in multiple genomes as map "anchors." Given K genomes, the steps in the map construction are as follows:

For each genome, obtain a set of exon annotations. These annotations can be a combination of exon predictions (e.g. Genscan) and annotations that have been experimentally verified (e.g. RefSeq). Ideally, these annotations should be as sensitive as possible. Specificity is not a concern, as incorrect annotations are not likely to have significant alignments with other gene annotations.
Compare all exons against all exons in other genomes and record significant alignments between exons. Currently, we use BLAT to do this all-vs-all comparison with alignments being performed in protein space.
Construct a graph with each vertex corresponding to an exon and edges between vertices whose corresponding exons have significant alignments.
Identify cliques in this graph. These cliques are potential anchors to be used in the map.
Starting with the largest cliques (those that have exons in all or most of the genomes), join neighboring (adjacent in genomic coordinates, in each genome) cliques to form runs. Smaller cliques that are inconsistent with runs formed by larger cliques are filtered out. After the smallest cliques have been considered, cliques that are not part of a run are discarded.
The extents of each run in each genome are output as orthologous segments. The cliques from each run are used to output the exact genomic coordinates of anchors within each orthologous segment. These anchors can be used by genomic alignment programs (such as MAVID) to produce a detailed alignment of each orthologous segment.
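
The graph and clique steps can be sketched in Python. This is an illustration under assumptions: exons are represented as hypothetical `(genome, exon_id)` tuples, significant hits are given as pairs of such tuples, maximal cliques are enumerated with the generic Bron-Kerbosch algorithm, and cliques are kept only if they contain at most one exon per genome. Mercator's own clique search may differ, and run construction and filtering are left out.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Exon = Tuple[str, str]  # hypothetical representation: (genome name, exon id)


def build_graph(hits: Iterable[Tuple[Exon, Exon]]) -> Dict[Exon, Set[Exon]]:
    """Undirected graph: one vertex per exon, one edge per significant hit."""
    graph: Dict[Exon, Set[Exon]] = {}
    for a, b in hits:
        if a[0] == b[0]:
            continue  # only hits between different genomes are used
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph: Dict[Exon, Set[Exon]]) -> List[FrozenSet[Exon]]:
    """Bron-Kerbosch with pivoting: enumerate all maximal cliques."""
    cliques: List[FrozenSet[Exon]] = []

    def expand(r: Set[Exon], p: Set[Exon], x: Set[Exon]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended further
            return
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return cliques


def candidate_anchors(hits: Iterable[Tuple[Exon, Exon]]) -> List[FrozenSet[Exon]]:
    """Cliques with at most one exon per genome (an assumption), largest first."""
    cliques = maximal_cliques(build_graph(hits))
    one_per_genome = [c for c in cliques
                      if len({genome for genome, _ in c}) == len(c)]
    return sorted(one_per_genome, key=len, reverse=True)
```

Sorting largest first mirrors the text's instruction to start run-building from the cliques that cover the most genomes.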

Address of the bookmark: https://www.biostat.wisc.edu/~cdewey/mercator/

ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data

Jit — Fri, 24 Feb 2017 04:55:41 -0600

ConPADE (Contig Ploidy and Allele Dosage Estimation) is a probabilistic method that estimates the ploidy of any given contig or scaffold from its allele proportions. In the process, the authors report findings regarding sequencing errors. The method can be used with whole genome shotgun (WGS) sequencing data, and the authors also show its applicability to variant calling and allele dosage estimation. Results on simulated and real datasets provide evidence that ConPADE performs well as long as sufficient sequencing coverage is available or the true contig ploidy is low.
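
To make the idea of estimating ploidy from allele proportions concrete, here is a deliberately simplified likelihood model, not ConPADE's own: at each heterozygous site the alternate-allele count is treated as binomial with an allele fraction of j/p for some dosage j between 1 and p - 1, all dosages are weighted equally, and sequencing errors (which the ConPADE authors do analyze) are ignored. The candidate ploidies and the `(alt_count, depth)` input format are assumptions.

```python
import math
from typing import Iterable, Sequence, Tuple


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), computed via lgamma for stability."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def ploidy_log_likelihood(sites: Iterable[Tuple[int, int]], ploidy: int) -> float:
    """Sum over sites of the log of a uniform mixture over fractions j/ploidy.

    Each site is (alt_count, depth) at a heterozygous position, so only the
    fractions 1/ploidy ... (ploidy-1)/ploidy are allowed; ploidy must be >= 2.
    """
    fractions = [j / ploidy for j in range(1, ploidy)]
    total = 0.0
    for alt, depth in sites:
        logs = [log_binom_pmf(alt, depth, f) for f in fractions]
        top = max(logs)
        # log-sum-exp of the mixture, with weight 1/(ploidy-1) per dosage
        total += (top + math.log(sum(math.exp(x - top) for x in logs))
                  - math.log(len(logs)))
    return total


def estimate_ploidy(sites: Sequence[Tuple[int, int]],
                    candidates: Sequence[int] = (2, 3, 4)) -> int:
    """Return the candidate ploidy with the highest likelihood."""
    return max(candidates, key=lambda p: ploidy_log_likelihood(sites, p))
```

Sites near a 50% alternate-allele fraction, such as `[(15, 30)] * 20`, favor ploidy 2, while sites near 1/3 and 2/3, such as `[(10, 30), (20, 30)] * 10`, favor ploidy 3. Spreading the mixture weight over more dosages acts as a mild penalty on higher ploidies, so the model does not simply pick the largest candidate.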

Address of the bookmark: https://github.com/microsoftgenomics

PBSuite: Software for Long-Read Sequencing Data from PacBio

Jit — Mon, 27 Feb 2017 09:54:47 -0600

PBJelly - the genome upgrading tool
PBHoney - the structural variation discovery tool

Both are contained within the PBSuite code found in downloads.

----- PBJelly -----
Read The Paper
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0047768

PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.
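
Within a draft scaffold, gaps are typically represented as runs of N. The sketch below locates such runs and extracts the flanking sequence that a long read would need to span; it is a hypothetical helper for illustration and does not reproduce PBJelly's pipeline. `min_len` and `flank` are assumed parameters.

```python
import re
from typing import List, Tuple


def find_gaps(scaffold: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of N/n runs in a scaffold."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", scaffold)
            if m.end() - m.start() >= min_len]


def gap_flanks(scaffold: str, gap: Tuple[int, int], flank: int) -> Tuple[str, str]:
    """Return up to `flank` bases on each side of a gap (clipped at scaffold ends)."""
    start, end = gap
    return scaffold[max(0, start - flank):start], scaffold[end:end + flank]
```

For example, `find_gaps("ACGTNNNNACGTNAC")` yields `[(4, 8), (12, 13)]`, and raising `min_len` to 2 keeps only the four-base gap.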

----- PBHoney -----
Read The Paper
http://www.biomedcentral.com/1471-2105/15/180/abstract

PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.
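
The soft-clipped-tail signal can be illustrated by reading CIGAR strings from SAM/BAM alignments: a long soft clip at either end of a long read suggests that the remainder of the read aligns elsewhere, for example across a breakpoint. The sketch below only extracts tail lengths and applies a hypothetical cutoff; PBHoney's own tail clustering and its intra-read discordance detection are not shown.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a SAM CIGAR string into (length, operation) pairs."""
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"malformed or missing CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clipped_tails(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clip lengths, ignoring outer hard clips."""
    ops = parse_cigar(cigar)
    # In SAM, H operations may only sit outside any S operations at the ends.
    while ops and ops[0][1] == "H":
        ops.pop(0)
    while ops and ops[-1][1] == "H":
        ops.pop()
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def has_long_tail(cigar: str, min_tail: int = 100) -> bool:
    """Flag a read whose clipped tail might mark a structural variant breakpoint.

    min_tail is a hypothetical cutoff, not a PBHoney default.
    """
    return max(soft_clipped_tails(cigar)) >= min_tail
```

An unmapped record (CIGAR `*`) raises `ValueError` here, so such reads should be filtered out beforehand.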

Address of the bookmark: https://sourceforge.net/projects/pb-jelly/

MyCC: Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes

Jit — Fri, 03 Mar 2017 08:34:23 -0600

MyCC is an automated binning tool that combines genomic signatures, marker genes, and optional contig coverages within one or multiple samples in order to visualize metagenomes and identify reconstructed genomic fragments.
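
A common genomic signature for binning is a contig's tetranucleotide frequency, with each k-mer merged with its reverse complement because a contig may come from either strand. The sketch below computes such a vector over the 136 canonical tetranucleotides; whether MyCC uses exactly this k, this normalization, or additional features is an assumption not stated here.

```python
from itertools import product
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int = 4) -> List[str]:
    """The lexicographically smaller member of each k-mer/reverse-complement pair."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(km, revcomp(km)) for km in kmers})


def signature(contig: str, k: int = 4) -> Dict[str, float]:
    """Normalized canonical k-mer frequencies; k-mers with non-ACGT bases are skipped."""
    counts = {km: 0 for km in canonical_kmers(k)}
    seq = contig.upper()
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if set(km) <= set("ACGT"):
            counts[min(km, revcomp(km))] += 1
    total = sum(counts.values())
    return {km: (c / total if total else 0.0) for km, c in counts.items()}
```

For k = 4 there are 256 tetranucleotides, 16 of which are their own reverse complement, giving (256 - 16) / 2 + 16 = 136 canonical entries per contig vector.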

More at http://www.nature.com/articles/srep24175

Address of the bookmark: https://sourceforge.net/projects/sb2nhri/files/MyCC/

Prokka: tool for the rapid annotation of prokaryotic genomes

Jit — Mon, 06 Mar 2017 03:49:57 -0600

Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and the tool scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and, ultimately, for submission to GenBank/DDBJ/ENA.
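
A GFF3 annotation file can be summarized with a few lines of standard-library Python, for example to check how many CDS, tRNA, and rRNA features were called. The sketch assumes the standard 9-column, tab-separated GFF3 layout and stops at a `##FASTA` directive, since GFF3 allows sequences to be appended after it.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count column-3 feature types in GFF3 lines, stopping at ##FASTA."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and other directives
        cols = line.split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts
```

Usage, with a hypothetical output path: `with open("annotation.gff") as fh: print(count_feature_types(fh))`.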

Address of the bookmark: http://www.vicbioinformatics.com/software.prokka.shtml

COCACOLA (binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment, and paired-end read LinkAge)

Jit — Tue, 07 Mar 2017 08:50:57 -0600

COCACOLA is a general framework that combines different types of information (sequence COmposition, CoverAge across multiple samples, CO-alignment to reference genomes, and paired-end read LinkAge) to automatically bin contigs into OTUs. Furthermore, COCACOLA can incorporate customized prior knowledge to improve binning accuracy.
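
The idea of combining composition and coverage can be illustrated with a toy pipeline: each contig becomes one vector holding its GC fraction and its log-scaled coverage in each sample, and the vectors are then clustered. Plain k-means is used here as a stand-in; COCACOLA's own formulation, its co-alignment and linkage terms, and its use of prior knowledge are not reproduced, and GC fraction is a simplified substitute for richer composition features.

```python
import math
from typing import List, Sequence

Vector = List[float]


def contig_features(seq: str, coverages: Sequence[float]) -> Vector:
    """Composition (GC fraction) plus log-scaled coverage in each sample."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    gc = (s.count("G") + s.count("C")) / acgt if acgt else 0.0
    return [gc] + [math.log1p(c) for c in coverages]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20) -> List[int]:
    """Lloyd's k-means seeded with the first k points; returns a label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest centroid for every contig
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
        # update step: move each non-empty cluster's centroid to its mean
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Log-scaling coverage keeps a few very deep contigs from dominating the distance, while GC fraction already lies between 0 and 1.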

News: Python version of COCACOLA is available now!

Address of the bookmark: https://github.com/younglululu/COCACOLA

Hoffman Lab

Tue, 12 Jan 2016 02:47:41 -0600

The Hoffman Lab develops machine learning techniques to better understand chromatin biology. These models and algorithms transform high-dimensional functional genomics data into interpretable patterns that lead to new biological insight.

https://www.pmgenomics.ca/hoffmanlab/

Fool's guide

Poonam Mahapatra — Sun, 02 Apr 2017 14:31:18 -0500

This website and its accompanying documents are intended to help researchers working with non-model organisms acquire and process high-throughput transcriptomic sequencing data without having to learn extensive bioinformatics skills. It covers all steps, from tissue collection, sample preparation, and computer setup through to addressing biological questions with gene expression and SNP data.

http://sfg.stanford.edu/denovo.html

http://sfg.stanford.edu/sequencing.html

http://sfg.stanford.edu/BLAST.html

Address of the bookmark: http://sfg.stanford.edu/guide.html