BOL: Related items

Mercator

Jit — Mon, 06 Feb 2017 04:20:36 -0600

Our basic strategy in building homology maps is to use exons that are orthologous in multiple genomes as map "anchors." Given K genomes, the steps in the map construction are as follows:

For each genome, obtain a set of exon annotations. These annotations can be a combination of both exon predictions (e.g. Genscan) and annotations that have been experimentally verified (e.g. RefSeq). Ideally, we would like to have these annotations be as sensitive as possible. Specificity is not a concern, as incorrect annotations are not likely to have significant alignments with other gene annotations.
Compare all exons against all exons in other genomes and record significant alignments between exons. Currently, we use BLAT to do this all-vs-all comparison with alignments being performed in protein space.
Construct a graph with each vertex corresponding to an exon and edges between vertices whose corresponding exons have significant alignments.
Identify cliques in this graph. These cliques are potential anchors to be used in the map.
Starting with the largest cliques (those that have exons in all or most of the genomes), join neighboring (adjacent in genomic coordinates, in each genome) cliques to form runs. Smaller cliques that are inconsistent with runs formed by larger cliques are filtered out. After the smallest cliques have been considered, cliques that are not part of a run are discarded.
The extents of each run in each genome are output as orthologous segments. The cliques from each run are used to output the exact genomic coordinates of anchors within each orthologous segment. These anchors can be used by genomic alignment programs (such as MAVID) to do a detailed alignment of each orthologous segment.
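
The graph and clique steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mercator's own code: the exon identifiers and the list of significant hits are made up, cliques are enumerated with a plain Bron-Kerbosch recursion, and the later step of joining cliques into runs is left out.

```python
def build_graph(hits):
    """Build an undirected graph from (exon_a, exon_b) significant-alignment pairs."""
    graph = {}
    for a, b in hits:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical exons, written as (genome, exon_id) pairs.
hits = [
    (("human", "e1"), ("mouse", "e1")),
    (("human", "e1"), ("rat", "e1")),
    (("mouse", "e1"), ("rat", "e1")),
    (("human", "e2"), ("mouse", "e2")),
]

# Largest cliques first, mirroring the order in which runs are seeded.
anchors = sorted(maximal_cliques(build_graph(hits)), key=len, reverse=True)
```

Here the first candidate anchor spans all three toy genomes and the second spans only two, so the first would be considered earlier when runs are formed.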

https://www.biostat.wisc.edu/~cdewey/mercator/

Address of the bookmark: https://www.biostat.wisc.edu/~cdewey/mercator/

MafTools

Jit — Thu, 16 Feb 2017 11:16:01 -0600

maftools - An R package to summarize, analyze and visualize MAF files. Introduction.

With advances in cancer genomics, Mutation Annotation Format (MAF) is being widely accepted and used to store detected variants. The Cancer Genome Atlas Project has sequenced over 30 different cancers, with the sample size of each cancer type being over 200. The resulting data, consisting of genetic variants, is stored in Mutation Annotation Format. This package attempts to summarize, analyze, annotate and visualize MAF files in an efficient manner, either from TCGA sources or from any in-house studies, as long as the data is in MAF format. Maftools can also handle ICGC Simple Somatic Mutation format.
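
To give a feel for what summarizing a MAF file involves, here is a small Python sketch. It is not the maftools API (maftools is an R package); it only assumes the standard MAF column names Hugo_Symbol, Variant_Classification and Tumor_Sample_Barcode, and the toy records are invented.

```python
import csv
import io
from collections import Counter

# Invented toy records using three standard MAF columns.
TOY_MAF = (
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "TP53\tMissense_Mutation\tS1\n"
    "TP53\tNonsense_Mutation\tS2\n"
    "KRAS\tMissense_Mutation\tS1\n"
    "TP53\tMissense_Mutation\tS1\n"
)


def summarize_maf(handle):
    """Count variants per classification and mutated samples per gene."""
    # Skip comment lines such as a leading '#version' header.
    rows = csv.DictReader(
        (line for line in handle if not line.startswith("#")), delimiter="\t"
    )
    by_class = Counter()
    samples_per_gene = {}
    for row in rows:
        by_class[row["Variant_Classification"]] += 1
        samples_per_gene.setdefault(row["Hugo_Symbol"], set()).add(
            row["Tumor_Sample_Barcode"]
        )
    mutated = {gene: len(samples) for gene, samples in samples_per_gene.items()}
    return by_class, mutated


by_class, mutated = summarize_maf(io.StringIO(TOY_MAF))
```

Counting distinct samples per gene, rather than rows, keeps a sample with two hits in the same gene from being counted twice.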

maftools is on bioRxiv

Please cite the following if you find this tool useful.

Mayakonda, A. and H.P. Koeffler, Maftools: Efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies. bioRxiv, 2016. doi: http://dx.doi.org/10.1101/052662

Address of the bookmark: https://github.com/PoisonAlien/maftools

DAGchainer: Computing Chains of Syntenic Genes in Complete Genomes

Abhimanyu Singh — Fri, 17 Feb 2017 16:13:35 -0600

The DAGchainer software computes chains of syntenic genes found within complete genome sequences. As input, DAGchainer accepts a list of gene pairs with sequence homology along with their genome coordinates. Using a scoring function which accounts for the distance between neighboring genes on each DNA molecule and the BLAST E-value score between homologs, maximally scoring chains of ordered gene pairs are computed and reported. This algorithm can be used to mine large evolutionary conserved regions of genomes between two organisms. Alternatively, by examining colinear sets of homologous genes found within a single genome, segmental genome duplications can be revealed.
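
The chaining idea can be illustrated with a short dynamic-programming sketch in Python. It is a simplification under assumptions and not DAGchainer's actual scoring function: positions are gene-order indices, each pair carries a precomputed match score (which in practice might be derived from the BLAST E-value), and the gap_penalty and max_gap parameters are hypothetical.

```python
def best_chain(pairs, gap_penalty=1.0, max_gap=5):
    """Return (score, chain) for the best colinear chain of gene pairs.

    pairs: list of (pos_a, pos_b, match_score) tuples, where pos_a and pos_b
    are gene-order positions on the two DNA molecules.
    """
    if not pairs:
        return 0.0, []
    pairs = sorted(pairs)
    best = [p[2] for p in pairs]   # best chain score ending at each pair
    prev = [None] * len(pairs)     # back-pointers for traceback
    for i, (xi, yi, si) in enumerate(pairs):
        for j in range(i):
            xj, yj, _ = pairs[j]
            dx, dy = xi - xj, yi - yj
            # Predecessor must come strictly earlier on both molecules
            # and lie within the allowed distance.
            if dx <= 0 or dy <= 0 or max(dx, dy) > max_gap:
                continue
            cand = best[j] + si - gap_penalty * (max(dx, dy) - 1)
            if cand > best[i]:
                best[i], prev[i] = cand, j
    end = max(range(len(pairs)), key=best.__getitem__)
    score, chain = best[end], []
    while end is not None:
        chain.append(pairs[end])
        end = prev[end]
    return score, chain[::-1]


# Invented example: the pair (3, 9) is out of order and should be skipped.
toy_pairs = [(1, 1, 10), (2, 2, 10), (3, 9, 10), (4, 3, 10)]
score, chain = best_chain(toy_pairs)
```

The quadratic loop is fine for a sketch; a real tool working on whole genomes would need something more economical.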

This software distribution includes both the DAGchainer utility and a Java-based graphical interface that allows the inputs and outputs to be navigated and interrogated dynamically.

Address of the bookmark: http://dagchainer.sourceforge.net/

ConPADE: Genome Assembly Ploidy Estimation from Next-Generation Sequencing Data

Jit — Fri, 24 Feb 2017 04:55:41 -0600

ConPADE (Contig Ploidy and Allele Dosage Estimation) is a probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. In the process, the authors report findings regarding errors in sequencing. The method can be used for whole genome shotgun (WGS) sequencing data. The authors also show the applicability of the method for variant calling and allele dosage estimation. Results for simulated and real datasets are discussed and provide evidence that ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low.
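
The intuition behind estimating ploidy from allele proportions can be shown with a toy maximum-likelihood sketch in Python. This is not ConPADE's model. It simply assumes that, at a variable site in a contig of ploidy M, the alternate allele has dosage d in 1..M-1 with equal prior weight, that read counts are binomial, and that a fixed hypothetical error rate applies. The read counts are invented.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def ploidy_log_likelihood(sites, ploidy, error=0.01):
    """Log-likelihood of (alt_reads, depth) sites under a candidate ploidy."""
    total = 0.0
    for alt, depth in sites:
        probs = []
        for dosage in range(1, ploidy):
            frac = dosage / ploidy
            # Expected alt-read fraction after symmetric sequencing error.
            p = frac * (1 - error) + (1 - frac) * error
            probs.append(binom_pmf(alt, depth, p))
        total += math.log(sum(probs) / len(probs))
    return total


def estimate_ploidy(sites, candidates=(2, 3, 4)):
    """Pick the candidate ploidy with the highest log-likelihood."""
    return max(candidates, key=lambda m: ploidy_log_likelihood(sites, m))


# Invented counts: alt fractions near 1/3 and 2/3 suggest a triploid contig.
triploid_like = [(20, 60), (40, 60), (21, 60), (19, 60)]
diploid_like = [(30, 60), (28, 60), (33, 60)]
```

With lower depth the binomial distributions for neighbouring dosages overlap more, which is consistent with the remark above that performance depends on coverage and on the true ploidy being low.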

https://github.com/microsoftgenomics

Address of the bookmark: https://github.com/microsoftgenomics

PBSuite: Software for Long-Read Sequencing Data from PacBio

Jit — Mon, 27 Feb 2017 09:54:47 -0600

PBJelly - the genome upgrading tool.
PBHoney - the structural variation discovery tool

Both are contained within the PBSuite code found in downloads.

----- PBJelly -----
Read The Paper
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0047768

PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in fasta format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes.

----- PBHoney -----
Read The Paper
http://www.biomedcentral.com/1471-2105/15/180/abstract

PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., greater than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants.
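
As a rough illustration of the soft-clipped tail signal, the Python sketch below reads the clipped lengths at each end of an alignment from its SAM CIGAR string. It is not PBHoney's implementation; the min_tail threshold and the helper names are hypothetical.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_tails(cigar):
    """Return (left, right) soft-clip lengths for a SAM CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips may sit outside soft clips, so ignore them here.
    core = [item for item in ops if item[1] != "H"]
    left = core[0][0] if core and core[0][1] == "S" else 0
    right = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return left, right


def has_long_tail(cigar, min_tail=200):
    """Flag an alignment whose soft-clipped tail may span a breakpoint."""
    return max(soft_clipped_tails(cigar)) >= min_tail
```

For example, a read aligned as 250S9000M has a 250 bp unaligned tail at its start, which could then be realigned elsewhere to look for the other side of a structural variant.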

Address of the bookmark: https://sourceforge.net/projects/pb-jelly/

MyCC: Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes

Jit — Fri, 03 Mar 2017 08:34:23 -0600

MyCC is an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments.
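
A genomic signature is commonly taken to be a vector of k-mer frequencies for a contig. The Python sketch below computes one; it is an assumption-laden illustration rather than MyCC's code, and both the default k=4 and the choice not to merge reverse complements are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k=4):
    """Return normalized k-mer frequencies in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count k-mers made of A, C, G, T; windows containing N are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Tiny invented contig, with k=2 so the vector has 16 entries.
signature = kmer_signature("ACGTACGT", k=2)
```

Contigs from the same genome tend to have similar signatures, so vectors like this one can be fed to a clustering step alongside coverage and marker-gene information.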

More at http://www.nature.com/articles/srep24175

Address of the bookmark: https://sourceforge.net/projects/sb2nhri/files/MyCC/

Prokka: tool for the rapid annotation of prokaryotic genomes

Jit — Mon, 06 Mar 2017 03:49:57 -0600

Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and the tool scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and, ultimately, submission to GenBank/DDBJ/ENA.

Address of the bookmark: http://www.vicbioinformatics.com/software.prokka.shtml

COCACOLA (binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment, and paired-end read LinkAge)

Jit — Tue, 07 Mar 2017 08:50:57 -0600

COCACOLA is a general framework that combines different types of information: sequence COmposition, CoverAge across multiple samples, CO-alignment to reference genomes and paired-end reads LinkAge to automatically bin contigs into OTUs. Furthermore, COCACOLA seamlessly embraces customized prior knowledge to facilitate binning accuracy.

News: Python version of COCACOLA is available now!

Address of the bookmark: https://github.com/younglululu/COCACOLA

Structural polymorphism analysis from NGS data

Sat, 13 Jul 2013 17:12:47 -0500

The LabEx BASC (Biodiversity, Agroecosystems, Society, Climate), a network of 13 laboratories of the Paris-Saclay Scientific Cluster, is seeking a bioinformatician to analyze Next Generation Sequencing (NGS) data. In the context of a flagship project aiming at understanding and improving the adaptive capacity of agroecosystems, it will be critical to establish a link between sequence variation, functional variation, gene/protein expression and phenotypic adaptation.

The successful candidate will be in charge of detecting polymorphisms, including structural variants, comparing multiple and diverse genomes of the same species, and constructing pan- and core-genomes. These challenging tasks will require bioinformatics developments and the implementation of methods for accommodating the high level of repetitiveness of complex genomes. The tools will be integrated into pipelines and made available to end-users through the Galaxy platform. The bioinformatician will therefore also have to advise researchers on their experimental designs in order to ensure that the datasets produced comply with pipeline requirements. He/she will be hosted by a bioinformatics/informatics team (7 people) (http://moulon.inra.fr/index.php/fr/equipestransversales/atelier-de-bioinformatique) which has computational facilities and expertise in NGS data analysis, and will benefit as well from national and international collaborative networks (Aplibio http://www.renabi.fr/platforms/aplibio/, Transplant http://transplantdb.eu, AMAIZING http://www.amaizing.fr/).

The position requires a doctoral degree (PhD) in bioinformatics with strong expertise in script writing (Python/Perl) and pipeline development.

Applicants should send a CV and the names of 2 referees willing to provide a letter of recommendation to joets@moulon.inra.fr.

Fools guide

Poonam Mahapatra — Sun, 02 Apr 2017 14:31:18 -0500

This website and accompanying documents are intended as a tool to help researchers dealing with non-model organisms acquire and process transcriptomic high-throughput sequencing data without having to learn extensive bioinformatics skills. It covers all steps from tissue collection, sample preparation and computer setup, through addressing biological questions with gene expression and SNP data.

http://sfg.stanford.edu/denovo.html

http://sfg.stanford.edu/sequencing.html

http://sfg.stanford.edu/BLAST.html

Address of the bookmark: http://sfg.stanford.edu/guide.html