BOL: Related items

LSC: Improving PacBio Long Read Accuracy by Short Read Alignment

Abhimanyu Singh — Thu, 06 Sep 2018 16:27:35 -0500

Added command-line argument support.
Multi-stage execution modes.
Support for parallelization. Execution now proceeds in batches of long reads, the size of which can be set with --long_read_batch_size N.
Better compressed intermediate files.
Added utilities folder.
Added support for multiple short read files.
Removed use of configuration file.
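
The batching behaviour controlled by --long_read_batch_size can be pictured with a short Python sketch. This is not LSC's own code: the helper name batch_reads and the read identifiers are hypothetical, and the sketch only illustrates splitting a stream of long reads into batches of at most N reads.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_reads(reads: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size reads (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(reads)
    while True:
        # Pull the next batch_size items; the final batch may be shorter.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Made-up read names, used only to show the batch boundaries.
    long_reads = [f"read_{i}" for i in range(7)]
    for number, batch in enumerate(batch_reads(long_reads, 3), start=1):
        print(number, batch)
```

Because each batch is independent of the others, batches of this kind can in principle be handed to separate workers, which is one plausible reason for exposing the batch size as an option.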

Address of the bookmark: https://www.healthcare.uiowa.edu/labs/au/LSC/

PhD program in Computer Science at University of Essex

Sat, 11 Feb 2017 13:11:36 -0600

As part of the PhD program in Computer Science at University of Essex, I am looking for a PhD student in computational and synthetic biology.
The ideal candidate is interested in designing new biological design automation methods for genome scale projects and/or network modelling of genomic, transcriptomic and proteomic data.
Candidates interested in developing optimization algorithms for biological problems are encouraged to apply as well.
A summary of the research work in the lab can be found on this page.

Candidates interested in the position should contact me in advance by email to: g.stracquadanio@essex.ac.uk

The deadline for the application is 28/02/2017; info about the application can be found on the Essex CSEE website.

KAST

Neel — Wed, 23 Feb 2022 08:28:36 -0600

Performs alignment-free k-tuple frequency comparisons between sequences. The input can be two files (e.g. a reference and a query) or a single file, in which case pairwise comparisons are made.
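
As a rough illustration of what a k-tuple frequency comparison involves, the Python sketch below counts overlapping k-mers in two sequences and compares the resulting frequency profiles. It is not KAST's implementation: the function names are hypothetical, and the Euclidean distance is simply one distance measure assumed here for the example.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def kmer_frequencies(sequence: str, k: int) -> Dict[str, float]:
    """Return the relative frequency of every overlapping k-mer in sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    # A sequence shorter than k has no k-mers, so its profile is empty.
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def euclidean_distance(freq_a: Dict[str, float], freq_b: Dict[str, float]) -> float:
    """Euclidean distance between two k-mer frequency profiles."""
    kmers = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2 for m in kmers))


if __name__ == "__main__":
    # Made-up sequences standing in for a reference and a query.
    reference = "ACGTACGTACGT"
    query = "ACGTTCGTACGA"
    distance = euclidean_distance(kmer_frequencies(reference, 3),
                                  kmer_frequencies(query, 3))
    print(f"k=3 distance: {distance:.4f}")
```

No alignment is computed at any point: the two sequences are compared only through their k-mer composition, which is what makes this family of methods fast on large inputs.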

Address of the bookmark: https://github.com/martinjvickers/KAST

MafTools

Jit — Thu, 16 Feb 2017 11:16:01 -0600

maftools - An R package to summarize, analyze and visualize MAF files.

Introduction

With advances in cancer genomics, Mutation Annotation Format (MAF) is being widely accepted and used to store detected variants. The Cancer Genome Atlas project has sequenced over 30 different cancers, with the sample size of each cancer type exceeding 200. The resulting genetic variant data is stored in MAF. This package attempts to summarize, analyze, annotate and visualize MAF files in an efficient manner, whether they come from TCGA sources or from in-house studies, as long as the data is in MAF. Maftools can also handle the ICGC Simple Somatic Mutation format.
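
To make the idea of summarizing a MAF file concrete, here is a minimal Python sketch that counts variants per classification in a tab-delimited MAF stream. It is not maftools (which is an R package) and does not reproduce its API. The function name summarize_maf and the three example rows are invented for illustration; only the column name Variant_Classification is taken from the MAF layout.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def summarize_maf(handle: Iterable[str]) -> Counter:
    """Count variants per Variant_Classification in a tab-delimited MAF stream."""
    # Skip comment lines such as a leading "#version" line.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["Variant_Classification"] for row in reader)


if __name__ == "__main__":
    # Made-up rows; real MAF files carry many more columns.
    example = io.StringIO(
        "#version 2.4\n"
        "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
        "TP53\tMissense_Mutation\tSAMPLE-01\n"
        "KRAS\tMissense_Mutation\tSAMPLE-02\n"
        "TP53\tNonsense_Mutation\tSAMPLE-02\n"
    )
    for classification, count in summarize_maf(example).most_common():
        print(classification, count)
```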

maftools is on bioRxiv

Please cite the following if you find this tool useful.

Mayakonda, A. and H.P. Koeffler, Maftools: Efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies. bioRxiv, 2016. doi: http://dx.doi.org/10.1101/052662

Address of the bookmark: https://github.com/PoisonAlien/maftools

BioDownloader

Surabhi Chaudhary — Sat, 25 Feb 2017 17:52:33 -0600

BioDownloader is a program for downloading and/or updating files from ftp/http servers. The program has unique features that are specifically designed to deal with bioinformatics data files and servers:

optimized to work with vast amounts of data and very large file sets (~10,000 - 100,000).
allows the selective retrieval of only the required files (file masks, ls-lR parsing, recursive search, updates)
has a built-in repository containing the settings for the most common bioinformatics download needs
built-in wizard for batch post-processing of downloaded files (archive extraction, file conversion, etc.)
capable of performing multiple download or update tasks simultaneously

BioDownloader has a built-in repository containing the settings for common bioinformatics file-synchronization needs, including the Protein Data Bank (PDB) and National Center for Biotechnology Information (NCBI) databases. It can post-process downloaded files, including archive extraction and file conversions.

http://dunbrack.fccc.edu/BioDownloader/

Address of the bookmark: http://dunbrack.fccc.edu/BioDownloader/

Quota synteny alignment

Jit — Mon, 31 Jul 2017 04:11:57 -0500

Typically in comparative genomics, we can identify anchors, chain them into syntenic blocks and interpret these blocks as derived from a common descent. However, when comparing two genomes that have undergone ancient genome duplications (plant genomes in particular), we have a large number of blocks that are not orthologous but paralogous. This has sometimes forced us to use ad hoc rules to screen these blocks. So the question is: given the expected depth (quota) along both the x- and y-axes, select a subset of the anchors with maximized total score.
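
The selection problem stated above can be sketched in Python as an exhaustive search over a handful of blocks. This is an illustration under stated assumptions, not the tool's method: the Block type, the function names and the example coordinates are hypothetical, blocks are modelled as inclusive intervals on each axis, and trying every subset is only feasible for very small inputs.

```python
from itertools import combinations
from typing import List, NamedTuple, Sequence, Tuple


class Block(NamedTuple):
    name: str
    x: Tuple[int, int]  # inclusive interval on the first genome (x-axis)
    y: Tuple[int, int]  # inclusive interval on the second genome (y-axis)
    score: float


def max_depth(intervals: Sequence[Tuple[int, int]]) -> int:
    """Largest number of inclusive intervals covering any single position."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # opens sort before closes at the same position
        events.append((end, 1))
    depth = best = 0
    for _, kind in sorted(events):
        depth += 1 if kind == 0 else -1
        best = max(best, depth)
    return best


def select_blocks(blocks: Sequence[Block], quota_x: int,
                  quota_y: int) -> Tuple[float, List[str]]:
    """Brute-force the highest-scoring subset that respects both quotas."""
    best_score, best_subset = 0.0, ()
    for size in range(1, len(blocks) + 1):
        for subset in combinations(blocks, size):
            if max_depth([b.x for b in subset]) > quota_x:
                continue
            if max_depth([b.y for b in subset]) > quota_y:
                continue
            total = sum(b.score for b in subset)
            if total > best_score:
                best_score, best_subset = total, subset
    return best_score, [b.name for b in best_subset]


if __name__ == "__main__":
    # Made-up blocks: A overlaps B on the x-axis and C on the y-axis.
    blocks = [
        Block("A", (1, 10), (1, 10), 50),
        Block("B", (5, 15), (20, 30), 40),
        Block("C", (20, 30), (5, 12), 30),
        Block("D", (40, 50), (40, 50), 10),
    ]
    print(select_blocks(blocks, quota_x=1, quota_y=1))
    print(select_blocks(blocks, quota_x=2, quota_y=1))
```

With a quota of 1 on both axes, the highest-scoring block A has to be dropped because keeping it would exclude both B and C. Raising the x-axis quota to 2 lets A and B coexist, which is the kind of trade-off that ad hoc screening rules tend to miss.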

Address of the bookmark: https://github.com/tanghaibao/quota-alignment

DIAL

Abhimanyu Singh — Wed, 01 Mar 2017 08:42:28 -0600

A computational pipeline for identifying single-base substitutions between two closely related genomes without the help of a reference genome. DIAL works even when the depth of coverage is insufficient for de novo assembly, and it can be extended to determine small insertions/deletions. Our main motivation is to use this tool to survey the genetic diversity of endangered species as the identified sequence differences can be used to design genotyping arrays to assist in the species' management.

http://www.bx.psu.edu/~ratan/

Address of the bookmark: http://www.bx.psu.edu/miller_lab/

Mugsy: multiple whole genome alignment tool

Jit — Fri, 08 Dec 2017 17:41:14 -0600

Mugsy is a multiple whole genome aligner. Mugsy uses Nucmer for pairwise alignment, a custom graph based segmentation procedure for identifying collinear regions, and the segment-based progressive multiple alignment strategy from Seqan::TCoffee. Mugsy accepts draft genomes in the form of multi-FASTA files and does not require a reference genome.

To cite Mugsy, use:

Angiuoli SV and Salzberg SL. Mugsy: Fast multiple alignment of closely related whole genomes. Bioinformatics 2011 27(3):334-4

Address of the bookmark: http://mugsy.sourceforge.net/

splitbam: splits a BAM by chromosomes

Jit — Tue, 28 Feb 2017 09:01:28 -0600

splitbam splits a BAM by chromosomes.

Using the reference sequence dictionary (*.dict), it also creates empty BAM files for chromosomes with no SAM records. A pair of 'mock' SAM records can also be added to those empty BAMs to prevent some tools (such as samtools) from crashing.

Usage

java -jar splitbam.jar -p OUT/__CHROM__/__CHROM__.bam -R ref.fasta (bam|sam|stdin)

Options

-h help; This screen.
-R (indexed reference file) REQUIRED.
-u (unmapped chromosome name): default:Unmapped
-e | --empty : generate EMPTY bams for chromosome having no read mapped
-m | --mock : if option '-e', add a mock pair of sam records to the empty bam
-p (output file/bam pattern) REQUIRED. MUST contain __CHROM__ and end with .bam
-s assume input is sorted.
-x | --index create index.
-t | --tmp (dir) tmp file directory
-G (file) chrom-group file (see below)
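
The requirement on -p is easier to see with an example. The Python sketch below mimics how an output pattern could expand into one file name per chromosome. It is an assumption about the behaviour based only on the option description above, not splitbam's actual Java code, and the helper name expand_pattern is hypothetical.

```python
def expand_pattern(pattern: str, chromosome: str) -> str:
    """Replace every __CHROM__ placeholder with the chromosome name."""
    # Mirror the stated constraints on the -p option.
    if "__CHROM__" not in pattern or not pattern.endswith(".bam"):
        raise ValueError("pattern must contain __CHROM__ and end with .bam")
    return pattern.replace("__CHROM__", chromosome)


if __name__ == "__main__":
    pattern = "OUT/__CHROM__/__CHROM__.bam"
    # "Unmapped" is the default name given for the -u option.
    for chromosome in ["chr1", "chr2", "Unmapped"]:
        print(expand_pattern(pattern, chromosome))
```

Under this reading, the pattern from the usage line would yield OUT/chr1/chr1.bam for chromosome chr1, with unmapped reads collected under the name set by -u.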

Address of the bookmark: https://code.google.com/archive/p/jvarkit/wikis/SplitBam.wiki

HISAT2: a fast and sensitive alignment program for mapping next-generation sequencing reads

Rahul Nayak — Tue, 08 May 2018 04:27:22 -0500

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome). Based on an extension of BWT for graphs [Sirén et al. 2014], we designed and implemented a graph FM index (GFM), an original approach and its first implementation to the best of our knowledge. In addition to using one global GFM index that represents a population of human genomes, HISAT2 uses a large set of small GFM indexes that collectively cover the whole genome (each index representing a genomic region of 56 Kbp, with 55,000 indexes needed to cover the human population). These small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads. This new indexing scheme is called a Hierarchical Graph FM index (HGFM).
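
A quick arithmetic check of the local index figures quoted above, written as a small Python sketch. It assumes that 1 Kbp means 1,000 bp and that the regions do not overlap; neither assumption is stated in the description, and the function name is hypothetical.

```python
def total_coverage_bp(region_size_bp: int, index_count: int) -> int:
    """Bases covered by index_count regions of region_size_bp each, assuming no overlap."""
    return region_size_bp * index_count


if __name__ == "__main__":
    REGION_SIZE_BP = 56_000     # 56 Kbp per local index, taking 1 Kbp = 1,000 bp
    LOCAL_INDEX_COUNT = 55_000  # number of local indexes quoted in the description
    covered = total_coverage_bp(REGION_SIZE_BP, LOCAL_INDEX_COUNT)
    print(f"{covered:,} bp (~{covered / 1e9:.2f} Gbp)")
```

Under those assumptions the product is 3.08 Gbp, which is on the order of the size of the human genome and so consistent with the local indexes collectively covering it.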

more at https://ccb.jhu.edu/software/hisat2/index.shtml

Address of the bookmark: https://github.com/infphilo/hisat2