BOL: Related items

G-NEST: The Gene NEighborhood Scoring Tool

Neel — Fri, 25 Sep 2020 20:09:18 -0500

The Gene NEighborhood Scoring Tool (G-NEST) combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all window sizes. The primary author of the final code is William F. Martin. Example data files are in a separate repository.

Address of the bookmark: https://github.com/dglemay/G-NEST

Tools to assess the quality of your assembled genome

LEGE — Thu, 08 Aug 2024 23:31:18 -0500

FASTA VALIDATOR + SEQKIT RMDUP: FASTA validation
GENOMETOOLS GT GFF3VALIDATOR: GFF3 validation
ASSEMBLATHON STATS: Assembly statistics
GENOMETOOLS GT STAT: Annotation statistics
NCBI FCS ADAPTOR: Adaptor contamination pass/fail
NCBI FCS GX: Foreign organism contamination pass/fail
BUSCO: Gene-space completeness estimation
TIDK: Telomere repeat identification
LAI: Continuity of repetitive sequences
KRAKEN2: Taxonomy classification
HIC CONTACT MAP: Alignment and visualisation of HiC data
MUMMER → CIRCOS + DOTPLOT & MINIMAP2 → PLOTSR: Synteny analysis
MERQURY: K-mer completeness, consensus quality and phasing assessment
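
As an illustration of the kind of assembly statistic such tools report, N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The sketch below uses that standard definition; the function name `n50` and the example lengths are made up for illustration, and this is not the code of any tool listed above.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
contig_lengths = [100, 200, 300, 400, 500]
print(n50(contig_lengths))
```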

EXCAVATOR2tool

Bulbul — Wed, 30 Nov 2016 04:09:19 -0600

EXCAVATOR2 is a collection of bash, R and Fortran scripts and codes that analyses Whole Exome Sequencing (WES) data to identify copy number variants (CNVs). EXCAVATOR2 enhances the identification of all genomic CNVs, both overlapping and non-overlapping targeted exons, by integrating the analysis of in-target and off-target reads. Specifically, it improves the precision of calling CNVs overlapping targeted exons from WES data and enlarges the spectrum of detectable CNVs to off-target events.
EXCAVATOR2 can be effectively employed for the identification of CNVs in small as well as large-scale re-sequencing population and cancer studies. Lastly, all WES experiments can be re-analysed with this method, with the benefit of identifying novel CNVs in extra-exonic regions from the full-genome CN profile.

Address of the bookmark: https://sourceforge.net/projects/excavator2tool/

Mulan: MUltiple sequence Local AligNment and conservation visualization tool

Rahul Nayak — Thu, 20 Jul 2017 08:02:32 -0500

Mulan performs multiple (2 or more) sequence alignments with an efficient and rapid "full local" alignment strategy that ensures a recapitulation of evolutionary sequence rearrangements (such as inversions and reshuffling) in any of the species. It combines refine and tba tools to align either "draft" or "finished" quality sequences. Mulan provides a dynamic graphical interface to align and visualize conservation profiles for evolutionarily distant and closely related species.

Input formats, automated data upload from the UCSC Genome Browser, gene annotation, annotation of repetitive elements, and progress reports were previously described in the zPicture instructions, and we refer users to these materials for more details. This introduction focuses mainly on novel features unique to Mulan.

Address of the bookmark: https://mulan.dcode.org/mulanInstructions.php

Mugsy: multiple whole genome alignment tool

Jit — Fri, 08 Dec 2017 17:41:14 -0600

Mugsy is a multiple whole genome aligner. Mugsy uses Nucmer for pairwise alignment, a custom graph based segmentation procedure for identifying collinear regions, and the segment-based progressive multiple alignment strategy from Seqan::TCoffee. Mugsy accepts draft genomes in the form of multi-FASTA files and does not require a reference genome.

To cite Mugsy, use:

Angiuoli SV and Salzberg SL. Mugsy: Fast multiple alignment of closely related whole genomes. Bioinformatics 2011 27(3):334-4

Address of the bookmark: http://mugsy.sourceforge.net/

Multi-CAR: a tool of contig scaffolding using multiple references

Rahul Nayak — Tue, 06 Mar 2018 16:39:41 -0600

We design a simple heuristic method to further revise our single reference-based scaffolding tool CAR into a new one called Multi-CAR, such that it can utilize multiple complete genomes of related organisms as references to more accurately order and orient the contigs of a draft genome. In practical usage, Multi-CAR does not require prior knowledge of the phylogenetic relationships among the draft and reference genomes, nor libraries of paired-end reads. To validate Multi-CAR, we have tested it on a real dataset composed of several prokaryotic genomes and also compared its accuracy with that of other multiple reference-based scaffolding tools, Ragout and MeDuSa.

Address of the bookmark: http://genome.cs.nthu.edu.tw/Multi-CAR/

BFC: a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data

Jit — Thu, 31 May 2018 09:35:23 -0500

BFC is a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data. It is specifically designed for high-coverage whole-genome human data, though it also performs well for small genomes. The BFC algorithm is a variant of the classical spectrum alignment algorithm introduced by Pevzner et al (2001). It uses an exhaustive search to find a k-mer path through a read that minimizes a heuristic objective function jointly considering penalties on correction, quality and k-mer support. This algorithm was first implemented in my fermi assembler and then refined a few times in fermi, fermi2 and now in BFC. In the k-mer counting phase, BFC uses a blocked Bloom filter to filter out most singleton k-mers and keeps the rest in a hash table (Melsted and Pritchard, 2011). BFC is named after its use of a Bloom filter, though other correctors such as Lighter and Bless actually rely more on Bloom filters than BFC does.
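
The k-mer counting idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not BFC's implementation: it uses a plain (not blocked) Bloom filter, ignores reverse complements and base qualities, and all names (`BloomFilter`, `count_non_singleton_kmers`) and the example reads are hypothetical. A k-mer seen for the first time is only recorded in the Bloom filter; it enters the hash table once it is seen again.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over strings (illustrative, not BFC's blocked variant)."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive several hash positions by salting one hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


def count_non_singleton_kmers(reads, k):
    """Count k-mers, keeping only those seen at least twice in the hash table."""
    bloom = BloomFilter()
    counts = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif kmer in bloom:
                counts[kmer] = 2  # second sighting: promote to the hash table
            else:
                bloom.add(kmer)  # first sighting: remember it cheaply
    return counts


# Hypothetical reads, for illustration only.
print(count_non_singleton_kmers(["ACGTACGT", "ACGTTTTT"], k=4))
```

Because a Bloom filter can return false positives, a small fraction of singleton k-mers may still reach the hash table, which is consistent with the description that "most" singletons are filtered out.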

Address of the bookmark: https://github.com/lh3/bfc

SALSA: A tool to scaffold long read assemblies with Hi-C

Jit — Fri, 15 Jun 2018 04:01:15 -0500

This code is used to scaffold your assemblies using Hi-C data. This version implements some improvements in the original SALSA algorithm. If you want to use the old version, it can be found in the old_salsa branch.

To use the latest version, first run the following commands:

cd SALSA
make

To run the code, you will need Python 2.7, BOOST libraries and Networkx (version lower than 1.2).

If you consider using this tool, please cite our publications, which describe the methods used for scaffolding:

Ghurye, J., Pop, M., Koren, S., Bickhart, D., & Chin, C. S. (2017). Scaffolding of long read assemblies using long range contact information. BMC genomics, 18(1), 527.
Ghurye, J., Rhie, A., Walenz, B.P., Schmitt, A., Selvaraj, S., Pop, M., Phillippy, A.M. and Koren, S., 2018. Integrating Hi-C links with assembly graphs for chromosome-scale assembly. bioRxiv, p.261149.

For any queries, please either ask on the GitHub issue page or send an email to Jay Ghurye (jayg@cs.umd.edu).

Address of the bookmark: https://github.com/machinegun/SALSA

FMLRC: a long-read error correction tool using the multi-string Burrows Wheeler Transform

Neel — Fri, 10 Aug 2018 13:29:28 -0500

FMLRC, or FM-index Long Read Corrector, is a tool for performing hybrid correction of long-read sequencing data using the BWT and FM-index of short-read sequencing data. Given a BWT of the short-read sequencing data, FMLRC will build an FM-index and use that as an implicit de Bruijn graph. Each long read is then corrected independently by identifying low-frequency k-mers in the long read and replacing them with the closest matching high-frequency k-mers in the implicit de Bruijn graph. In contrast to other de Bruijn graph based implementations, FMLRC is not restricted to a particular k-mer size and instead uses a two-pass method with both a short "k-mer" and a longer "K-mer". This allows FMLRC to correct through low-complexity regions that are computationally difficult for short k-mers.
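
The first step described above, identifying low-frequency k-mers in a long read, can be sketched as follows. This is a simplified illustration and not FMLRC's code: it swaps in an ordinary in-memory k-mer count table in place of the BWT and FM-index, it covers only detection (not replacement), and the function names, the `min_count` threshold and the example sequences are all assumptions.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in the short reads (stand-in for FM-index queries)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_positions(long_read, counts, k, min_count=2):
    """Return start positions of long-read k-mers with short-read support below min_count."""
    return [
        i
        for i in range(len(long_read) - k + 1)
        if counts[long_read[i:i + k]] < min_count
    ]


# Hypothetical data: the long read carries one substitution at position 4.
short_reads = ["ACGTACGT", "CGTACGTA", "GTACGTAC"]
counts = kmer_counts(short_reads, k=4)
print(weak_kmer_positions("ACGTTCGTA", counts, k=4))
```

Every k-mer overlapping the erroneous base has little or no short-read support, so a run of consecutive weak positions marks the region a corrector would then try to replace.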

Address of the bookmark: https://github.com/holtjma/fmlrc

SeqMonk: A tool to visualise and analyse high throughput mapped sequence data

Jit — Tue, 11 Sep 2018 04:39:38 -0500

SeqMonk is a program to enable the visualisation and analysis of mapped sequence data. It was written for use with mapped next generation sequence data but can in theory be used for any dataset which can be expressed as a series of genomic positions. Its main features are:

Import of mapped data from alignment files (BAM/SAM/bowtie etc.)
Creation of data groups for visualisation and analysis
Visualisation of mapped regions against an annotated genome.
Flexible quantitation of the mapped data to allow comparisons between data sets
Statistical analysis of data to find regions of interest
Creation of reports containing data and genome annotation

Address of the bookmark: http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/