BOL: Related items

KAD: Assessing genome assemblies using K-mer copies in assemblies and K-mer abundance in Illumina reads

Jit — Fri, 19 Jun 2020 07:34:12 -0500

KAD is designed for evaluating the accuracy of nucleotide base quality of genome assemblies. Briefly, abundance of k-mers are quantified for both sequencing reads and assembly sequences. Comparison of the two values results in a single value per k-mer, K-mer Abundance Difference (KAD), which indicates how well the assembly matches read data for each k-mer.

where, c is the count of a k-mer from reads, m is the mode of counts of read k-mers, and n is the copy of the k-mer in the assembly.

Address of the bookmark: https://github.com/liu3zhenlab/KAD

JBrowse: Embeddable genome browser built completely with JavaScript and HTML5

Jit — Fri, 29 Jun 2018 09:19:56 -0500

JBrowse is a fast, embeddable genome browser built completely with JavaScript and HTML5, with optional run-once data formatting tools written in Perl. Headline Features: Fast, smooth scrolling and zooming. Explore your genome with unparalleled speed. Scales easily to multi-gigabase genomes and deep-coverage sequencing. Quickly open and view data files on your computer without uploading them to any server. Supports GFF3, BED, FASTA, Wiggle, BigWig, BAM, VCF (with either .tbi or .idx index), REST, and more. BAM, BigBed, BigWig, and VCF data are displayed directly from chunks of the compressed binary files, no conversion needed. Includes an optional “faceted” track selector (see demo) suitable for large installations with thousands of tracks. Very light server resource requirements. In fact, JBrowse has no back-end server code, just tools for formatting data files to be read directly over HTTP. Serve huge datasets from a single low-cost cloud instance. Can run as a stand-alone app on OSX and Windows using the Electron platform Highly extensible plugin architecture, with a large plugin registry of existing examples here https://gmod.github.io/jbrowse-registry https://jbrowse.org/

Address of the bookmark: https://github.com/GMOD/jbrowse

Biological databases !

BioStar — Wed, 12 Feb 2020 01:16:29 -0600

Now a days there are a lots of genomics databases available around the world. This bookmark is created to provide all links in one place ...

ftp://ftp.ncbi.nih.gov/genomes/

https://hgdownload.soe.ucsc.edu/downloads.html

Address of the bookmark: ftp://ftp.ncbi.nih.gov/genomes/

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly

Jit — Wed, 06 Dec 2017 02:08:14 -0600

An efficient tool called Connecting Overlapped Pair-End (COPE) reads, to connect overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is, respectively, 10 and 2% higher than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads.

Address of the bookmark: ftp://ftp.genomics.org.cn/pub/cope

DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication

Jit — Tue, 14 Nov 2017 10:26:16 -0600

We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 minutes, with rich information such as pseudogenes, translation exceptions, and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future.

Availability and Implementation

The software is implemented in Python 3 and runs in both Python 2.7 and 3.4– on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license with external binaries bundled in the software distribution. An on-line version is also available at https://dfast.nig.ac.jp/.

Address of the bookmark: https://dfast.nig.ac.jp/

Circlator: automated circularization of genome assemblies using long sequencing reads

Poonam Mahapatra — Tue, 15 May 2018 09:42:32 -0500

A tool to circularize genome assemblies. The algorithm and benchmarks are described in the Genome Biology manuscript. Citation: "Circlator: automated circularization of genome assemblies using long sequencing reads", Hunt et al, Genome Biology 2015 Dec 29;16(1):294. doi: 10.1186/s13059-015-0849-0. PMID: 26714481.

Address of the bookmark: http://sanger-pathogens.github.io/circlator/

mScaffolder: A comparative genome scaffolding tool

Jit — Fri, 15 Jun 2018 04:48:01 -0500

A comparative genome scaffolding tool based on MUMmer

mScaffolder scaffolds a genome using an existing high quality genome as the reference. It aligns the two genomes using nucmer utility from MUMmer and then orders and orients the contigs of the candidate genome guided by their alignments to the reference genome. Please send your questions and comments to mchakrab@uci.edu.

Citation https://www.nature.com/articles/s41588-017-0010-y

Address of the bookmark: https://github.com/mahulchak/mscaffolder

KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies

Jit — Fri, 06 Jul 2018 03:36:45 -0500

KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts. The following tools are currently available in KAT:

hist: Create an histogram of k-mer occurrences from a sequence file. Adds metadata in output for easy plotting.
gcp: K-mer GC Processor. Creates a matrix of the number of K-mers found given a GC count and a K-mer count.
comp: K-mer comparison tool. Creates a matrix of shared K-mers between two (or three) sequence files or hashes.
sect: SEquence Coverage estimator Tool. Estimates the coverage of each sequence in a file using K-mers from another sequence file.
blob: Given, reads and an assembly, calculates both the read and assembly K-mer coverage along with GC% for each sequence in the assembly.SEquence Coverage estimator Tool.
filter: Filtering tools. Contains tools for filtering k-mer hashes and FastQ/A files:
- kmer: Produces a k-mer hash containing only k-mers within specified coverage and GC tolerances.
- seq: Filters a sequence file based on whether or not the sequences contain k-mers within a provided hash.
plot: Plotting tools. Contains several plotting tools to visualise K-mer and compare distributions. The following plot tools are available:
- density: Creates a density plot from a matrix created with the "comp" tool. Typically this is used to compare two K-mer hashes produced by different NGS reads.
- profile: Creates a K-mer coverage plot for a single sequence. Takes in fasta coverage output coverage from the "sect" tool
- spectra-cn: Creates a stacked histogram using a matrix created with the "comp" tool. Typically this is used to compare a jellyfish hash produced from a read set to a jellyfish hash produced from an assembly. The plot shows the amount of distinct K-mers absent, as well as the copy number variation present within the assembly.
- spectra-hist: Creates a K-mer spectra plot for a set of K-mer histograms produced either by jellyfish-histo or kat-histo.
- spectra-mx: Creates a K-mer spectra plot for a set of K-mer histograms that are derived from selected rows or columns in a matrix produced by the "comp".

In addition, KAT contains a python script for analysing the mathematical distributions present in the K-mer spectra in order to determine how much content is present in each peak.

This README only contains some brief details of how to install and use KAT. For more extensive documentation please visit: https://kat.readthedocs.org/en/latest/

https://academic.oup.com/bioinformatics/article/33/4/574/2664339

Address of the bookmark: https://github.com/TGAC/KAT

HASLR: a tool for rapid genome assembly of long sequencing reads

LEGE — Fri, 31 Jan 2020 05:50:15 -0600

HASLR is a tool for rapid genome assembly of long sequencing reads. HASLR is a hybrid tool which means it requires long reads generated by Third Generation Sequencing technologies (such as PacBio or Oxford Nanopore) together with Next Generation Sequencing reads (such as Illumina) from the same sample.

Address of the bookmark: https://github.com/vpc-ccg/haslr

PHYMMBL

Jit — Mon, 10 Oct 2016 08:56:34 -0500

Metagenomics sequencing projects collect samples of DNA from uncharacterized environments that may contain hundreds or even thousands of species. One of the main challenges in analyzing a metagenome is phylogenetic classification of raw sequence reads into groups representing the same or similar species. Such classification is a useful prerequisite for genome assembly and for analysis of the biological diversity present in a sample. The newest sequencing technologies have simultaneously made metagenomics easier, by making the sequencing process faster, and more difficult, by producing shorter read lengths than previous technologies. Methods for classifying sequences as short as 100 base pairs (bp) have until now been relatively inaccurate, requiring metagenomics projects to use older, long-read technologies. Phymm, a new classification approach for metagenomics data which uses interpolated Markov models (IMMs) to taxonomically classify DNA sequences, can accurately classify reads as short as 100 bp. Its accuracy for short reads represents a significant leap forward over previous composition-based classification methods. PhymmBL (rhymes with "thimble"), the hybrid classifier included in this distribution which combines analysis from both Phymm and BLAST, produces even higher accuracy.

Address of the bookmark: http://www.cbcb.umd.edu/software/phymm/