BOL: Related items

KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies

Jit — Fri, 06 Jul 2018 03:36:45 -0500

KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using kmer counts. The following tools are currently available in KAT:

hist: Creates a histogram of k-mer occurrences from a sequence file. Adds metadata to the output for easy plotting.
gcp: K-mer GC Processor. Creates a matrix of the number of K-mers found given a GC count and a K-mer count.
comp: K-mer comparison tool. Creates a matrix of shared K-mers between two (or three) sequence files or hashes.
sect: SEquence Coverage estimator Tool. Estimates the coverage of each sequence in a file using K-mers from another sequence file.
blob: Given reads and an assembly, calculates both the read and assembly K-mer coverage along with GC% for each sequence in the assembly.
filter: Filtering tools. Contains tools for filtering k-mer hashes and FastQ/A files:
- kmer: Produces a k-mer hash containing only k-mers within specified coverage and GC tolerances.
- seq: Filters a sequence file based on whether or not the sequences contain k-mers within a provided hash.
plot: Plotting tools. Contains several plotting tools to visualise and compare K-mer distributions. The following plot tools are available:
- density: Creates a density plot from a matrix created with the "comp" tool. Typically this is used to compare two K-mer hashes produced by different NGS reads.
- profile: Creates a K-mer coverage plot for a single sequence. Takes in the FASTA coverage output from the "sect" tool.
- spectra-cn: Creates a stacked histogram using a matrix created with the "comp" tool. Typically this is used to compare a jellyfish hash produced from a read set to a jellyfish hash produced from an assembly. The plot shows the number of distinct K-mers absent, as well as the copy number variation present within the assembly.
- spectra-hist: Creates a K-mer spectra plot for a set of K-mer histograms produced either by jellyfish-histo or kat-histo.
- spectra-mx: Creates a K-mer spectra plot for a set of K-mer histograms that are derived from selected rows or columns in a matrix produced by the "comp" tool.
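
To make the "hist" and "gcp" descriptions above more concrete, here is a minimal Python sketch of the underlying idea: count every k-mer, then summarise the counts as a histogram and as a GC-count by occurrence-count matrix. This is not KAT's implementation. The function names (count_kmers, kmer_histogram, gc_matrix) and the toy reads are assumptions made for illustration, and the sketch does not merge a k-mer with its reverse complement, which real k-mer counters often do.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping window of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def kmer_histogram(counts):
    """Map occurrence count -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


def gc_matrix(counts):
    """Map (GC count, occurrence count) -> number of distinct k-mers."""
    matrix = Counter()
    for kmer, n in counts.items():
        gc = kmer.count("G") + kmer.count("C")
        matrix[(gc, n)] += 1
    return matrix


# Hypothetical toy input; real data would come from FASTA/FASTQ files.
reads = ["ACGTACGT", "CGTACG"]
counts = count_kmers(reads, k=3)
hist = kmer_histogram(counts)
gcp = gc_matrix(counts)

for occurrences, distinct in sorted(hist.items()):
    print(occurrences, distinct)
```

With real read sets the histogram is what a k-mer spectra plot is drawn from, and the matrix corresponds to the kind of table a density plot would visualise.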

In addition, KAT contains a python script for analysing the mathematical distributions present in the K-mer spectra in order to determine how much content is present in each peak.

This README only contains some brief details of how to install and use KAT. For more extensive documentation please visit: https://kat.readthedocs.org/en/latest/

https://academic.oup.com/bioinformatics/article/33/4/574/2664339

Address of the bookmark: https://github.com/TGAC/KAT

DeepVariant : an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.

Jit — Sat, 25 Jan 2020 13:28:09 -0600

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF) designed for painless integration with the TensorFlow machine learning framework.

https://ai.googleblog.com/2017/12/deepvariant-highly-accurate-genomes.html

https://www.biorxiv.org/content/10.1101/092890v6

Address of the bookmark: https://github.com/google/deepvariant

Juicebox: Visualization and analysis software for Hi-C data

Jit — Fri, 21 Feb 2020 00:33:38 -0600

Juicebox is visualization software for Hi-C data. This distribution includes the source code for Juicebox, Juicer Tools, and Assembly Tools. Juicebox can be downloaded or used on the web, and detailed documentation is available on the project wiki.

Juicebox can now be used to visualize and interactively (re)assemble genomes. Check out the Juicebox Assembly Tools Module website https://aidenlab.org/assembly for more details on how to use Juicebox for assembly.

GUI at https://aidenlab.org/juicebox/

Address of the bookmark: https://github.com/aidenlab/Juicebox

wgd—simple command line tools for the analysis of ancient whole-genome duplications

LEGE — Thu, 23 Jul 2020 05:49:45 -0500

wgd is an easy-to-use command-line tool for K_S distribution construction. The wgd suite provides commonly used K_S and colinearity analysis workflows together with tools for modeling and visualization, rendering these analyses accessible to genomics researchers in a convenient manner.

https://academic.oup.com/bioinformatics/article/35/12/2153/5162749

Address of the bookmark: https://github.com/arzwa/wgd

Kmer: a suite of tools for DNA sequence analysis

BioStar — Wed, 18 Aug 2021 00:02:54 -0500

More at https://help.rc.ufl.edu/doc/Kmer

This also includes:

A2Amapper: ATAC, Assembly to Assembly Comparison tool:
- Comparative mapping between two genome assemblies (same species), or between two different genomes (cross species).

Sim4db:
- Spliced alignment of cDNA and genomic sequences, from the same (sim4) or related (sim4cc) species. Optimized for high-throughput batched alignment.

LEAFF:
- LEAFF (ahem, Let's Extract Anything From Fasta) is a utility program for working with multi-fasta files. In addition to providing random access to the base level, it includes several analysis functions.

Meryl:
- An out-of-core k-mer counter. The amount of sequence that can be processed for any size k depends only on the amount of free disk space.

Address of the bookmark: https://help.rc.ufl.edu/doc/Kmer

InteractiVenn: a web-based tool for the analysis of sets through Venn diagrams

Jit — Wed, 29 Jun 2022 03:22:26 -0500

InteractiVenn is a more flexible tool for interacting with Venn diagrams including up to six sets. It offers a clean interface for Venn diagram construction and enables analysis of set unions while preserving the shape of the diagram. Set unions are useful to reveal differences and similarities among sets and may be guided in the tool by a tree or by a list of set unions. The tool also allows obtaining subsets’ elements, saving and loading sets for further analyses, and exporting the diagram in vector and image formats. InteractiVenn has been used to analyze two biological datasets, but it may serve set analysis in a broad range of domains.
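
The set operations described above can be sketched in a few lines of Python. This is only an illustration of the idea, not InteractiVenn's code: the helper venn_regions, the set names, and the gene identifiers are all hypothetical. It lists the elements exclusive to each region of the diagram, then merges two sets into their union and recomputes the regions.

```python
from itertools import combinations


def venn_regions(named_sets):
    """Return the elements exclusive to each combination of sets.

    Keys are tuples of set names; a key such as ("A", "B") holds the
    elements found in A and B but in none of the other sets.
    """
    names = sorted(named_sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(named_sets[n] for n in combo))
            others = [named_sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            exclusive = inside - outside
            if exclusive:
                regions[combo] = exclusive
    return regions


# Hypothetical example data.
sets = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
regions = venn_regions(sets)

# Merge A and B into their union and look at the diagram again.
merged = {"A_or_B": sets["A"] | sets["B"], "C": sets["C"]}
merged_regions = venn_regions(merged)

for combo, elements in regions.items():
    print(combo, sorted(elements))
```

Comparing regions with merged_regions shows what a union hides and what it reveals: g3 stays shared with C, while g1, g2 and g4 collapse into a single region.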

More at https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0611-3

Address of the bookmark: http://www.interactivenn.net/

pipesnake: bioinformatics best-practice analysis pipeline for phylogenomic reconstruction

LEGE — Wed, 21 Feb 2024 06:19:41 -0600

ausarg/pipesnake is a bioinformatics best-practice analysis pipeline for phylogenomic reconstruction starting from short-read 'second-generation' sequencing data.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies.

Address of the bookmark: https://github.com/AusARG/pipesnake

Genome Annotation

Sun, 25 Aug 2013 10:53:01 -0500

Dr. Rob Edwards describes some of the problems, challenges, and approaches in genome annotation, with a particular emphasis on how the Fellowship for the Interpretation of Genomes (FIG) developed subsystems using the SEED database available at http://www.theseed.org/

BUSCO

Jitendra Narayan — Sun, 07 Feb 2016 16:02:39 -0600

Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs

More at http://busco.ezlab.org/

Address of the bookmark: http://busco.ezlab.org/

Centurion

Jit — Fri, 12 Feb 2016 04:45:41 -0600

Although centromeres are essential for life and are the subject of extensive research, centromere locations in yeast genomes are difficult to infer, and in most species they are still unknown. Recently, the chromatin conformation assay Hi-C has been re-purposed for diverse applications, including de novo genome assembly, deconvolution of metagenomic samples, and inference of centromere locations. We describe a method, Centurion, that jointly infers the locations of all centromeres in a single yeast genome by exploiting the centromeres’ tendency to cluster in 3D space. We first demonstrate the accuracy of Centurion in identifying known centromere locations from high coverage Hi-C data of budding yeast and a human malaria parasite. We then use two metagenomic samples with relatively low coverage Hi-C data to infer centromere locations for each chromosome in 14 different yeast species. For yeasts with large centromeres (e.g., S. pombe) Centurion predicts the exact centromere locations. For seven yeasts with point centromeres, Centurion predicts most of the centromeres at an average of 5 kb distance from their known locations. Finally, we predict centromere coordinates for six yeast species that currently lack centromere annotations. These results suggest that Centurion can be used for centromere identification for a large number of yeast species, even with a limited amount of Hi-C sequencing.

Paper: http://www.ncbi.nlm.nih.gov/pubmed/25940625

More at http://cbio.ensmp.fr/centurion/

Address of the bookmark: http://cbio.ensmp.fr/centurion/