BOL: Related items

Vicoso group

Wed, 02 Feb 2022 02:51:27 -0600

The Vicoso group investigates how sex chromosomes evolve over time, and what biological forces are driving their patterns of differentiation.

The Vicoso group is interested in understanding several aspects of the biology of sex chromosomes, and the evolutionary processes that shape their peculiar features. By combining the use of next-generation sequencing technologies with studies in several model and non-model organisms, they can address a variety of standing questions, such as: Why do some Y chromosomes degenerate while others remain homomorphic, and how does this relate to the extent of sexual dimorphism of the species? What forces drive some species to acquire global dosage compensation of the X, while others only compensate specific genes? What are the frequency and molecular dynamics of sex-chromosome turnover?

More at https://ist.ac.at/en/research/vicoso-group/
http://pub.ist.ac.at/~bvicoso/

InfoGenomeR: Integrative reconstruction of cancer genome karyotypes

Jit — Wed, 05 May 2021 01:02:18 -0500

InfoGenomeR (Integrative Framework for Genome Reconstruction) uses a breakpoint graph to model the connectivity among genomic segments at the genome-wide scale. It integrates cancer purity and ploidy, total and allele-specific copy number alterations (CNAs), and haplotype information to identify the optimal breakpoint graph representing cancer genomes.

More at https://www.nature.com/articles/s41467-021-22671-6

Address of the bookmark: https://github.com/dmcblab/InfoGenomeR

GOLD: Genomes Online Database

Jit — Wed, 26 Jul 2017 07:49:29 -0500

GOLD (Genomes Online Database) is a World Wide Web resource that provides comprehensive access to information on genome and metagenome sequencing projects around the world, together with their associated metadata.

Address of the bookmark: https://gold.jgi.doe.gov/

HIV genome database

Rahul Nayak — Fri, 21 Jan 2022 05:40:15 -0600

HIV sequence database search (Los Alamos National Laboratory HIV Databases)

Address of the bookmark: https://www.hiv.lanl.gov/components/sequence/HIV/search/search.html

DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication

Jit — Tue, 14 Nov 2017 10:26:16 -0600

We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7,000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 minutes, with rich information such as pseudogenes, translation exceptions, and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future.

Availability and Implementation

The software is implemented in Python 3 and runs on both Python 2.7 and Python 3.4 or later, on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license, with external binaries bundled in the software distribution. An on-line version is also available at https://dfast.nig.ac.jp/.

Address of the bookmark: https://dfast.nig.ac.jp/

mScaffolder: A comparative genome scaffolding tool

Jit — Fri, 15 Jun 2018 04:48:01 -0500

A comparative genome scaffolding tool based on MUMmer

mScaffolder scaffolds a genome using an existing high-quality genome as the reference. It aligns the two genomes with the nucmer utility from MUMmer, then orders and orients the contigs of the candidate genome guided by their alignments to the reference genome. Please send your questions and comments to mchakrab@uci.edu.
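The order-and-orient step can be sketched in a few lines of Python. This is a simplified illustration, not mScaffolder's own code: it assumes each contig's alignments have already been parsed from nucmer output into hypothetical `Hit` records, keeps the longest alignment per contig as its placement, sorts contigs by reference start, and reverse-complements contigs that align on the minus strand. Contigs with no alignment are simply dropped, and the 100-N gap is an arbitrary assumption.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record: one alignment of a contig to the reference
    (e.g. parsed from nucmer/show-coords output; parsing not shown)."""
    contig: str
    ref_start: int    # start coordinate on the reference
    aligned_len: int  # alignment length, used to pick the best placement
    strand: str       # "+" forward, "-" reverse-complemented


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def order_and_orient(hits: list[Hit]) -> list[tuple[str, str]]:
    """Return (contig, strand) pairs in reference order."""
    best: dict[str, Hit] = {}
    for hit in hits:
        # Keep only the longest alignment per contig as its placement.
        current = best.get(hit.contig)
        if current is None or hit.aligned_len > current.aligned_len:
            best[hit.contig] = hit
    placed = sorted(best.values(), key=lambda h: h.ref_start)
    return [(h.contig, h.strand) for h in placed]


def build_scaffold(contigs: dict[str, str],
                   layout: list[tuple[str, str]],
                   gap: int = 100) -> str:
    """Join oriented contigs with runs of N (gap size is an assumption)."""
    pieces = []
    for name, strand in layout:
        seq = contigs[name]
        pieces.append(reverse_complement(seq) if strand == "-" else seq)
    return ("N" * gap).join(pieces)


if __name__ == "__main__":
    hits = [
        Hit("ctgB", 5000, 4000, "-"),
        Hit("ctgA", 1, 3900, "+"),
        Hit("ctgA", 20000, 500, "-"),  # shorter secondary hit is ignored
        Hit("ctgC", 12000, 3000, "+"),
    ]
    layout = order_and_orient(hits)
    print(layout)  # [('ctgA', '+'), ('ctgB', '-'), ('ctgC', '+')]
    contigs = {"ctgA": "ACGT", "ctgB": "AAC", "ctgC": "GG"}
    print(build_scaffold(contigs, layout, gap=3))  # ACGTNNNGTTNNNGG
```

A real tool also has to deal with contigs that align in several places or span rearrangements; picking the single longest alignment is the simplest possible rule.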

Citation: https://www.nature.com/articles/s41588-017-0010-y

Address of the bookmark: https://github.com/mahulchak/mscaffolder

KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies

Jit — Fri, 06 Jul 2018 03:36:45 -0500

KAT is a suite of tools that analyse jellyfish hashes or sequence files (FASTA or FASTQ) using K-mer counts. The following tools are currently available in KAT:

hist: Creates a histogram of K-mer occurrences from a sequence file. Adds metadata to the output for easy plotting.
gcp: K-mer GC Processor. Creates a matrix of the number of K-mers found given a GC count and a K-mer count.
comp: K-mer comparison tool. Creates a matrix of shared K-mers between two (or three) sequence files or hashes.
sect: SEquence Coverage estimator Tool. Estimates the coverage of each sequence in a file using K-mers from another sequence file.
blob: Given reads and an assembly, calculates both the read and assembly K-mer coverage, along with GC%, for each sequence in the assembly.
filter: Filtering tools. Contains tools for filtering k-mer hashes and FastQ/A files:
- kmer: Produces a k-mer hash containing only k-mers within specified coverage and GC tolerances.
- seq: Filters a sequence file based on whether or not the sequences contain k-mers within a provided hash.
plot: Plotting tools. Contains several plotting tools to visualise and compare K-mer distributions. The following plot tools are available:
- density: Creates a density plot from a matrix created with the "comp" tool. Typically this is used to compare two K-mer hashes produced by different NGS reads.
- profile: Creates a K-mer coverage plot for a single sequence. Takes as input the FASTA coverage output from the "sect" tool.
- spectra-cn: Creates a stacked histogram using a matrix created with the "comp" tool. Typically this is used to compare a jellyfish hash produced from a read set to a jellyfish hash produced from an assembly. The plot shows the number of distinct K-mers absent from the assembly, as well as the copy number variation present within the assembly.
- spectra-hist: Creates a K-mer spectra plot for a set of K-mer histograms produced either by jellyfish-histo or kat-histo.
- spectra-mx: Creates a K-mer spectra plot for a set of K-mer histograms that are derived from selected rows or columns in a matrix produced by the "comp" tool.
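The idea behind "hist" and the spectra plots can be illustrated with a small pure-Python sketch: count every K-mer in a set of sequences, then tabulate how many distinct K-mers occur once, twice, and so on. This is not KAT's implementation (KAT works on jellyfish hashes and scales to whole NGS datasets); merging each K-mer with its reverse complement ("canonical" K-mers) and skipping K-mers that contain N are assumptions of this example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a K-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(seqs, k: int) -> Counter:
    """Count canonical K-mers across all sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _VALID:  # skip K-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> dict[int, int]:
    """Map multiplicity -> number of distinct K-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    counts = count_kmers(["ACGTACGT"], k=3)
    print(kmer_histogram(counts))  # {2: 1, 4: 1}
```

On real read data, the low-multiplicity bins are usually dominated by sequencing errors, while the main peak reflects the K-mer coverage of the genome.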

In addition, KAT contains a Python script for analysing the mathematical distributions present in the K-mer spectra in order to determine how much content is present in each peak.

The KAT README only contains brief details on how to install and use KAT. For more extensive documentation, please visit: https://kat.readthedocs.org/en/latest/

Citation: https://academic.oup.com/bioinformatics/article/33/4/574/2664339

Address of the bookmark: https://github.com/TGAC/KAT

My commonly used commands in Bioinformatics

Rahul Nayak — Thu, 26 Jul 2018 04:58:45 -0500

FYI, I've found it useful to use MUMmer to extract the specific changes that Racon makes, so I can evaluate them individually:

minimap -t 24 assembly.fasta long_reads.fastq.gz | racon -t 24 long_reads.fastq.gz - assembly.fasta racon_assembly.fasta
nucmer -p nucmer assembly.fasta racon_assembly.fasta
show-snps -C -T -r nucmer.delta

This reports Racon's changes in a table. You can exclude indels with the -I option in show-snps.

This process (Racon -> MUMmer -> SNP table) gives a per-change report of what Racon did, so Racon itself does not need to output a separate variant table.

Shasta long read assembler

Jit — Tue, 14 Jan 2020 06:47:07 -0600

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence from DNA reads generated by Oxford Nanopore flow cells.

Computational methods used by the Shasta assembler include:

Using a run-length representation of the read sequence. This makes the assembly process more resilient to errors in homopolymer repeat counts, which are the most common type of errors in Oxford Nanopore reads.
Using, in some phases of the computation, a representation of the read sequence based on markers: a fixed subset of short k-mers (k ≈ 10).
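Both representations can be illustrated with a short Python sketch. Run-length encoding collapses each homopolymer into one base plus a repeat count, so two reads that differ only in homopolymer lengths share the same base sequence. The marker step is simplified: the rule used here for picking the "fixed subset" of k-mers (keep a k-mer when a hash of it falls below a chosen fraction) is an assumption for illustration, not necessarily how Shasta selects markers, and the function names are hypothetical.

```python
import hashlib
from itertools import groupby


def run_length_encode(seq: str) -> tuple[str, list[int]]:
    """Collapse homopolymers: 'AAACGGGT' -> ('ACGT', [3, 1, 3, 1])."""
    bases, counts = [], []
    for base, run in groupby(seq):
        bases.append(base)
        counts.append(sum(1 for _ in run))
    return "".join(bases), counts


def is_marker(kmer: str, fraction: float) -> bool:
    """Deterministic pseudo-random choice: the same k-mer always gets
    the same decision, so the marker set is fixed across all reads."""
    h = int.from_bytes(hashlib.sha256(kmer.encode()).digest()[:8], "big")
    return h < fraction * 2**64


def marker_positions(rle_seq: str, k: int = 10,
                     fraction: float = 0.1) -> list[tuple[int, str]]:
    """(position, k-mer) for every marker occurrence in a run-length sequence."""
    return [
        (i, rle_seq[i:i + k])
        for i in range(len(rle_seq) - k + 1)
        if is_marker(rle_seq[i:i + k], fraction)
    ]


if __name__ == "__main__":
    true_read = "GATTTACCCCAG"
    noisy_read = "GATTACCCAAG"  # homopolymer lengths wrong, bases otherwise the same
    print(run_length_encode(true_read)[0] == run_length_encode(noisy_read)[0])  # True
```

Because the homopolymer errors land only in the count list, the run-length base sequences (and therefore any markers found in them) still match between the two reads.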

More at https://chanzuckerberg.github.io/shasta/index.html

Address of the bookmark: https://github.com/chanzuckerberg/shasta

odgi: optimized dynamic genome/graph implementation

Abhimanyu Singh — Tue, 01 Feb 2022 23:42:21 -0600

odgi provides an efficient and succinct dynamic DNA sequence graph model, as well as a host of algorithms that allow the use of such graphs in bioinformatic analyses.

Careful encoding of graph entities allows odgi to efficiently compute and transform pangenomes with minimal overheads. odgi implements a dynamic data structure that leverages multi-core CPUs and can be updated on the fly.

The edges and path steps are recorded as deltas between the current node id and the target node id, where the node id corresponds to the rank in the global array of nodes. Graphs built from biological data sets tend to have local partial order and, when sorted, the deltas tend to be small. This allows them to be compressed with a variable-length integer representation, resulting in a small in-memory footprint at the cost of packing and unpacking.

The RAM and computational savings are substantial. In partially ordered regions of the graph, most deltas will require only a single byte.
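The delta idea can be made concrete with a small Python sketch. It is not odgi's code: it assumes node ids are ranks in a sorted node array, maps signed deltas to unsigned integers with zigzag encoding, and packs them as LEB128-style variable-length integers. odgi's actual in-memory encoding may differ in these details, and the function names are hypothetical.

```python
def zigzag(n: int) -> int:
    """Map signed to unsigned so small negatives stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(z: int) -> int:
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)


def encode_varint(u: int) -> bytes:
    """LEB128-style: 7 payload bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data: bytes) -> list[int]:
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def encode_edges(node_rank: int, targets: list[int]) -> bytes:
    """Store each edge as the delta from the current node's rank."""
    return b"".join(encode_varint(zigzag(t - node_rank)) for t in targets)


def decode_edges(node_rank: int, data: bytes) -> list[int]:
    return [node_rank + unzigzag(z) for z in decode_varints(data)]


if __name__ == "__main__":
    packed = encode_edges(1_000_000, [1_000_001, 1_000_002, 999_998])
    print(len(packed))  # 3 (bytes), versus 24 bytes as three 64-bit node ids
    print(decode_edges(1_000_000, packed))  # [1000001, 1000002, 999998]
```

After zigzag mapping, every delta between -64 and 63 fits in a single byte, which is why a well-sorted graph with mostly local edges packs so tightly; a long-range edge simply costs a few more bytes.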

Address of the bookmark: https://github.com/pangenome/odgi