BOL: Related items

QuasR: Quantification and annotation of short reads in R

Neel — Fri, 13 Aug 2021 07:44:05 -0500

The QuasR package (short for Quantify and annotate short reads in R) integrates the functionality of several R packages (such as IRanges (Lawrence et al. 2013) and Rsamtools) and external software (e.g. bowtie, through the Rbowtie package, and HISAT2, through the Rhisat2 package). The package aims to cover the whole analysis workflow of typical high-throughput sequencing experiments, from the raw sequence reads, through pre-processing and alignment, to quantification. A single R script can contain all steps of a complete analysis, making it simple to document, reproduce or share the workflow with all relevant details.

Address of the bookmark: https://www.bioconductor.org/packages/devel/bioc/vignettes/QuasR/inst/doc/QuasR.html

BFC: a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data

Jit — Thu, 31 May 2018 09:35:23 -0500

BFC is a standalone high-performance tool for correcting sequencing errors from Illumina sequencing data. It is specifically designed for high-coverage whole-genome human data, though it also performs well for small genomes. The BFC algorithm is a variant of the classical spectrum alignment algorithm introduced by Pevzner et al. (2001). It uses an exhaustive search to find a k-mer path through a read that minimizes a heuristic objective function jointly considering penalties on correction, quality and k-mer support. This algorithm was first implemented in the author's fermi assembler and then refined a few times in fermi, fermi2 and now in BFC. In the k-mer counting phase, BFC uses a blocked Bloom filter to filter out most singleton k-mers and keeps the rest in a hash table (Melsted and Pritchard, 2011). BFC is named after its use of a Bloom filter, though other correctors such as Lighter and Bless actually rely more on Bloom filters than BFC does.
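The k-mer counting idea described above can be illustrated with a small sketch. This is not BFC's code: it uses a plain (non-blocked) Bloom filter, and the class name, function name, filter size and hash choice are all assumptions made for illustration. A k-mer enters the hash table only on its second sighting, so most singletons never take up a table entry.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """A plain (non-blocked) Bloom filter over strings; sizes are illustrative."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting one hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(item.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item):
        """Insert item; return True if it was (probably) already present."""
        seen = True
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                seen = False
                self.bits[byte] |= 1 << bit
        return seen


def count_non_singleton_kmers(reads, k):
    """Count k-mers seen at least twice; singletons stay in the filter only."""
    bloom = BloomFilter()
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif bloom.add(kmer):
                # Second sighting (or a rare false positive): promote to the table.
                counts[kmer] = 2
    return counts


reads = ["ACGTACGT", "ACGTTTTT"]  # made-up reads for illustration
print(count_non_singleton_kmers(reads, k=4))
```

Because a Bloom filter can report false positives, a few true singletons may slip into the table with an inflated count; the filter only needs to remove most of them to save memory.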

Address of the bookmark: https://github.com/lh3/bfc

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads

Jit — Wed, 07 Jun 2017 04:18:16 -0500

Paper: http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html

Features

    Mapping position agnostic to alignment parameters.
    Consistently very high sensitivity and precision across different error profiles, rates and sequencing technologies even with default parameters.
    Circular genome handling to resolve coverage drops near ends of the genome.
    E-value.
    Meaningful mapping quality.
    Various alignment strategies (semiglobal bit-vector and Gotoh, anchored).
    Overlapping of reads for de novo assembly.
    Transcriptome mapping through internal construction of a transcriptome from a given genomic reference and a GTF file.
    ...and much more.

GraphMap is also used as an overlapper in a new de novo genome assembly project called Ra (https://github.com/mariokostelac/ra-integrate).
Ra attempts to create de novo assemblies from raw nanopore and PacBio reads without requiring error correction, for which a highly sensitive overlapper is required.

Currently, development of a new spliced-alignment mode for mapping RNA-seq reads is under way.
Description of the current effort as well as how to reach the experimental implementation can be found here: doc/rnaseq.md.

Address of the bookmark: https://github.com/isovic/graphmap

WhatsHap: fast and accurate read-based phasing

Jit — Mon, 28 May 2018 09:52:16 -0500

WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly. It is especially suitable for long reads, but also works well with short reads.

Features

Very accurate results (Martin et al., WhatsHap: fast and accurate read-based phasing)

Works well with Illumina, PacBio, Oxford Nanopore and other types of reads

It phases SNVs, indels and even “complex” variants (such as TCG → AGAA)

Pedigree phasing mode uses reads from related individuals (such as trios) to improve results and to reduce coverage requirements (Garg et al., Read-Based Phasing of Related Individuals).

WhatsHap is easy to install

It is easy to use: Pass in a VCF and one or more BAM files, get out a phased VCF. Supports multi-sample VCFs.

It produces standard-compliant VCF output by default

If desired, get output that is compatible with ReadBackedPhasing

Open Source (MIT license)

Address of the bookmark: https://whatshap.readthedocs.io/en/latest/

CLARK: Fast, accurate and versatile sequence classification system

Jit — Sat, 15 Feb 2020 01:49:01 -0600

CLARK is a method for supervised sequence classification based on discriminative k-mers. On two distinct classification problems (see the article for details), namely (1) the taxonomic classification of metagenomic reads to known bacterial genomes, and (2) the assignment of BAC clones and transcripts to chromosome arms/centromeres (in the absence of a finished assembly for the reference genome), the authors report that CLARK outperforms the best state-of-the-art methods in classification speed and precision.
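To make the notion of discriminative k-mers concrete, here is a minimal sketch. It assumes that a k-mer is discriminative when it occurs in exactly one target, and that a read is assigned to the target with the most such hits. The function names and the toy sequences are hypothetical, and this is not CLARK's implementation.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return the set of k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_discriminative_index(targets, k):
    """Map each k-mer that occurs in exactly one target to that target."""
    owners = defaultdict(set)
    for name, seq in targets.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    return {kmer: next(iter(names))
            for kmer, names in owners.items() if len(names) == 1}


def classify(read, index, k):
    """Assign a read to the target with the most discriminative k-mer hits."""
    hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    if not hits:
        return None  # no discriminative k-mer found: leave unclassified
    return hits.most_common(1)[0][0]


# Toy targets (hypothetical); the k-mer GGGG is shared and therefore discarded.
targets = {"A": "ACGTACGGGG", "B": "TTGCAGGGGT"}
index = build_discriminative_index(targets, k=4)
print(classify("GTACGG", index, k=4))
print(classify("GGGGG", index, k=4))
```

Dropping shared k-mers up front keeps the index small and means every hit is unambiguous evidence for a single target.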

Address of the bookmark: http://clark.cs.ucr.edu/Spaced/

HELIANO: A fast and accurate tool for detection of Helitron-like elements

LEGE — Tue, 13 Aug 2024 07:16:34 -0500

Helitron-like elements (HLE1 and HLE2) are DNA transposons. They have been found in diverse species and seem to play significant roles in the evolution of host genomes. Although known for over twenty years, Helitron sequences are still challenging to identify. Here, we propose HELIANO (Helitron-like elements annotator) as an efficient solution for detecting Helitron-like elements.

https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkae679/7730539?login=true

Address of the bookmark: https://github.com/Zhenlisme/heliano/

OPERA: Optimal Paired-End Read Assembler

Jit — Fri, 09 Sep 2016 05:28:58 -0500

OPERA (Optimal Paired-End Read Assembler) is a sequence assembly program (http://en.wikipedia.org/wiki/Sequence_assembly). It uses information from paired-end/mate-pair/long reads to order and orient the intermediate contigs/scaffolds assembled in a genome assembly project, in a process known as Scaffolding. OPERA is based on an exact algorithm that is guaranteed to minimize the discordance of scaffolds with the information provided by the paired-end/mate-pair/long reads (for further details see Gao et al, 2011).

Note that since the original publication, we have made significant changes to OPERA (v1.0 onwards) including refinements to its basic algorithm (to reduce local errors, improve efficiency etc.) and incorporated features that are important for scaffolding large genomes (multi-library support, better repeat-handling etc.), in addition to other scalability and usability improvements (bam and gzip support, smaller memory footprint). We therefore encourage you to download and use our latest version: OPERA-LG. In our benchmarks, it has significantly improved corrected N50 and reduced the number of scaffolding errors. Furthermore, our latest release contains the wrapper script OPERA-long-read that enables scaffolding with long-reads from third-generation sequencing technologies (PacBio or Oxford Nanopore). The manuscript describing the new features and algorithms is available at Genome Biology. We look forward to getting your feedback to improve it further.
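For readers unfamiliar with the N50 statistic mentioned above, the sketch below computes the plain N50 of a set of sequence lengths. The corrected N50 used in the OPERA-LG benchmarks is not reproduced here; the function name and the example lengths are made up for illustration.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


contigs = [100, 80, 50, 30, 20]   # hypothetical contig lengths
scaffolds = [150, 130]            # the same bases joined into two scaffolds
print(n50(contigs))    # 80
print(n50(scaffolds))  # 150
```

Joining contigs into longer scaffolds raises N50 even though no sequence is added, which is why scaffolding results are usually reported together with an error count.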

Address of the bookmark: https://sourceforge.net/p/operasf/wiki/The%20OPERA%20wiki/

Run the miniasm assembler on nanopore reads!

Jit — Mon, 18 Dec 2017 04:07:50 -0600

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically produced by minimap) as input and outputs an assembly graph in the GFA format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to that of the raw input reads.

Find details of the repeats within the reads:

fq2fa ONT_A.fastq ONT_A.fasta

minimap2 -xava-ont ONT_A.fasta ONT_A.fasta -t10 -X > AONT.paf

awk '{if($1==$6){print}}' AONT.paf > AONTself.paf

awk '$5=="-" {print $1}' AONTself.paf | sort -u > invertedrepeat.list
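The two awk filters above rely on the PAF column layout: column 1 is the query name, column 5 the relative strand, and column 6 the target name. A self-hit on the reverse strand means a read contains an inverted repeat of part of itself. The same selection can be sketched in Python; the function name and the example records are hypothetical.

```python
def inverted_repeat_reads(paf_lines):
    """Return sorted names of reads with a reverse-strand hit to themselves."""
    names = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record (12 mandatory columns)
        query, strand, target = fields[0], fields[4], fields[5]
        if query == target and strand == "-":
            names.add(query)
    return sorted(names)


# Made-up PAF records: name, length, start, end, strand, then the target fields.
example = [
    "read1\t5000\t10\t900\t-\tread1\t5000\t4000\t4890\t800\t890\t60",
    "read1\t5000\t10\t900\t+\tread2\t7000\t100\t990\t850\t890\t60",
    "read3\t6000\t0\t500\t+\tread3\t6000\t3000\t3500\t450\t500\t60",
]
print(inverted_repeat_reads(example))
```

To run it on the real file, pass an open file handle, for example `inverted_repeat_reads(open("AONT.paf"))`.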

Generate a few palindrome and repeat plots (showing only repeats larger than 10, 20 and 30 kb):

minidot -f 5 -m 30000 AONTself.paf > AONTself30000.eps
sed 's/_template_pass_FAH31515//' AONTself30000.eps > AONTself30000final.eps

minidot -f 5 -m 20000 AONTself.paf > AONTself20000.eps
sed 's/_template_pass_FAH31515//' AONTself20000.eps > AONTself20000final.eps

minidot -f 5 -m 10000 AONTself.paf > AONTself10000.eps
sed 's/_template_pass_FAH31515//' AONTself10000.eps > AONTself10000final.eps

Assemble with miniasm:

miniasm -f ONT_A.fasta AONT.paf > AONT.gfa
grep '^S' AONT.gfa |awk '{print ">"$2"\n"$3}' > AONT_miniasm.fasta

minimap2 -xasm10 AONT_miniasm.fasta AONT_miniasm.fasta -t1 -X > AONT_miniasm.paf

awk '{if($1==$6){print}}' AONT_miniasm.paf > AONT_miniasm_self.paf

minidot -f 5 -m 10000 AONT_miniasm_self.paf > AONT_miniasm_self10000.eps

Enjoy the assembly!

Shasta long read assembler

Jit — Tue, 14 Jan 2020 06:47:07 -0600

The goal of the Shasta long read assembler is to rapidly produce accurate assembled sequence using as input DNA reads generated by Oxford Nanopore flow cells.

Computational methods used by the Shasta assembler include:

Using a run-length representation of the read sequence. This makes the assembly process more resilient to errors in homopolymer repeat counts, which are the most common type of errors in Oxford Nanopore reads.
Using, in some phases of the computation, a representation of the read sequence based on markers, a fixed subset of short k-mers (k ≈ 10).
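Both representations can be sketched in a few lines. This is only an illustration, not Shasta's code: the function names are hypothetical, the marker set is an arbitrary hand-picked example, and looking up markers in the run-length (collapsed) sequence is an assumption of this sketch.

```python
from itertools import groupby


def run_length_encode(seq):
    """Split a sequence into its collapsed bases and matching repeat counts."""
    bases, counts = [], []
    for base, run in groupby(seq):
        bases.append(base)
        counts.append(len(list(run)))
    return "".join(bases), counts


def marker_positions(collapsed, k, markers):
    """Positions in the collapsed sequence where a marker k-mer starts."""
    return [i for i in range(len(collapsed) - k + 1)
            if collapsed[i:i + k] in markers]


# Two reads of the same locus that differ only in homopolymer lengths.
print(run_length_encode("AAACGGGT"))  # ('ACGT', [3, 1, 3, 1])
print(run_length_encode("AACGGGGT"))  # ('ACGT', [2, 1, 4, 1])

# Arbitrary marker set for illustration; a real set would be much larger.
print(marker_positions("ACGTACG", 3, {"ACG", "CGT"}))  # [0, 1, 4]
```

The two reads collapse to the same base string, so a homopolymer miscount changes only the repeat counts and does not disturb comparisons made on the collapsed sequence.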

More at https://chanzuckerberg.github.io/shasta/index.html

Address of the bookmark: https://github.com/chanzuckerberg/shasta

Comparison of Short Read De Novo Alignment Algorithms

Rahul Agarwal — Wed, 21 Aug 2013 07:56:01 -0500

An excellent article that introduces different sequencing methods, along with tools for de novo assembly of sequencing reads and their relevant references.

Title: Comparison of Short Read De Novo Alignment Algorithms

Author: Nikhil Gopal

Address of the bookmark: http://biochem218.stanford.edu/Projects%202011/Gopal%202011.pdf