BOL: Related items

CoLoRMap: Correcting Long Reads by Mapping short reads

Jit — Mon, 20 Aug 2018 14:17:05 -0500

Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long reads sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads require a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads.We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods.The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap

ehaghshe@sfu.ca or cedric.chauve@sfu.ca

Address of the bookmark: https://github.com/sfu-compbio/colormap

MashMap: a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s)

Jit — Tue, 12 Dec 2017 17:23:31 -0600

MashMap is a fast and approximate software for mapping long reads (PacBio/ONT) or assembly to reference genome(s). It maps a query sequence against a reference region if and only if its estimated alignment identity is above a specified threshold. It does not compute the alignments explicitly, but rather estimates a k-mer based Jaccard similarity using a combination of Winnowing and MinHash. This is then converted to an estimate of sequence identity using the Mash distance. An appropriate k-mer sampling rate is automatically determined given minimum local alignment length and identity thresholds. The efficiency of the algorithm improves as both of these thresholds are increased.

Address of the bookmark: https://github.com/marbl/MashMap

Breakpointer: using local mapping artifacts to support sequence breakpoint discovery from single-end reads

Jit — Tue, 12 Jun 2018 12:41:10 -0500

Breakpointer is a fast tool for locating sequence breakpoints from the alignment of single end reads (SE) produced by next generation sequencing (NGS). It adopts a heuristic method in searching for local mapping signatures created by insertion/deletions (indels) or more complex structural variants(SVs).

Address of the bookmark: https://github.com/ruping/Breakpointer

Stampy

Abhi — Fri, 20 May 2016 19:13:32 -0500

Stampy is a package for the mapping of short reads from illumina sequencing machines onto a reference genome. It's recommended for most workflows, including those for genomic resequencing, RNA-Seq and Chip-seq. Stampy excels in the mapping of reads containing that contain sequence variation relative to the reference, in particular for those containing insertions or deletions.

Address of the bookmark: http://www.well.ox.ac.uk/project-stampy

shovill: Assemble bacterial isolate genomes from Illumina paired-end reads

BioStar — Sat, 02 Jan 2021 07:05:36 -0600

Shovill is a pipeline which uses SPAdes at its core, but alters the steps before and after the primary assembly step to get similar results in less time. Shovill also supports other assemblers like SKESA, Velvet and Megahit, so you can take advantage of the pre- and post-processing the Shovill provides with those too.

Address of the bookmark: https://github.com/tseemann/shovill

mrFAST: Micro Read Fast Alignment Search Tool

Neel — Tue, 26 Apr 2016 03:50:06 -0500

mrFAST is a read mapper that is designed to map short reads to reference genome with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to user defined error threshold, including indels up to 4+4 bp. This manual, describes how to choose the parameters and tune mrFAST with respect to the library settings. mrFAST is designed to find 'all' mappings for a given set of reads, however it can return one "best" map location if the relevant parameter is invoked.

More at http://mrfast.sourceforge.net/manual.html

Address of the bookmark: http://mrfast.sourceforge.net/manual.html

Maq: Mapping and Assembly with Quality

Jit — Tue, 22 Nov 2016 04:51:39 -0600

Maq stands for Mapping and Assembly with Quality It builds assembly by mapping short reads to reference sequences. Maq is a project hosted by SourceForge.net. The project page is available athttp://sourceforge.net/projects/maq/. Maq is previously known as mapass2.

Run Maq Now

Follow these steps to try Maq. All you need is a reference sequence file in the FASTA format.

Prepare a reference sequence (ref.fasta). Better a bacterial genome.
Download maq, maq-data and maqview at the download page.
Copy maq, maq.pl and maq_eval.pl to the $PATH or to the same directory.
Simulate diploid reference and read sequences, map reads, call variants and evaluate the results in one go:
```
maq.pl demo ref.fasta calib-30.dat
```
where calib-30.dat is contained in maq-data.

View the alignment:

cd maqdemo/easyrun;
maqindex -i -c consensus.cns all.map;
maqview -c consensus.cns all.map

Even for advanced maq users, running `maq.pl demo' is recommended. You may find something helpful.

Address of the bookmark: http://maq.sourceforge.net

BlasR Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Theory and Application,

Jit — Wed, 23 May 2018 06:54:32 -0500

BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands to tens of thousands of bases long with divergence between the read and genome dominated by insertion and deletion error.

Here is how I use the blasr to align PacBio reads to the contigs (target.fasta). The “target.fasta.sa” is the suffix array from “target.fasta” generated by sawriter.

blasr query.fa ./target.fasta -sa ./target.fasta.sa -bestn 40 -maxScore -500 -m 4 -nproc 24 -out target.m4 -maxLCPLength 15

the output format option “-m 4″ generate the alignment coordinate. Not fully documented, but I can explain that to you.

I use a 24 cores / 48G ram server for the alignment. It took about 2 to 3 hours aligning 3G PacBio Reads to 10^6 sequences of short read contigs with a mean 3.5kbp length.

Address of the bookmark: http://bix.ucsd.edu/projects/blasr/

P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads

Jit — Tue, 12 Jun 2018 08:14:41 -0500

P_RNA_scaffolder, a fast and accurate tool using paired-end RNA-sequencing reads to scaffold genomes. This tool aims to improve the completeness of both protein-coding and non-coding genes. After this tool was applied to scaffolding human contigs, the structures of both protein-coding genes and circular RNAs were almost completely recovered and equivalent to those in a complete genome, especially for long proteins and long circular RNAs.

Address of the bookmark: http://www.fishbrowser.org/software/P_RNA_scaffolder/

QuasR: Quantification and annotation of short reads in R

Neel — Fri, 13 Aug 2021 07:44:05 -0500

The QuasR package (short for Quantify and annotate short reads in R) integrates the functionality of several R packages (such as IRanges (Lawrence et al. 2013) and Rsamtools) and external software (e.g. bowtie, through the Rbowtie package, and HISAT2, through the Rhisat2 package). The package aims to cover the whole analysis workflow of typical high throughput sequencing experiments, starting from the raw sequence reads, over pre-processing and alignment, up to quantification. A single R script can contain all steps of a complete analysis, making it simple to document, reproduce or share the workflow containing all relevant details.

Address of the bookmark: https://www.bioconductor.org/packages/devel/bioc/vignettes/QuasR/inst/doc/QuasR.html