BOL: Related items

The MARVEL assembler

Jit — Fri, 04 May 2018 19:18:41 -0500

MARVEL consists of a set of tools that facilitate the overlapping, patching, correction and assembly of noisy (and less noisy) long reads.

The assembly process can be summarized as follows:

overlap
patch reads
overlap (again)
scrubbing
assembly graph construction and touring
optional read correction
fasta file creation

Address of the bookmark: https://github.com/schloi/MARVEL

npScarf: real-time scaffolder using SPAdes contigs and Nanopore sequencing reads

Shruti Paniwala — Mon, 11 Jun 2018 05:14:57 -0500

npScarf (jsa.np.npscarf) is a program that connects contigs from a draft genome to generate sequences that are closer to finished. The pipeline can run on a single laptop for microbial datasets. In real-time mode, it can be integrated with simple structural analyses such as gene ordering and plasmid formation.

Address of the bookmark: http://japsa.readthedocs.io/en/latest/tools/jsa.np.npscarf.html

LRCstats: Long Read Correction Statistics

Jit — Fri, 05 Jan 2018 04:04:20 -0600

LRCstats is an open-source pipeline for benchmarking DNA long read correction algorithms on reads produced by third generation sequencing technology, such as the machines made by Pacific Biosciences. As the name suggests, these reads are longer than those produced by next generation sequencing technologies such as Illumina. However, long reads are plagued by high error rates, which can cause issues in downstream analysis. Long read correction algorithms reduce the error rate of long reads either through self-correction or by using accurate short reads from next generation sequencing technologies to correct the long reads.

Of course, some long read correction algorithms are better than others, and developers will want to compare their algorithm with those already available. LRCstats benchmarks long read correction algorithms using long reads produced by simulators (such as SimLoRD or PBSim), for which the two-way alignments between the uncorrected long reads (uLR) and the corresponding sequences in the reference genome (Ref) are provided in an alignment file. It then aligns the corrected long reads (cLR) to the Ref-uLR two-way alignments with a dynamic programming algorithm, creating three-way alignments. Statistics are then collected on these three-way alignments, such as the overall error rate of the corrected long reads.
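
As a rough illustration of the kind of statistic collected from a three-way alignment, the sketch below compares the error rate of an uncorrected and a corrected read against the reference. It assumes each sequence is given as a gapped string of equal length (with "-" for gaps); this representation and the function name error_rate are hypothetical and are not taken from LRCstats itself.

```python
def error_rate(ref, read):
    """Fraction of alignment columns in which `read` differs from `ref`.

    Both arguments are gapped strings of equal length ("-" marks a gap).
    Columns where both sequences have a gap are ignored.
    This is an illustrative helper, not part of LRCstats.
    """
    if len(ref) != len(read):
        raise ValueError("aligned sequences must have the same length")
    columns = [(r, q) for r, q in zip(ref, read) if not (r == "-" and q == "-")]
    if not columns:
        return 0.0
    errors = sum(r != q for r, q in columns)
    return errors / len(columns)


# A toy three-way alignment: reference, uncorrected read, corrected read.
ref = "ACGT-ACGT"
ulr = "ACCTTAC-T"
clr = "ACGT-AC-T"

print("uLR error rate:", error_rate(ref, ulr))
print("cLR error rate:", error_rate(ref, clr))
```

In this toy case the corrected read has fewer mismatching columns than the uncorrected one, which is the kind of improvement a benchmark would want to quantify.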

https://www.healthcare.uiowa.edu/labs/au/LSC/

Address of the bookmark: https://github.com/cchauve/lrcstats

FMLRC: a long-read error correction tool using the multi-string Burrows Wheeler Transform

Neel — Fri, 10 Aug 2018 13:29:28 -0500

FMLRC, or FM-index Long Read Corrector, is a tool for performing hybrid correction of long-read sequencing data using the BWT and FM-index of short-read sequencing data. Given a BWT of the short-read sequencing data, FMLRC will build an FM-index and use that as an implicit de Bruijn graph. Each long read is then corrected independently by identifying low frequency k-mers in the long read and replacing them with the closest matching high frequency k-mers in the implicit de Bruijn graph. In contrast to other de Bruijn graph based implementations, FMLRC is not restricted to a particular k-mer size and instead uses a two pass method with both a short "k-mer" and a longer "K-mer". This allows FMLRC to correct through low complexity regions that are computationally difficult for short k-mers.
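
The core idea of flagging low frequency k-mers and finding a close high frequency replacement can be sketched in a few lines. This is only a simplified illustration: it assumes a plain in-memory k-mer counter instead of a BWT/FM-index, and a Hamming-distance search instead of a traversal of the implicit de Bruijn graph. All function names and the min_count threshold are hypothetical and do not come from FMLRC.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the short reads (stand-in for FM-index queries)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def low_frequency_positions(long_read, counts, k, min_count):
    """Start positions of k-mers in the long read seen fewer than min_count times."""
    return [
        i
        for i in range(len(long_read) - k + 1)
        if counts[long_read[i:i + k]] < min_count
    ]


def closest_solid_kmer(kmer, counts, min_count):
    """High frequency k-mer with the smallest Hamming distance to `kmer`."""
    best, best_dist = None, len(kmer) + 1
    for candidate, count in counts.items():
        if count < min_count:
            continue
        dist = sum(a != b for a, b in zip(kmer, candidate))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


short_reads = ["ACGTACGT", "ACGTACGT", "CGTACGTA"]
counts = count_kmers(short_reads, k=4)

long_read = "ACGTTCGT"  # one substitution relative to ACGTACGT
weak = low_frequency_positions(long_read, counts, k=4, min_count=2)
print("low frequency k-mer positions:", weak)
print("suggested replacement for", long_read[1:5], "->",
      closest_solid_kmer(long_read[1:5], counts, min_count=2))
```

A real corrector would query counts from the FM-index on demand and repeat the pass with a longer K, but the flag-then-replace structure is the same.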

Address of the bookmark: https://github.com/holtjma/fmlrc

Yau Group

Tue, 15 Oct 2013 13:05:15 -0500

Yau Group is a new research group based at the Wellcome Trust Centre for Human Genetics and the Department of Statistics at the University of Oxford.

Yau Group develops statistical and computational methods for the analysis of genomic datasets with a particular interest in cancer sequencing applications and the use of Bayesian Statistics.

Yau Group currently has projects in somatic mutation analysis of heterogeneous cancers, data fusion or integration techniques, and single cell genomics.

More @ http://www.well.ox.ac.uk/~cyau/index.html

orthodotter: Synteny plots (oxford grid)

Jit — Wed, 09 Aug 2017 07:16:16 -0500

orthodotter -h
--------------------------------------------------------------------------------
orthodotter - Plot orthologous genes on an oxford grid.
       -f      : input file, containing orthologous genes, default is stdin
                       species chr-name start end species chr-name start end
       -toPlot  : give the x and y sets and the color separated by double-dots,
                       for example set1:set2:red will plot set1 on x, set2 on y with
                       red points. Could give several -toPlot arguments.
                       To launch the clustering of dots, use extra-option 1=dist,min_nb_genes
                       where dist is the minimal distance (euclidian) between two points and min_nb_genes the minimal
                       number of genes in a cluster to be valid.
       -o      : output file, default is stdout
       -x       : resolution of x axis, default is 600
       -y       : resolution on y axis, default is 600
       -r       : radius of circle representing orthologous genes
       -format       : could be png, gif, jpg, pdf or ps. Default is png.
       -fg           : foreground color, default is black
       -bg           : background color, default is transparent
       -fSize   : fontSize, default is 1
       -filter       : check chromosome names
       -h            : help
--------------------------------------------------------------------------------
orthodotter -f Vigne_Banane.ortho -toPlot Vigne:Banane:black:1=10,5 -x 1200 -y 1200 -bg white -o Vigne_vs_Banane.png > Vigne_vs_Banane.clusters
--------------------------------------------------------------------------------

Address of the bookmark: https://github.com/institut-de-genomique/orthodotter

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

Rahul Nayak — Mon, 13 Nov 2017 05:10:23 -0600

AfterQC can go through all fastq files in a folder and then output three folders (good, bad and QC), which contain the good reads, the bad reads and the QC results of each fastq file/pair.
Currently it supports processing data from HiSeq 2000/2500/3000/4000, NextSeq 500/550, MiniSeq and other Illumina 1.8 or newer formats.

Address of the bookmark: https://github.com/OpenGene/AfterQC

FQC Dashboard: Integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool

Shruti Paniwala — Tue, 10 Nov 2020 01:30:22 -0600

FQC is software that facilitates quality control of FASTQ files by carrying out a QC protocol using FastQC, parsing results, and aggregating quality metrics into an interactive dashboard designed to richly summarize individual sequencing runs. The dashboard groups samples in dropdowns for navigation among the data sets, utilizes human-readable configuration files to manipulate the pages and tabs, and is extensible with CSV data.

Address of the bookmark: https://github.com/pnnl/fqc

Referee: Genome assembly quality scores

Jit — Sun, 04 Nov 2018 16:44:30 -0600

Modern genome sequencing technologies provide a succinct measure of quality at each position in every read; however, all of this information is lost in the assembly process. Referee summarizes the quality information from the reads that map to a site in an assembled genome to calculate a quality score for each position in the genome assembly.

We accomplish this by first calculating genotype likelihoods for every site. For a given site in a diploid genome, there are 10 possible genotypes (AA, AC, AG, AT, CC, CG, CT, GG, GT, TT). Referee takes as input the genotype likelihoods calculated for all 10 genotypes given the called reference base at each position.
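
To make the 10-genotype idea concrete, here is a minimal sketch of how genotype likelihoods at a single site could be computed from the bases and Phred qualities of the reads covering it. It assumes a standard textbook diploid error model (each read samples one of the two alleles with equal probability, and an erroneous call is equally likely to be any of the three other bases); this is not necessarily the model Referee or its upstream tools use, and the function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = ["".join(pair) for pair in combinations_with_replacement(BASES, 2)]


def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def genotype_log_likelihoods(pileup):
    """log10 likelihood of each genotype given the reads at one site.

    `pileup` is a list of (base, phred_quality) tuples. Assumed model:
    a read samples either allele with probability 0.5, and a miscalled
    base is uniformly one of the other three bases.
    """
    result = {}
    for genotype in GENOTYPES:
        log_lik = 0.0
        for base, qual in pileup:
            err = phred_to_error(qual)
            prob = sum(
                0.5 * ((1 - err) if base == allele else err / 3)
                for allele in genotype
            )
            log_lik += math.log10(prob)
        result[genotype] = log_lik
    return result


pileup = [("A", 30)] * 3 + [("G", 30)] * 3
likelihoods = genotype_log_likelihoods(pileup)
best = max(likelihoods, key=likelihoods.get)
print("most likely genotype:", best)
```

A per-site quality score can then be derived by comparing the likelihood of genotypes consistent with the called reference base against the alternatives.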

Referee is a program to calculate a quality score for every position in a genome assembly. This allows for easy filtering of low quality sites for any downstream analysis.

https://github.com/gwct/referee

Address of the bookmark: https://gwct.github.io/referee/#

Ktrim: an extra-fast and accurate adapter- and quality-trimmer for sequencing data

Jit — Thu, 11 Feb 2021 21:39:05 -0600

Ktrim is written in C++ for GNU/Linux and Unix platforms. After uncompressing the source package, you will find an executable file, ktrim, under the bin/ directory, compiled using g++ v4.8.5 and linked with libz v1.2.7 for Linux x86_64 systems. If you cannot run it (usually because of an outdated libc++ or libz library), or you want to build a version optimized for your system, you can recompile the program:

user@linux$ make clean && make

Address of the bookmark: https://github.com/hellosunking/Ktrim