AfterQC: Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

Jit — Fri, 29 Jun 2018 03:26:03 -0500

AfterQC can go through all FASTQ files in a folder and output three folders: good, bad and QC, which contain the good reads, the bad reads and the QC results of each FASTQ file or pair. It currently supports data from HiSeq 2000/2500/3000/4000, NextSeq 500/550, MiniSeq...and other Illumina 1.8 or newer formats.

The author has reimplemented this tool in C++ with multithreading support to make it much faster. The new tool is called fastp and can be found at https://github.com/OpenGene/fastp. If you prefer a C++ based tool, please use fastp instead.

Address of the bookmark: https://github.com/OpenGene/AfterQC
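
To make the good/bad split concrete, here is a minimal Python sketch of sorting reads by mean base quality. It is not AfterQC's actual code or filtering logic: the function names and the threshold of 20 are assumptions chosen for illustration, and the only fact relied on is that Illumina 1.8+ FASTQ files encode qualities as Phred+33.

```python
import io

PHRED_OFFSET = 33  # Illumina 1.8+ stores qualities as Phred+33 ASCII


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_quality(qual):
    """Mean Phred score of one quality string (0.0 for an empty read)."""
    if not qual:
        return 0.0
    return sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)


def split_reads(handle, min_mean_quality=20.0):
    """Split reads into (good, bad) lists; the threshold is a hypothetical default."""
    good, bad = [], []
    for record in read_fastq(handle):
        target = good if mean_quality(record[2]) >= min_mean_quality else bad
        target.append(record)
    return good, bad


# Two toy reads: 'I' is Phred 40, '#' is Phred 2.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\n####\n"
)
good, bad = split_reads(example)
print(len(good), len(bad))  # prints "1 1"
```

A real run would read from files on disk (often gzip-compressed) and write each list to its own output folder; the in-memory example only shows the decision rule.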

Comment by Jit

Jit — Fri, 29 Jun 2018 03:30:04 -0500

AfterQC: Automatic filtering, trimming, error removing and quality control for FASTQ data.
dupRadar: An R package that provides functions for plotting and analyzing duplication rates as a function of expression level.
FastQC: A quality control tool for high-throughput sequence data (Babraham Institute), developed in Java. Data can be imported from FASTQ, BAM or SAM files. The tool gives an overview that flags problematic areas, with summary graphs and tables for rapid assessment of the data. Results are presented as permanent HTML reports. FastQC can be run as a stand-alone application or integrated into a larger pipeline.
fastqp: Simple FASTQ quality assessment using Python.
Kraken: A set of tools for quality control and analysis of high-throughput sequence data.
HTSeq: The Python script htseq-qa takes a file of sequencing reads (either raw or aligned) and produces a PDF file with useful plots for assessing the technical quality of a run.
mRIN: Assesses mRNA integrity directly from RNA-Seq data.
MultiQC: Aggregates and visualises results from numerous tools (FastQC, HTSeq, RSeQC, Tophat, STAR and others) across all samples into a single report.
NGSQC: A cross-platform quality analysis pipeline for deep sequencing data.
NGS QC Toolkit: A toolkit for the quality control (QC) of next generation sequencing (NGS) data. It comprises user-friendly stand-alone tools for quality control of sequence data generated on Illumina and Roche 454 platforms, with detailed results in the form of tables and graphs, and for filtering high-quality sequence data. It also includes a few other tools that are helpful in NGS data quality control and analysis.
PRINSEQ: A tool that generates summary statistics of sequence and quality data and is used to filter, reformat and trim next-generation sequence data. It is designed particularly for 454/Roche data, but can also be used for other types of sequence.
QC-Chain: A package of quality control tools for next generation sequencing (NGS) data, covering both raw read quality evaluation and de novo contamination screening, which can identify all possible contaminating sequences.
QC3: A quality control tool for DNA sequencing data that covers raw data, alignment and variant calling.
qrqc: Quickly scans reads and gathers statistics on base and quality frequencies, read length and frequent sequences. It produces graphical output of these statistics for use in quality control pipelines, and an optional HTML quality report. S4 SequenceSummary objects allow specific tests and functionality to be written around the collected data.
RNA-SeQC: A tool with applications in experiment design, process optimization and quality control before computational analysis. It provides three types of quality control: read counts (such as duplicate reads, mapped reads and mapped unique reads, rRNA reads, transcript-annotated reads, strand specificity), coverage (such as mean coverage, mean coefficient of variation, 5’/3’ coverage, gaps in coverage, GC bias) and expression correlation (the tool provides RPKM-based estimation of expression levels). RNA-SeQC is implemented in Java and does not require installation; it can also be run through the GenePattern web interface. The input can be one or more BAM files, and HTML reports are generated as output.
RSeQC: Analyzes diverse aspects of RNA-Seq experiments: sequence quality, sequencing depth, strand specificity, GC bias, read distribution over the genome structure and coverage uniformity. The input can be SAM, BAM, FASTA or BED files, or a chromosome size file (a two-column plain text file). Results can be visualized in genome browsers such as UCSC, IGB and IGV, or with R scripts.
SAMStat: Identifies problems and reports several statistics at different phases of the process. It evaluates unmapped, poorly mapped and accurately mapped sequences independently to infer possible causes of poor mapping.
SolexaQA: Calculates sequence quality statistics and creates visual representations of data quality for second-generation sequencing data. Originally developed for the Illumina system (historically known as “Solexa”), SolexaQA now also supports Ion Torrent and 454 data.
Trim Galore!: A wrapper script that automates quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional (or paired-end) sequencing).
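
Several of the tools above report base quality as a function of position in the read. As a rough illustration of that kind of statistic, the Python sketch below averages Phred scores per read position. It is not taken from any of the listed tools: the function name is hypothetical, and Phred+33 encoding (Illumina 1.8 or newer) is assumed.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoded quality strings


def per_position_mean_quality(quality_strings):
    """Return the mean Phred score at each read position.

    Reads may differ in length; each position is averaged only over
    the reads that are long enough to cover it.
    """
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - PHRED_OFFSET
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# 'I' is Phred 40 and '#' is Phred 2.
quals = ["II#", "I#"]
print(per_position_mean_quality(quals))  # prints [40.0, 21.0, 2.0]
```

A falling curve toward the 3' end of the reads is the usual signal that trimming may be worthwhile.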
