www.encodeproject.org - The ENCODE project uses Reference Genomes from NCBI or UCSC to provide a consistent framework for mapping high-throughput sequencing data. In general, ENCODE data are mapped consistently to 2 human (GRCH38, hg19) and 2 mouse...
github.com - Pollux: General-purpose error corrector that corrects errors introduced by Illumina, Ion Torrent, and Roche 454 sequencing technologies and can be applied to single- or mixed-genome data. In addition to correcting substitution errors, we locate and...
github.com - HECIL—Hybrid Error Correction with Iterative Learning—a hybrid error correction framework that determines a correction policy for erroneous long reads, based on optimal combinations of decision weights obtained from short read...
bitbucket.org - SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.
Reads are simulated from both strands of a provided or randomly generated reference sequence.
The reference...
sfu-compbio.github.io - SCALCE (/skeɪlz/, a.k.a. boosting Sequence Compression Algorithms using Locally ConsistentEncoding) is a tool for compressing FASTQ files. It is designed specifically for the Illumina-generated FASTQ files, but supports any...
hoffmann.bioinf.uni-leipzig.de - segemehl is a software to map short sequencer reads to reference genomes. Unlike other methods, segemehl is able to detect not only mismatches but also insertions and deletions. Furthermore, segemehl is not limited to a specific read length and is...
cutadapt.readthedocs.io - Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Cleaning your data in this way is often required: Reads from small-RNA sequencing contain the...
There are several ways you can convert fastq to fasta sequences. Some methods are listed below.
Using SED
sed can be used to selectively print the desired lines from a file, so if you print the first and 2rd line of every 4 lines, you get the...
github.com - Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the...