fragment size: the Illumina WGS protocol generates paired-end reads from both ends of longer fragments. The lengths of these fragments are assumed to be sampled from a normal distribution. Therefore, in the absence of structural variants,...
An algorithm is a procedure for solving a problem in terms of the actions to be executed and the order in which those actions are to be executed. An algorithm is merely the sequence of steps taken to solve a problem. The steps are normally...
The genome assemblers generally take a file of short sequence reads and a file of quality-value as the input. Since the quality-value file for the high throughput short reads is usually highly memory-intensive, only a few assemblers, best suited for...
alignment: the mapping of a raw sequence read to a location within a reference genome. The mapping occurs because the sequences within the raw read match or align to sequences within the reference genome. Alignment information is stored in the SAM...
-----------
Gene Set Clustering based on Functional annotation
GeneSCF serves as command line tool for clustering the list of genes given by the users based on functional annotation (Gene Ontology, KEGG, REACTOME and NCG...
Linux Commands Cheat Sheet File System ls — list items in current directory ls -l — list items in current directory and show in long format to see perimissions, size, an modification...
Reformat is a member of the BBMap/BBTools package. It is a multipurpose tool designed for converting reads or other nucleotide data between different formats. It supports, and can inter-convert: fastq fasta fasta+qual sam scarf (an old Illumina...
What are genomic interspersed repeats?
In the mid 1960's scientists discovered that many genomes contain stretches of highly repetitive DNA sequences ( see Reassociation Kinetics Experiments, and C-Value Paradox ). These sequences were later...