Phred scaling is a widely used system for representing the quality of base calls in sequencing reads: each score quantifies the estimated probability that the corresponding base call is wrong. Here is a quick guide to Phred scaling:
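Phred quality scores: Phred quality scores are stored as non-negative integers, and the higher the score, the more confident we are in the base call. The usable range depends on the platform and encoding. The common Phred+33 encoding can represent scores from 0 to 93, while Illumina 1.8+ reads typically fall between 0 and 41. For example, a Phred score of 10 corresponds to a base call accuracy of 90%, while a Phred score of 20 corresponds to a base call accuracy of 99%.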
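The accuracy figures above follow from inverting the Phred formula given later in this guide: the error probability is 10^(-Q/10), and accuracy is one minus that. The following is a minimal sketch. The function name `phred_to_accuracy` is hypothetical and not taken from any particular library.

```python
def phred_to_accuracy(q: float) -> float:
    """Return the probability that a base call with Phred score q is correct (hypothetical helper)."""
    error_prob = 10 ** (-q / 10)  # inverse of Q = -10 * log10(p)
    return 1 - error_prob

for q in (10, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.1%} accurate")
# Q10: 90.0% accurate
# Q20: 99.0% accurate
# Q30: 99.9% accurate
```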
Logarithmic scale: The Phred scale is logarithmic, which means that each increase of 10 in the Phred score corresponds to a 10-fold decrease in the probability of an error. For example, a Phred score of 20 corresponds to a probability of error of 1 in 100, while a Phred score of 30 corresponds to a probability of error of 1 in 1000.
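To see the "1 in N" reading of the logarithmic scale, note that N = 10^(Q/10). This short sketch prints N for a few scores and checks the 10-fold step between Q20 and Q30. The helper name `one_in_n` is an illustrative assumption.

```python
def one_in_n(q: float) -> float:
    """Return N such that a base with Phred score q has a 1-in-N chance of being wrong (hypothetical helper)."""
    return 10 ** (q / 10)

for q in (10, 20, 30, 40):
    print(f"Q{q}: about 1 error in {one_in_n(q):,.0f} base calls")

# Each +10 in Q multiplies N by 10, i.e. divides the error probability by 10.
print(one_in_n(30) / one_in_n(20))
```

Scores that are not multiples of 10 fit the same curve. For instance, Q23 corresponds to roughly 1 error in 200.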
Quality score calculation: The Phred score for each base call is calculated as follows:
Q = -10 * log10(p)
where Q is the Phred score, and p is the probability of an error in the base call.
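The formula and its inverse can be written directly in Python. This is a minimal sketch that assumes p is a probability in the interval (0, 1]. The function names are hypothetical. Because FASTQ stores integer scores, a caller would typically round Q before encoding it. The rounding below is only for display.

```python
import math

def phred_from_error_prob(p: float) -> float:
    """Q = -10 * log10(p), for an error probability 0 < p <= 1 (hypothetical helper)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)

def error_prob_from_phred(q: float) -> float:
    """Inverse transform: p = 10 ** (-Q / 10) (hypothetical helper)."""
    return 10 ** (-q / 10)

print(round(phred_from_error_prob(0.001)))    # 30
print(round(phred_from_error_prob(0.05), 1))  # 13.0
```

Rejecting p = 0 is a deliberate choice: a zero error probability would correspond to an infinite score, which no encoding can store.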
Quality score encoding: Phred quality scores are typically stored in the quality line of each record in a FASTQ file, a common format for storing sequencing reads. Each score is represented as a single ASCII character whose code equals the Phred score plus 33. This is the Phred+33 convention used by Sanger and Illumina 1.8+; some older Illumina pipelines used an offset of 64 instead. For example, a Phred score of 20 is represented by the character "5" (ASCII 53 = 20 + 33), while "!" (ASCII 33) represents a score of 0.
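Decoding and encoding a FASTQ quality string is a per-character offset. This is a minimal sketch that assumes Phred+33 by default. The function names and the `offset` parameter are hypothetical, not part of any FASTQ library. The upper bound reflects the last printable ASCII character, "~" (code 126).

```python
from typing import List, Sequence

PHRED_OFFSET = 33  # Phred+33 (Sanger / Illumina 1.8+); older Illumina data may need 64

def decode_quality(qual: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(c) - offset for c in qual]

def encode_quality(scores: Sequence[int], offset: int = PHRED_OFFSET) -> str:
    """Convert Phred scores into a FASTQ quality string, checking the printable ASCII range."""
    max_q = 126 - offset  # "~" (ASCII 126) is the highest printable character
    for q in scores:
        if not 0 <= q <= max_q:
            raise ValueError(f"score {q} outside 0..{max_q} for offset {offset}")
    return "".join(chr(q + offset) for q in scores)

print(decode_quality("II5!"))            # [40, 40, 20, 0]
print(encode_quality([40, 40, 20, 0]))   # II5!
```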
In summary, Phred scaling is a logarithmic system for representing the quality scores of sequencing reads, where higher scores indicate greater confidence in the base call. The Phred scores are calculated based on the probability of an error in each base call and are typically encoded in the quality score field of the FASTQ file format.