Quality (Phred) scores

See also
FASTQ files
Average Q is a bad idea!
Expected errors
Quality filtering

The quality score of a base, also known as a Phred or Q score, is an integer value representing the estimated probability of an error, i.e. that the base is incorrect. If P is the error probability, then:

P = 10^(–Q/10)

Q = –10 log₁₀(P)
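
The two formulas above can be sketched in code as follows. This is a minimal illustration, not part of any particular tool; the function names q_to_p and p_to_q are hypothetical and chosen only for this example.

```python
import math


def q_to_p(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def p_to_q(p: float) -> float:
    """Phred score for an error probability: Q = -10 log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Each step of 10 in Q divides the error probability by 10.
    for q in (10, 20, 30, 40):
        print(f"Q={q}  P={q_to_p(q):.4g}")
```

Note that p_to_q returns a real number; reported Q scores are integers, so a program storing scores would round the result.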

Q scores are often represented as ASCII characters. The convention for mapping characters to integers varies; see FASTQ options for details. Two of the most common variants are shown in the tables below.
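
As a rough sketch of how such a mapping works, the code below decodes a quality string by subtracting a fixed offset from each character's ASCII code. It assumes the two widely used offsets of 33 and 64; the text above does not name the offsets, so treat them as an assumption and check your data. The function name decode_quals is hypothetical.

```python
def decode_quals(qual_string: str, offset: int = 33) -> list:
    """Convert an ASCII quality string to a list of integer Q scores.

    offset is the ASCII code that represents Q=0: assumed here to be 33
    or 64, depending on the FASTQ variant.
    """
    scores = [ord(ch) - offset for ch in qual_string]
    if any(q < 0 for q in scores):
        # A negative score suggests the wrong offset was chosen.
        raise ValueError("negative Q score; is the offset correct?")
    return scores


if __name__ == "__main__":
    print(decode_quals("II5#", offset=33))
    print(decode_quals("hhTB", offset=64))
```

Guessing the offset from the data alone is not always possible, because the character ranges of the two variants overlap, which is why the offset is an explicit parameter here.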

What kind of error?
There is an important difference between Q scores in reads from 454 and Illumina. In effect, 454 ignores the possibility of substitution errors and Illumina ignores indels. With 454, the Q score represents the estimated probability that the length of the homopolymer is wrong, and with Illumina it represents the estimated probability that the base call is incorrect. In the case of Illumina, this is reasonable because indel errors are very rare. But with 454, substitution errors are quite common, occurring with comparable frequency to homopolymer errors. This means that 454 Q scores are not as informative as Illumina Q scores, but they are still useful in practice. See quality filtering for further discussion.

Small Q scores
Note that a Q score of 3 means P≈0.5, meaning that there is about a 50% chance the base is wrong, and lower values represent even higher probabilities of error. Q=0 means P=1, i.e. that the base call is certainly wrong, so this is rarely used, though it might be appropriate for an undetermined base (often represented as 'N'). The lowest value usually found in practice is Q=2 (P=0.63), which means the base is more likely to be wrong than correct. A run of Q2s is sometimes used to indicate the end of usable data in the read.
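
One way to act on a trailing run of Q2s is to truncate the read where that run begins. The sketch below is an illustration only, not the method of any specific tool; the function name trim_trailing_low_q and the default threshold min_q=3 are assumptions made for this example.

```python
def trim_trailing_low_q(seq: str, quals: list, min_q: int = 3):
    """Remove the trailing run of bases whose Q score is below min_q.

    Only the run at the 3' end is removed; low-quality bases in the
    interior of the read are left untouched.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(quals)
    # Walk back from the end while the scores stay below the threshold.
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


if __name__ == "__main__":
    read = "ACGTAC"
    scores = [30, 30, 20, 2, 2, 2]
    print(trim_trailing_low_q(read, scores))
```

With the default threshold, a read consisting entirely of Q2 bases is reduced to an empty sequence, so callers may want to discard reads that become too short.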