See also
FASTQ files
Average Q is a bad idea!
Expected errors
Quality filtering
The quality score of a base, also known as a
Phred or Q score,
is an integer value representing the estimated probability of an error, i.e.
that the base is incorrect. If P is the error probability, then:
P = 10^(-Q/10)
Q = -10 log10(P)
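The two formulas above can be checked numerically. The following is a minimal sketch in Python; the function names q_to_p and p_to_q are illustrative only and are not part of any tool described here.

```python
import math

def q_to_p(q: float) -> float:
    """Error probability for a Q score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def p_to_q(p: float) -> float:
    """Q score for an error probability: Q = -10 log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)

# Print a small conversion table for some typical Q scores.
for q in (2, 3, 10, 20, 30, 40):
    print(f"Q={q:2d}  P={q_to_p(q):.4f}")
```

Each increase of 10 in Q corresponds to a tenfold decrease in error probability, so Q=20 is one expected error in 100 bases and Q=30 is one in 1000.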
Q scores are often represented as ASCII characters. The convention for mapping
characters to integers varies; see FASTQ options
for details. Two of the most common variants are shown in the
tables below.
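As a rough illustration of the character-to-integer mapping, the sketch below subtracts a fixed offset from each character's ASCII code. It assumes the two widely used offsets, 33 and 64, which are not stated in this section; decode_quals is a hypothetical helper name, and the right offset for a given file must be determined from the file itself or its documentation.

```python
def decode_quals(qual: str, offset: int = 33) -> list[int]:
    """Map each ASCII character of a FASTQ quality string to an integer Q score.

    offset is the ASCII code that represents Q=0 (assumed here: 33 or 64).
    """
    scores = [ord(c) - offset for c in qual]
    if any(q < 0 for q in scores):
        raise ValueError("negative Q score; the offset is probably wrong")
    return scores

print(decode_quals("II5#", offset=33))  # [40, 40, 20, 2]
print(decode_quals("hB", offset=64))    # [40, 2]
```

A negative result is a strong hint that the wrong offset was chosen, which is why the sketch raises an error rather than returning it.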
What kind of error?
There is an important difference between Q scores in reads from
454 and Illumina. In effect, 454 ignores the possibility of substitution
errors and Illumina ignores indels. With 454, the Q score reflects the estimated
probability that the length of the homopolymer is wrong, and with Illumina the Q
score reflects the estimated probability that the base call is incorrect. In the case of
Illumina, this is reasonable because indel errors are very rare. But with 454,
substitution errors are quite common, occurring with comparable frequency to
homopolymer errors. This means that 454 Q scores are not as informative as
Illumina Q scores, but are still useful in practice. See
quality filtering for further discussion.
Small Q scores
Note that a Q score of 3 means P≈0.5, i.e. there is about a 50% chance that the
base is wrong, and lower values represent even higher probabilities of error.
Q=0 means P=1, i.e. that the base call is certainly wrong, so this is rarely
used, though it might be appropriate for an undetermined base (often represented as
'N'). The lowest value
usually found in practice is Q=2 (P=0.63), which means the base is more likely
to be wrong than correct. A run of Q2s is sometimes used to
indicate the end of usable data in the read.
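If a trailing run of Q2s marks the end of usable data, one way to act on it is to cut the read where that run begins. The sketch below is an assumption about how this could be done, not a description of any particular tool; trim_trailing_q2 and its q_end parameter are hypothetical names.

```python
def trim_trailing_q2(seq: str, quals: list[int], q_end: int = 2) -> tuple[str, list[int]]:
    """Remove the run of bases at the end of a read whose Q score is <= q_end."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(quals)
    # Walk back from the last base while it still belongs to the low-quality run.
    while end > 0 and quals[end - 1] <= q_end:
        end -= 1
    return seq[:end], quals[:end]

seq, quals = trim_trailing_q2("ACGTNN", [40, 38, 30, 25, 2, 2])
print(seq, quals)  # ACGT [40, 38, 30, 25]
```

Only the run at the end of the read is removed; an isolated Q2 earlier in the read is left in place.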