Read quality filters often use the average Q score to decide whether a read has high or low quality. This is a really bad idea! The average Q score is not a good indicator of the number of errors predicted by the individual Q scores. The table below illustrates this with two reads of length 150 nt.
| Q scores in read | Avg. Q | Expected number of errors |
|---|---|---|
| 140 x Q35 + 10 x Q2 | 33 | 6.4 ! |
| 150 x Q25 | 25 | 0.5 |
Notice that the read with the higher average Q has a much larger number of expected errors. This is due to the Q2 bases, which have an error probability of about 0.63 under the Phred definition P = 10^(-Q/10). With P = 0.63, we expect well over half of the ten Q2 bases to be wrong, so the expected number of errors in the read is at least 6. The 140 Q35 bases add only about 0.04 expected errors between them. As this example shows, if there are a few low Q scores in a read with generally high Q scores, then the average Q is a very poor indicator of the expected accuracy of the read.
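The numbers in the table can be checked with a few lines of Python. This is a minimal sketch, assuming the standard Phred relationship P = 10^(-Q/10); the function names (`error_probability`, `average_q`, `expected_errors`) are illustrative and do not come from any particular tool.

```python
def error_probability(q):
    """Error probability for a Phred score, assuming P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def average_q(q_scores):
    """Arithmetic mean of the Q scores in a read."""
    return sum(q_scores) / len(q_scores)


def expected_errors(q_scores):
    """Expected number of errors: the sum of per-base error probabilities."""
    return sum(error_probability(q) for q in q_scores)


# The two hypothetical 150 nt reads from the table above.
read_a = [35] * 140 + [2] * 10
read_b = [25] * 150

for label, read in (("140 x Q35 + 10 x Q2", read_a), ("150 x Q25", read_b)):
    print(f"{label}: avg Q = {average_q(read):.1f}, "
          f"expected errors = {expected_errors(read):.2f}")

# Prints:
# 140 x Q35 + 10 x Q2: avg Q = 32.8, expected errors = 6.35
# 150 x Q25: avg Q = 25.0, expected errors = 0.47
```

A filter based on the average would rank the first read above the second, while one based on the sum of error probabilities ranks them the other way round.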