See also: Quality control for OTU sequences
The cluster_otus and unoise3 commands have built-in filters for chimeras. These
filters are not perfect, so there may be cases where non-chimeric OTUs were
incorrectly discarded (false positives) or chimeric OTUs were not filtered
(false negatives).
In the past, I have suggested using reference-based UCHIME as a post-processing step to filter chimeric OTUs. This is a bad idea with the current implementations of cluster_otus and unoise3 because the error rate of reference-based UCHIME is much higher than the error rates of the built-in de novo filters. Therefore, if you run uchime2_ref on your OTUs, the OTUs that are discarded are more likely to be false positives than true chimeras. The high accuracy claims of the original UCHIME paper were exaggerated because of unrealistic benchmark tests; this is explained in the UCHIME2 paper.
It turns out that it is impossible in principle to distinguish chimeras from correct
sequences, even when there are no sequence errors and the reference database
is complete. This is a very surprising, almost shocking, result which is
reported in the UCHIME2 paper. The reason is
"fake models", where a correct sequence can be constructed as a chimera from
two other correct sequences. Chimeras can have identical sequences to valid
genes, so it is impossible for an algorithm to distinguish the two cases
from a sequence alone. Fake models are common in practice, hence the
problem.
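
A toy illustration of a fake model may help. The sketch below uses three made-up sequences, all assumed to be correct, and shows that the third can be reconstructed exactly as a two-segment chimera of the first two. The sequences, the function name crossover_points, and the simplification to pre-aligned, equal-length strings are assumptions for illustration only; this is not how UCHIME2 searches for models.

```python
def crossover_points(left_parent, right_parent, candidate):
    """Return every k where candidate == left_parent[:k] + right_parent[k:]."""
    if not (len(left_parent) == len(right_parent) == len(candidate)):
        raise ValueError("toy example assumes pre-aligned, equal-length sequences")
    return [
        k
        for k in range(1, len(candidate))
        if left_parent[:k] + right_parent[k:] == candidate
    ]


# Three made-up sequences; assume all three are genuine biological sequences.
seq_a = "ACGTACGTAAAA"
seq_b = "ACGTTTTTCCCC"
seq_c = "ACGTACTTCCCC"

# seq_c is explained perfectly by taking the left part of seq_a and the
# right part of seq_b, even though (by assumption) it is not a chimera.
points = crossover_points(seq_a, seq_b, seq_c)
print(points)  # [6]
```

Given only the sequence of seq_c, nothing tells you whether it is a real gene or a chimera formed during PCR from seq_a and seq_b; both explanations fit the data equally well.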
So, what should you do? I would suggest running
uchime2_ref with the largest possible
database (which would be SILVA for 16S). Review the results with -mode
balanced and -mode high_confidence. Usually, I find that -mode
high_confidence is more useful because -mode balanced gives too many
questionable predictions. With high_confidence, you will probably see a
small number of quite convincing chimeric alignments. It is then a judgement
call whether you think these are false positives due to fake models or true
chimeras. It's anyone's guess, because it is impossible to distinguish these
two cases. If you get a lot of convincing alignments and you think this may
be due to problems with the built-in filter in cluster_otus or unoise3, then
by all means send them to me for review.
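
The two runs suggested above could be scripted along the following lines. This is a minimal sketch that only builds and prints the command lines. The file names (otus.fa, silva.udb, the output prefix) are placeholders, and the option names other than -mode (-db, -strand plus, -uchimeout, -uchimealnout) are written from memory of the uchime2_ref command, so check them against the documentation for your installed version before running anything.

```python
import shlex


def uchime2_ref_cmd(otus, db, mode, prefix):
    """Build one uchime2_ref command line as an argument list (nothing is run)."""
    return [
        "usearch",
        "-uchime2_ref", otus,      # OTU sequences to check
        "-db", db,                 # reference database, e.g. SILVA for 16S
        "-strand", "plus",         # assumed option; verify for your version
        "-mode", mode,
        "-uchimeout", f"{prefix}.{mode}.txt",     # assumed tabbed output option
        "-uchimealnout", f"{prefix}.{mode}.aln",  # assumed alignment output option
    ]


# Placeholder file names; substitute your own OTU FASTA and database.
commands = [
    uchime2_ref_cmd("otus.fa", "silva.udb", mode, "otus_chimeras")
    for mode in ("balanced", "high_confidence")
]

for cmd in commands:
    print(shlex.join(cmd))
```

Writing each mode to its own output files makes it easy to compare the two sets of predictions side by side and to inspect the alignments from the high_confidence run by eye.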