USEARCH manual

Taxonomy confidence measures

See also
SINTAX algorithm
Taxonomy benchmark home
sintax command

The SINTAX algorithm generates taxonomy predictions with confidence estimates specified as a bootstrap value.
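To make the idea of a bootstrap confidence concrete, here is a minimal Python sketch of k-mer bootstrapping. It is loosely modeled on the general scheme in the SINTAX paper: draw a random subset of the query's k-mers, find the reference sequence sharing the most of them, give that reference's taxon a vote, and repeat. The confidence is the fraction of votes the winning taxon receives. The parameters K = 8, SUBSET = 32 and ITERS = 100, the helper names and the single-level taxon labels are assumptions for illustration. This is not the sintax command's actual implementation.

```python
import random
from collections import Counter

# Assumed parameters for illustration only; not taken from the sintax command.
K = 8        # k-mer length
SUBSET = 32  # k-mers drawn per bootstrap iteration
ITERS = 100  # number of bootstrap iterations


def kmers(seq, k=K):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bootstrap_predict(query, refs, seed=1):
    """Hypothetical helper: refs is a list of (sequence, taxon) pairs.

    Returns (taxon, confidence), where confidence is the fraction of
    iterations in which that taxon's reference was the top hit.
    The query must be at least K letters long.
    """
    rng = random.Random(seed)
    query_kmers = sorted(kmers(query))  # sorted so results are reproducible
    ref_kmers = [(kmers(seq), taxon) for seq, taxon in refs]
    votes = Counter()
    for _ in range(ITERS):
        # Draw a random subset of query k-mers, with replacement.
        sample = [rng.choice(query_kmers) for _ in range(SUBSET)]
        # The reference sharing the most sampled k-mers wins this iteration
        # (ties go to the reference listed first).
        _, best_taxon = max(ref_kmers, key=lambda r: sum(w in r[0] for w in sample))
        votes[best_taxon] += 1
    taxon, n = votes.most_common(1)[0]
    return taxon, n / ITERS
```

For example, `bootstrap_predict("ACGT" * 10, [("ACGT" * 10, "A"), ("TTGCA" * 10, "B")])` returns `("A", 1.0)`, because every sampled k-mer occurs only in reference A. A real classifier such as SINTAX reports a confidence at each rank (phylum, class, ..., genus). This sketch uses a single label to keep it short.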

The definition and interpretation of a taxonomy prediction confidence estimate is not as simple as it might appear. Ideally, the error rate of predictions with confidence 0.9 should be approximately 10%, but in practice the error rate depends on the query dataset and on unknown characteristics of the reference dataset. It would be nice to calculate a p-value, but this is tricky because we need two statistical models: one for the hypothesis we are testing plus a null model in which the hypothesis is false and the observation occurs by chance. The (now deprecated) UTAX algorithm implemented a method for calculating p-values, but it only works well for 16S genes and is not decisively better than SINTAX bootstrapping in practice.
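One practical way to check the ideal described above is to run a classifier on a query set with known taxonomy. You can then compare the observed error rate in each confidence bin with the error rate the confidence values imply. For example, predictions with confidence 0.9 should be wrong about 10% of the time. The Python sketch below does this. The function name, the bin edges and the (confidence, is_correct) input format are hypothetical choices for illustration. As noted above, the result depends on the query and reference datasets used.

```python
def calibration_table(predictions, edges=(0.0, 0.5, 0.8, 0.9, 1.0)):
    """Hypothetical helper: predictions is a list of (confidence, is_correct).

    Returns one row (lo, hi, n, expected_error, observed_error) per
    non-empty bin. Bins are [lo, hi), except the last, which includes hi.
    """
    rows = []
    last = len(edges) - 2
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        in_bin = [(c, ok) for c, ok in predictions
                  if lo <= c < hi or (i == last and c == hi)]
        if not in_bin:
            continue
        n = len(in_bin)
        # If confidence were a calibrated probability, the expected error
        # rate would be 1 minus the mean confidence in the bin.
        expected = 1 - sum(c for c, _ in in_bin) / n
        observed = sum(1 for _, ok in in_bin if not ok) / n
        rows.append((lo, hi, n, expected, observed))
    return rows
```

For example, suppose there are ten predictions at confidence 0.95, one of which is wrong. The observed error is then 10% against an expected 5%, which suggests the confidence values are too optimistic for that dataset.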

Most taxonomy prediction algorithms don't provide a confidence estimate, including GAST, the default QIIME method (assign_taxonomy.py -m uclust) and the mothur Classify_seqs command with method=knn. A notable exception is the RDP Naive Bayesian Classifier (RDP), which reports a confidence value obtained by bootstrapping. This was an important improvement over previous methods and is a good reason why RDP is currently the most widely used algorithm for 16S taxonomy prediction. However, everyone agrees that the RDP bootstrap value should not be interpreted as the probability that the prediction is correct (which would be 100% minus the estimated error probability). The authors state that for 16S sequences shorter than 250nt, a bootstrap threshold of 50% gives accurate results to genus level, with accuracies from 79% to 100% depending on the V region (see discussion and table under "Confidence threshold" at https://rdp.cme.msu.edu/classifier/class_help.jsp). If this result is valid, the error rate at 50% bootstrap is presumably much less than 50%. However, I believe their "leave-one-out" validation seriously underestimates error rates on real data (for discussion see validating taxonomy classifiers). In my tests, I find a 33% error rate for genus predictions by RDP on the V3-V5 segment (~530nt) at a 50% bootstrap cutoff. At 100% bootstrap confidence, the error rate is 8%.
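The error rates quoted above are measured at a bootstrap cutoff. Below is a minimal Python sketch of that calculation. It assumes that predictions below the cutoff are simply not reported, and that the error rate is the fraction of reported genus predictions that are wrong. This definition, the function name and the input format are assumptions for illustration and may differ from how the figures above were computed.

```python
def error_rate_at_cutoff(predictions, cutoff):
    """Hypothetical helper: predictions is a list of
    (predicted_genus, true_genus, bootstrap) tuples.

    Assumes predictions with bootstrap < cutoff are not reported, and
    defines the error rate as wrong / reported. Returns (n_reported, rate);
    rate is NaN if nothing passes the cutoff.
    """
    reported = [(p, t) for p, t, b in predictions if b >= cutoff]
    if not reported:
        return 0, float("nan")
    wrong = sum(1 for p, t in reported if p != t)
    return len(reported), wrong / len(reported)


if __name__ == "__main__":
    # Made-up toy values, not benchmark data.
    toy = [
        ("Escherichia", "Escherichia", 1.0),
        ("Bacillus", "Listeria", 0.6),
        ("Bacillus", "Bacillus", 0.4),
        ("Salmonella", "Salmonella", 0.55),
    ]
    for cutoff in (0.5, 1.0):
        n, rate = error_rate_at_cutoff(toy, cutoff)
        print(f"cutoff {cutoff:.0%}: {n} reported, error rate {rate:.0%}")
```

Raising the cutoff trades fewer reported predictions for a lower error rate among those that remain.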

The SINTAX bootstrap value has similar accuracy to RDP on V4 sequences. On ITS sequences and full-length 16S sequences, the SINTAX bootstrap value is significantly better (see SINTAX paper).