See also
sintax command
Microbial taxonomy
Cross-validation by identity
The SINTAX algorithm predicts the taxonomy of marker gene reads such as 16S or ITS. It is implemented in the sintax command.
Bootstrap confidence values are provided for all predicted ranks.
The algorithm is similar to the RDP Naive Bayesian Classifier algorithm, except that k-mer similarity, rather than Bayesian posteriors, is used to identify the top taxonomy, so there is no need for training.
Unlike the RDP Classifier, SINTAX does not require that the lowest ("training") rank be specified for all reference sequences, which allows large databases to be used as a reference.
However, I do not recommend using SILVA or Greengenes as a taxonomy reference because these databases have high error rates: roughly one in five of the taxonomy annotations are wrong.
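
To make the k-mer and bootstrap idea described above more concrete, here is a minimal Python sketch. It is not the code used by the sintax command. The function names (kmers, classify), the word length of 8, the subsample size of 32, the 100 iterations, and the rule for combining iterations into per-rank confidences are all assumptions chosen for illustration. In each iteration a random subset of the query's k-mers is drawn, the reference sharing the most of those k-mers is taken as the top hit, and the confidence of a rank is the fraction of iterations whose top hit agrees with the prediction down to that rank.

```python
import random
from collections import Counter


def kmers(seq, k=8):
    """Return the set of k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, references, k=8, subsample=32, iterations=100, seed=0):
    """Predict taxonomy with bootstrap confidences (illustrative sketch).

    references: list of (sequence, taxonomy) pairs, where taxonomy is a
    list of rank names ordered from highest to lowest rank.
    Returns a list of (rank_name, confidence) pairs.
    All default parameter values are assumptions, not taken from the text.
    """
    rng = random.Random(seed)
    ref_index = [(kmers(seq, k), tuple(tax)) for seq, tax in references]
    query_words = sorted(kmers(query, k))
    if not query_words or not ref_index:
        return []

    top_hits = []
    for _ in range(iterations):
        # Subsample the query k-mers with replacement.
        sample = set(rng.choices(query_words, k=subsample))
        # Top hit: the reference sharing the most sampled k-mers.
        # Ties go to the reference that appears first in the list.
        _, best_tax = max(ref_index, key=lambda ref: len(sample & ref[0]))
        top_hits.append(best_tax)

    # Predicted taxonomy: the one that was the top hit most often.
    predicted, _ = Counter(top_hits).most_common(1)[0]

    result = []
    for depth in range(1, len(predicted) + 1):
        prefix = predicted[:depth]
        # Confidence: fraction of iterations agreeing down to this rank.
        support = sum(1 for tax in top_hits if tax[:depth] == prefix)
        result.append((predicted[depth - 1], support / iterations))
    return result
```

Because the references are only indexed by their k-mers, no training step is involved, and references with taxonomy lists of different lengths can be mixed freely in this sketch.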