See also
UTAX reference data downloads
utax command
cluster_otus_utax command
makeudb_utax command
Taxonomy predictions
Taxonomy confidence
Taxonomy training
Taxonomy benchmark results
UTAX is an algorithm for taxonomy assignment which is implemented in the utax command. The cluster_otus_utax command generates OTUs based on taxa predicted by UTAX.
The main advantages of UTAX over previous classifiers such as the RDP Naive Bayesian Classifier (RDP) are very high speed, informative confidence values and flexible options for training on user-supplied data.
The algorithm has not yet been published. See Validating Taxonomy Classifiers for the method I used to validate its accuracy compared with other algorithms.
At a high level, UTAX is a k-mer based method which looks for words in common between the query sequence and reference sequences with known taxonomy. A score calculated from word counts is used to estimate a confidence value for each taxonomic level. Confidence values are trained to give a realistic estimate of error rates, in contrast to the bootstrap values reported by RDP, which are poor predictors of error rates in practice.
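
The idea of counting words in common can be illustrated with a short Python sketch. This is not the UTAX implementation, which is unpublished: the function names, the word length and the use of a plain shared-word count are all assumptions made for illustration, and the step that converts a score into a trained confidence value is not shown.

```python
def kmers(seq, k):
    """Return the set of distinct words of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_word_counts(query, references, k=8):
    """Count the distinct words each reference shares with the query.

    references maps a taxonomy label to a reference sequence.
    The raw count stands in for a score here; UTAX's actual score
    and its confidence estimate are not reproduced.
    """
    query_words = kmers(query, k)
    return {
        label: len(query_words & kmers(ref, k))
        for label, ref in references.items()
    }


def top_hit(query, references, k=8):
    """Return (label, count) for the reference sharing the most words."""
    counts = shared_word_counts(query, references, k)
    return max(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Toy sequences and labels, invented for this example.
    refs = {
        "g:Alpha": "ACGTACGTTTGACC",
        "g:Beta": "GGGCCCAAATTTGG",
    }
    print(top_hit("ACGTACGTTTGA", refs, k=4))
```

A short word length (k=4) is used only because the toy sequences are short; with real reads a longer word would normally be needed to avoid chance matches.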