See also
UTAX algorithm
utax command
Training for short sequences
Taxonomy parameter training
Taxonomy annotations
How to train UTAX on your own reference data
The utax_train command performs UTAX parameter training. See how to train UTAX on your own reference data for discussion of the practical details.
Input is a reference database in FASTA format with taxonomy annotations.
The -wordlength option specifies the k-mer length (default is 8). You can evaluate the performance with different word lengths by looking at the report output (see below).
Trained parameters are written to the taxconfs file specified by the -taxconfsout option.
The -report option can be used to create a file with estimated sensitivity and error rates for a range of confidence cutoffs (example below).
This report can be used to choose an appropriate confidence threshold for your data.
Training uses the -utax_trainlevels and -utax_splitlevels options. The default values are -utax_trainlevels dpcofg and -utax_splitlevels NVrdpcofg. The -utax_trainlevels option specifies the taxonomy levels to be predicted by utax. See taxonomy parameter training for an explanation of -utax_splitlevels. See also training for short sequences.
With the UNITE ITS taxonomy files, these options should be used: -utax_trainlevels kpcofgs -utax_splitlevels NVpcofgs.
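As a rough sketch of how the UNITE settings fit into a full command line, the script below assembles the command and prints it for review without running it. The file names unite_ref.fa, unite_report.txt and unite.tc are hypothetical placeholders. Only the two level strings come from the recommendation above.

```shell
#!/bin/sh
# Sketch: assemble a utax_train command for a UNITE ITS reference.
# unite_ref.fa, unite_report.txt and unite.tc are hypothetical file names.
# The level strings are the ones recommended above for UNITE.
TRAINLEVELS=kpcofgs
SPLITLEVELS=NVpcofgs

UNITE_CMD="usearch -utax_train unite_ref.fa -utax_trainlevels $TRAINLEVELS -utax_splitlevels $SPLITLEVELS -report unite_report.txt -taxconfsout unite.tc"

# Print the command for review; nothing is executed here.
echo "$UNITE_CMD"
```

Once the printed command looks right, it can be pasted into a terminal or piped to sh.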
Example
usearch -utax_train ref.fa -report 16s_report.txt -taxconfsout ref.tc
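To compare word lengths through the report output, one possible approach is to train once per candidate value and write each report to its own file. The sketch below only prints the commands, so it can be checked before anything is run. The helper name utax_train_cmd, the output file names, and the word lengths 7, 8 and 9 are assumptions chosen for illustration, not recommended settings.

```shell
#!/bin/sh
# Sketch: print one utax_train command per word length so the resulting
# reports can be compared afterwards. Nothing is executed here.
# ref.fa, the report_w*.txt and ref_w*.tc names, and the word lengths
# 7, 8 and 9 are placeholders chosen for illustration.

utax_train_cmd() {
    ref=$1   # reference FASTA with taxonomy annotations
    w=$2     # k-mer length passed to -wordlength
    printf 'usearch -utax_train %s -wordlength %s -report report_w%s.txt -taxconfsout ref_w%s.tc\n' \
        "$ref" "$w" "$w" "$w"
}

for w in 7 8 9; do
    utax_train_cmd ref.fa "$w"
done
```

Piping the script's output to sh runs the printed commands. Each report file can then be inspected to see how the estimated sensitivity and error rates change with the word length.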