USEARCH manual

utax_train command

The utax_train command performs UTAX parameter training. See how to train UTAX on your own reference data for a discussion of the practical details.

Input is a reference database in FASTA format with taxonomy annotations.
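As an illustration only, the following Python sketch reads taxonomy annotations from FASTA labels. It assumes the USEARCH convention of a tax= field in the label, for example >AB008314;tax=d:Bacteria,p:Firmicutes,g:Streptococcus; where each level is a single letter, a colon and a name. The names read_labels and parse_tax are hypothetical helpers and are not part of USEARCH.

```python
from typing import Dict, Iterator


def read_labels(path: str) -> Iterator[str]:
    """Yield FASTA labels (header lines without the leading '>')."""
    with open(path) as f:
        for line in f:
            if line.startswith(">"):
                yield line[1:].rstrip("\r\n")


def parse_tax(label: str) -> Dict[str, str]:
    """Return {level_letter: name} from the tax= field of a label.

    Assumes USEARCH-style annotations such as
    'AB008314;tax=d:Bacteria,p:Firmicutes,g:Streptococcus;'.
    Returns an empty dict if the label has no tax= field.
    """
    for field in label.split(";"):
        if field.startswith("tax="):
            ranks: Dict[str, str] = {}
            for item in field[len("tax="):].split(","):
                level, sep, name = item.partition(":")
                if sep and name:  # skip empty or malformed items
                    ranks[level] = name
            return ranks
    return {}
```

For example, applying parse_tax to every label yielded by read_labels("ref.fa") and keeping the labels that give an empty result lists the records that carry no taxonomy annotation at all.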

The -wordlength option specifies the k-mer length (default 8). You can evaluate performance with different word lengths by comparing the reports (see below) produced with each length.
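In general terms, a word (k-mer) of length k is any substring of length k. The Python sketch below is a generic illustration of why word length matters, not UTAX's own implementation. A single mismatch changes every word that overlaps it, so longer words are more specific, but fewer of them are shared between divergent sequences. kmers and shared_fraction are hypothetical names.

```python
from typing import Set


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Distinct words (k-mers) of length k in seq; empty if seq is shorter than k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(query: str, target: str, k: int = 8) -> float:
    """Fraction of the query's distinct k-mers that also occur in the target."""
    q = kmers(query, k)
    if not q:
        return 0.0
    return len(q & kmers(target, k)) / len(q)
```

For example, ACGTACGTAC and ACGTTCGTAC differ at a single position. Every 3-mer of the first sequence also occurs in the second (shared fraction 1.0), but none of its 8-mers do (shared fraction 0.0).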

Trained parameters are written to the taxconfs file specified by the -taxconfsout option.

The -report option can be used to create a file with estimated sensitivity and error rates for a range of confidence cutoffs (example below).

This report can be used to choose an appropriate confidence threshold for your data.
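Once the sensitivity and error-rate figures for one taxonomic level have been taken from the report, choosing a threshold is a simple trade-off. This Python sketch assumes you have already extracted (cutoff, sensitivity, error rate) rows yourself. It does not parse the report file, whose exact layout is not reproduced here, and pick_cutoff is a hypothetical helper. It returns the most sensitive cutoff whose estimated error rate stays within a target you choose.

```python
from typing import Iterable, Optional, Tuple


def pick_cutoff(rows: Iterable[Tuple[float, float, float]],
                max_error: float) -> Optional[float]:
    """Choose a confidence cutoff for one taxonomic level.

    rows: (cutoff, sensitivity, error_rate) tuples taken from the report,
    all in the same units as max_error (fractions or percentages).
    Returns the cutoff with the highest sensitivity among those whose
    error rate is <= max_error; ties go to the higher (stricter) cutoff.
    Returns None if no cutoff meets the error target.
    """
    best: Optional[Tuple[float, float]] = None  # (sensitivity, cutoff)
    for cutoff, sensitivity, error_rate in rows:
        if error_rate <= max_error:
            candidate = (sensitivity, cutoff)
            if best is None or candidate > best:
                best = candidate
    return None if best is None else best[1]
```

A separate choice per level is usually needed, because error rates typically differ between, say, phylum and genus.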

Training uses the -utax_trainlevels and -utax_splitlevels options. The defaults are -utax_trainlevels dpcofg and -utax_splitlevels NVrdpcofg. The -utax_trainlevels option specifies the taxonomic levels to be predicted by UTAX. See taxonomy parameter training for an explanation of -utax_splitlevels. See also training for short sequences.

With the UNITE ITS taxonomy files, these options should be used: -utax_trainlevels kpcofgs -utax_splitlevels NVpcofgs.
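Each letter in a level string names one taxonomic rank. The Python sketch below, which uses hypothetical helper names, decodes a -utax_trainlevels string and lists which of those levels a record's annotation lacks. Here ranks is a {letter: name} mapping built from a record's tax= field. The letter-to-rank table follows the usual USEARCH annotation letters (d, k, p, c, o, f, g, s). It deliberately leaves out N, V and r from -utax_splitlevels, which are covered under taxonomy parameter training.

```python
from typing import Dict, List

# Single-letter rank codes used in USEARCH tax= annotations.
# The extra -utax_splitlevels letters (N, V, r) are not decoded here.
RANK_NAMES = {"d": "domain", "k": "kingdom", "p": "phylum", "c": "class",
              "o": "order", "f": "family", "g": "genus", "s": "species"}


def decode_levels(levels: str) -> List[str]:
    """Translate a -utax_trainlevels string such as 'dpcofg' into rank names."""
    unknown = [c for c in levels if c not in RANK_NAMES]
    if unknown:
        raise ValueError(f"unknown level letters: {''.join(unknown)}")
    return [RANK_NAMES[c] for c in levels]


def missing_levels(ranks: Dict[str, str], levels: str) -> List[str]:
    """Level letters from `levels` that a record's annotation does not have."""
    return [c for c in levels if c not in ranks]
```

A coverage check like this, run before training, can show whether the reference records actually carry every level you ask UTAX to predict. For example, it shows whether UNITE records include the species (s) level requested by kpcofgs.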

Example

usearch -utax_train ref.fa -report 16s_report.txt -taxconfsout ref.tc
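To compare word lengths as described under -wordlength, you can script one training run per length. This Python sketch only assembles command lines from the options documented on this page. The helper names, the report_w*.txt and ref_w*.tc file-name pattern, and the assumption that the executable is called usearch and is on the PATH are all illustrative choices, not part of USEARCH. Commands are executed only when run=True.

```python
import subprocess
from typing import Iterable, List, Optional


def utax_train_argv(ref_fa: str, report: str, taxconfs: str,
                    wordlength: Optional[int] = None,
                    trainlevels: Optional[str] = None,
                    splitlevels: Optional[str] = None,
                    exe: str = "usearch") -> List[str]:
    """Build a usearch -utax_train command line from the options on this page."""
    argv = [exe, "-utax_train", ref_fa, "-report", report, "-taxconfsout", taxconfs]
    if wordlength is not None:
        argv += ["-wordlength", str(wordlength)]
    if trainlevels is not None:
        argv += ["-utax_trainlevels", trainlevels]
    if splitlevels is not None:
        argv += ["-utax_splitlevels", splitlevels]
    return argv


def sweep_wordlengths(ref_fa: str, lengths: Iterable[int],
                      run: bool = False) -> List[List[str]]:
    """Build one training command per word length, each writing its own
    report and taxconfs file (the file-name pattern is an arbitrary choice)."""
    commands = [utax_train_argv(ref_fa, f"report_w{w}.txt", f"ref_w{w}.tc",
                                wordlength=w)
                for w in lengths]
    if run:
        for argv in commands:
            subprocess.run(argv, check=True)  # requires usearch on the PATH
    return commands
```

For example, sweep_wordlengths("ref.fa", [6, 8, 10], run=True) would train three times and leave one report per word length to compare. For UNITE data, pass trainlevels="kpcofgs" and splitlevels="NVpcofgs" to utax_train_argv.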