See also
UTAX reference data downloads
utax command
Taxonomy parameter training
Taxonomy annotations
How to train UTAX on your own reference data
The makeudb_utax command creates a UDB database file from sequences in a FASTA file. The database file is compatible with the utax, usearch_global and usearch_local commands.
See the UTAX downloads page for available reference files.
See How to train UTAX on user data if you want to use your own reference data.
The sequence labels in the FASTA file must contain taxonomy annotations.
The -wordlength option specifies the k-mer length (default is 8). You can evaluate the performance with different word lengths by looking at the report output (see below).
By default, parameters for utax are trained using the FASTA file. For large files, training can be time-consuming and require a lot of memory. Alternatively, parameters can be provided using the -taxconfsin option to specify the name of a taxconfs file. If you use -taxconfsin, then you must use the same word length that was used for training those parameters.
If training is performed (i.e., if -taxconfsin is not specified), then the trained parameters can be saved in taxconfs format using the -taxconfsout option.
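A minimal sketch of the save-then-reuse workflow described above. The file names (ref.fa, ref.udb, ref.tc) and the run, train_and_save and build_from_saved helpers are hypothetical placeholders, not part of usearch; only the options named on this page are used. By default the script prints the commands rather than executing them; set DRY_RUN=0 to run them for real.

```shell
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# The word length must be the same when training and when reusing parameters.
WORDLEN=8

# Train on the FASTA file and save the trained parameters in taxconfs format.
train_and_save() {
    run usearch -makeudb_utax "$1" -output "$2" \
        -wordlength "$WORDLEN" -taxconfsout "$3"
}

# Build the database from previously saved parameters, skipping training.
build_from_saved() {
    run usearch -makeudb_utax "$1" -output "$2" \
        -wordlength "$WORDLEN" -taxconfsin "$3"
}

train_and_save   ref.fa ref.udb ref.tc
build_from_saved ref.fa ref.udb ref.tc
```

Keeping the word length in a single variable is one way to make sure both runs use the same value, as required when -taxconfsin is given.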
The -report option can be used to create a file with estimated sensitivity and error rates for a range of confidence cutoffs (example below).
This report can be used to choose an appropriate confidence threshold for your data.
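To compare word lengths, one approach is to build one database per word length and write a separate report for each, then compare the reports side by side. The sketch below assumes a reference file named 16s_ref.fa; the word lengths 7, 8 and 9, the output file names and the run and build_for_wordlengths helpers are illustrative assumptions, so check which word lengths your usearch version accepts. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Build one database and one report per word length given as an argument.
build_for_wordlengths() {
    for w in "$@"; do
        run usearch -makeudb_utax 16s_ref.fa -output "16s_ref_w$w.udb" \
            -wordlength "$w" -report "16s_report_w$w.txt"
    done
}

# Illustrative word lengths around the default of 8.
build_for_wordlengths 7 8 9
```

Each run retrains the parameters, so on a large reference file this loop can take a long time and a lot of memory.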
Training uses the -utax_trainlevels and -utax_splitlevels options. The default values are -utax_trainlevels dpcofg and -utax_splitlevels NVrdpcofg. The -utax_trainlevels option specifies the taxonomy levels to be predicted by utax. See taxonomy parameter training for an explanation of -utax_splitlevels. See also training for short sequences.
With the UNITE ITS taxonomy files, these options should be used: -utax_trainlevels kpcofgs -utax_splitlevels NVpcofgs.
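As a sketch, a full command line for a UNITE ITS reference might look like the following. The file names (unite_ref.fa, unite_ref.udb, unite_report.txt) and the run and build_unite helpers are hypothetical; the level strings are the ones given above. The command is printed rather than executed unless DRY_RUN=0 is set.

```shell
# Dry-run wrapper (hypothetical helper): prints the command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }

# Build a UTAX database from a UNITE ITS reference using the
# train/split levels recommended for UNITE.
build_unite() {
    run usearch -makeudb_utax "$1" -output "$2" \
        -utax_trainlevels kpcofgs -utax_splitlevels NVpcofgs \
        -report "$3"
}

build_unite unite_ref.fa unite_ref.udb unite_report.txt
```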
Example
usearch -makeudb_utax 16s_ref.fa -output 16s_ref.udb -report 16s_report.txt