See also
Random forest classifiers
Feature table file
forest_classify command
OTU importance
The forest_train command is used to train the parameters of a random forest classifier on observations in a feature table with known categories.
The -randseed option specifies a random number seed. The value must be a non-negative integer. By default, the seed is randomized using the time of day and the operating system process ID, so it is different each time the command is executed. Setting a fixed seed gives reproducible results, e.g. -randseed 1.
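The following is a minimal Python sketch of why a fixed seed makes results reproducible while the default does not. It assumes a seed mixed from the time of day and the process ID, as described above; the exact formula USEARCH uses is not documented here, and the names make_rng and bootstrap_indices are hypothetical.

```python
import os
import random
import time


def make_rng(randseed=None):
    """Return a seeded generator. With no seed, mix the time of day and the
    process ID (an illustrative assumption, not USEARCH's actual formula)."""
    if randseed is None:
        seed = int(time.time() * 1_000_000) ^ os.getpid()
    else:
        if not isinstance(randseed, int) or randseed < 0:
            raise ValueError("randseed must be a non-negative integer")
        seed = randseed
    return random.Random(seed)


def bootstrap_indices(rng, n):
    """Draw n row indices with replacement, as random forests typically do per tree."""
    return [rng.randrange(n) for _ in range(n)]


# The same fixed seed yields the same resample on every run.
a = bootstrap_indices(make_rng(1), 10)
b = bootstrap_indices(make_rng(1), 10)
print(a == b)  # True
```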
The -trees option specifies the number of trees in the forest. The default is 100. Increasing the number of trees may improve accuracy on unusually complex datasets at the cost of slower training and classification. In my experience, 100 is enough for typical 16S experiments. You can check this by comparing training accuracy for different numbers of trees.
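To illustrate what the tree count controls, here is a toy, pure-Python random forest: each tree is trained on a bootstrap resample of the rows using a random subset of features, and the forest predicts by majority vote. It is a sketch under simplifying assumptions, not USEARCH's implementation. Each tree is a depth-one tree (a stump), the data are made up, and all function names and the "gut"/"soil" categories are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Fit a depth-one tree: choose the (feature, threshold) split among
    feature_ids with the fewest training errors, each side voting its majority label."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = (sum(lab != left_lab for lab in left)
                      + sum(lab != right_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    return best[1:]  # (feature, threshold, left label, right label)


def train_forest(rows, labels, n_trees, rng, n_features_per_tree=None):
    """Train n_trees stumps, each on a bootstrap resample of the rows and a
    random subset of features (sqrt of the feature count by default)."""
    n_feat = len(rows[0])
    k = n_features_per_tree or max(1, int(n_feat ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feats = rng.sample(range(n_feat), k)            # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


def toy_table(rng, n=60):
    """Made-up counts for 4 features; only feature 0 separates the categories."""
    rows, labels = [], []
    for i in range(n):
        cat = "gut" if i % 2 == 0 else "soil"
        signal = rng.randint(20, 40) if cat == "gut" else rng.randint(0, 15)
        rows.append([signal] + [rng.randint(0, 30) for _ in range(3)])
        labels.append(cat)
    return rows, labels


def training_accuracy(forest, rows, labels):
    return sum(predict(forest, r) == lab for r, lab in zip(rows, labels)) / len(rows)


rows, labels = toy_table(random.Random(1))
for n_trees in (1, 10, 100):
    forest = train_forest(rows, labels, n_trees, random.Random(1))
    print(n_trees, "trees: training accuracy", training_accuracy(forest, rows, labels))
```

Real forests grow deeper trees and draw a fresh feature subset at every split; with a single split per stump, drawing the subset once per tree is equivalent. More trees make the vote less sensitive to individual trees that happened to see only uninformative features, at the cost of proportionally more training and prediction work.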
A forest is trained on the complete feature table, and the forest parameters are saved to the file specified by the -forestout option. This forest can then be used to predict categories for novel data with the forest_classify command or the otutab_forest_classify command.
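The train-once, classify-later workflow can be sketched in Python as saving the trained parameters to a file, reloading them, and voting on a new observation. This is only an illustration: the JSON layout, the stump tuples and the helper names are hypothetical and do not describe the format of the file written by -forestout.

```python
import json
import os
import tempfile
from collections import Counter


def save_forest(forest, path):
    """Write one (feature, threshold, left label, right label) tuple per tree.
    JSON is an illustrative assumption, not USEARCH's forest file format."""
    with open(path, "w") as fh:
        json.dump([list(tree) for tree in forest], fh)


def load_forest(path):
    with open(path) as fh:
        return [tuple(tree) for tree in json.load(fh)]


def classify(forest, row):
    """Predict a category for a new observation by majority vote."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical parameters, as if produced by training on the full table.
forest = [(0, 15, "soil", "gut"), (0, 12, "soil", "gut"), (2, 7, "gut", "soil")]

path = os.path.join(tempfile.mkdtemp(), "forest.json")
save_forest(forest, path)
reloaded = load_forest(path)
print(classify(reloaded, [30, 4, 9, 1]))  # gut (2 votes to 1)
```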
Example
usearch -forest_train feature_table.txt -trees 50 -forestout forest.txt
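To compare training accuracy for different numbers of trees, as suggested for the -trees option, you could script several runs with a fixed seed so that only the tree count changes. This Python sketch uses only the options described on this page; it assumes the binary is invoked as usearch (real binaries are often named with a version suffix) and leaves reading the accuracy from each run's output to you. The output file names are hypothetical.

```python
import shutil
import subprocess

RUN = False  # set to True to actually execute the commands


def forest_train_cmd(table, trees, forestout, randseed=None):
    """Build a usearch -forest_train command line from the options on this page."""
    cmd = ["usearch", "-forest_train", table, "-trees", str(trees),
           "-forestout", forestout]
    if randseed is not None:
        cmd += ["-randseed", str(randseed)]
    return cmd


# Same seed for every run, so differences come from the tree count alone.
commands = [forest_train_cmd("feature_table.txt", n, f"forest_{n}.txt", randseed=1)
            for n in (50, 100, 200)]

for cmd in commands:
    print(" ".join(cmd))
    if RUN and shutil.which("usearch"):  # only run if a usearch binary is on PATH
        subprocess.run(cmd, check=True)
```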