
Robert C. Edgar on twitter



 New in v11 

forest_train command

See also
  Random forest classifiers
  Feature table file
  forest_classify command
  OTU importance

The forest_train command trains a random forest classifier on the observations in a feature table whose categories are known.
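As a rough illustration of what training involves, here is a minimal Python sketch of a random forest. Each tree is grown on a bootstrap sample of the observations, each split considers only a random subset of features, and prediction is a majority vote over trees. This is not USEARCH's implementation. The toy data layout (feature vector plus category), the Gini split criterion, the max_depth limit and the square-root feature subset size are common textbook choices assumed here for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of category labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in feature_ids:
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue  # a split must send observations both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, mtry, rng, depth, max_depth):
    """Grow one tree; a leaf is a category, a node is (feature, threshold, left, right)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    n_features = len(rows[0][0])
    # Random forests consider only a random subset of features at each split.
    split = best_split(rows, rng.sample(range(n_features), mtry))
    if split is None:
        return majority
    f, t = split
    left = [r for r in rows if r[0][f] <= t]
    right = [r for r in rows if r[0][f] > t]
    return (f, t,
            build_tree(left, mtry, rng, depth + 1, max_depth),
            build_tree(right, mtry, rng, depth + 1, max_depth))

def train_forest(rows, n_trees, seed, max_depth=3):
    """Train n_trees trees, each on a bootstrap sample of the observations."""
    rng = random.Random(seed)
    mtry = max(1, int(len(rows[0][0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        forest.append(build_tree(sample, mtry, rng, 0, max_depth))
    return forest

def predict(forest, x):
    """Predict a category for feature vector x by majority vote over trees."""
    votes = Counter()
    for node in forest:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if x[f] <= t else right
        votes[node] += 1
    return votes.most_common(1)[0][0]

# Toy feature table: (feature vector, known category) per observation.
rows = [
    ([10, 0, 1], "healthy"), ([12, 1, 0], "healthy"), ([9, 0, 2], "healthy"),
    ([0, 15, 8], "disease"), ([1, 11, 9], "disease"), ([2, 14, 7], "disease"),
]
forest = train_forest(rows, n_trees=50, seed=1)
print(predict(forest, [11, 0, 0]))
```

Real implementations grow deeper trees and handle many more features; the depth limit here just keeps the sketch short.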

The -randseed option specifies the random number seed, which must be a non-negative integer. By default, the seed is derived from the time of day and the operating system process ID, so it differs each time the command is run. Set it explicitly to get reproducible results, e.g. -randseed 1.
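The effect of a fixed seed can be sketched in Python. The helper names below are hypothetical, and the way the default seed combines the clock and the process ID is an assumption, since the exact formula is not given here. The point is only that the same seed reproduces the same random draws, such as the bootstrap samples used to grow trees.

```python
import os
import random
import time

def make_seed(randseed=None):
    """Return the user's seed, or derive one from the clock and process ID."""
    if randseed is not None:
        if not isinstance(randseed, int) or randseed < 0:
            raise ValueError("randseed must be a non-negative integer")
        return randseed
    # Assumed mixing scheme for illustration; not USEARCH's actual formula.
    return (time.time_ns() ^ (os.getpid() << 16)) & 0xFFFFFFFF

def bootstrap_indices(n, seed):
    """Draw n observation indices with replacement, as when growing one tree."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

# Same explicit seed: identical draws on every run.
print(bootstrap_indices(8, make_seed(1)) == bootstrap_indices(8, make_seed(1)))  # True
# Default seed: usually different from run to run.
print(bootstrap_indices(8, make_seed()))
```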

The -trees option specifies the number of trees in the forest (default 100). More trees may improve accuracy on unusually complex datasets, at the cost of slower training and classification. In my experience, 100 trees is enough for typical 16S experiments; you can check this by comparing training accuracy for different numbers of trees.
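One way to see why extra trees give diminishing returns is an idealized model in which each tree independently predicts the correct category with probability p (two categories only) and the forest takes a majority vote. Real trees are correlated because they are trained on overlapping data, so this model overstates the benefit of more trees; it is a sketch of the trend, not a prediction of USEARCH accuracy.

```python
from math import comb

def majority_vote_accuracy(n_trees, p):
    """Probability that a majority of n_trees independent trees is correct,
    when each tree is correct with probability p (two categories).
    A tie (possible when n_trees is even) counts as half correct."""
    total = 0.0
    for k in range(n_trees + 1):  # k = number of correct trees
        prob = comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        if 2 * k > n_trees:
            total += prob
        elif 2 * k == n_trees:
            total += 0.5 * prob
    return total

for n in (1, 11, 51, 101, 201):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under this model the gain from going 101 to 201 trees is far smaller than from 1 to 11, which is why checking accuracy empirically is more useful than simply raising -trees.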

The forest is trained on the complete feature table, and its parameters are saved to the file specified by the -forestout option. This forest can then be used to predict categories for new data with the forest_classify or otutab_forest_classify command.
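Conceptually, -forestout persists the trained trees so that a later command can classify new observations without retraining. The Python sketch below shows that round trip using JSON and a hypothetical tree layout (a leaf is a category name; an internal node is [feature_index, threshold, left, right]). It does not reproduce the actual format of the USEARCH forest file, and the function names are illustrative.

```python
import json

def save_forest(forest, path):
    """Write the trees to a JSON file (hypothetical format)."""
    with open(path, "w") as f:
        json.dump({"trees": forest}, f)

def load_forest(path):
    """Read the trees back from a file written by save_forest."""
    with open(path) as f:
        return json.load(f)["trees"]

def classify(forest, x):
    """Majority vote over trees for feature vector x."""
    votes = {}
    for node in forest:
        while isinstance(node, list):
            feature, threshold, left, right = node
            node = left if x[feature] <= threshold else right
        votes[node] = votes.get(node, 0) + 1
    return max(votes, key=votes.get)

# A tiny hand-made "trained" forest of three one-split trees.
forest = [
    [0, 5.0, "disease", "healthy"],
    [1, 3.0, "healthy", "disease"],
    [0, 4.0, "disease", "healthy"],
]
save_forest(forest, "forest.json")
restored = load_forest("forest.json")
print(classify(restored, [8.0, 1.0]))  # healthy
```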


usearch -forest_train feature_table.txt -trees 50 -forestout forest.txt