
 New in v11 

otutab_forest_train command

See also
  Random forest classifiers
  Feature table file
  forest_classify command

The otutab_forest_train command trains the parameters of a random forest classifier on the samples in an OTU table, using known categories specified in a metadata file.

A metadata file must be specified using the -meta option.

The -randseed option specifies a random number seed. The value must be a non-negative integer. By default, the seed is randomized using the time of day and operating system process id so that it will almost always be different each time the command is executed. This option can be used to get reproducible results, e.g. -randseed 1.
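A reproducible training run can be sketched as a small shell function. This is only an illustration: the function name train_reproducible and its argument order are hypothetical, and it assumes that usearch is on your PATH. The seed defaults to 1, as in the example above.

```shell
# Hypothetical helper (not part of usearch): train with a fixed seed so that
# repeated runs on the same input give reproducible results.
# Usage: train_reproducible OTUTAB META FORESTOUT [SEED]
train_reproducible() {
  # "${4:-1}" falls back to seed 1 when no fourth argument is given.
  usearch -otutab_forest_train "$1" -meta "$2" \
    -randseed "${4:-1}" -forestout "$3"
}
```

For example: train_reproducible otutab.txt meta.txt forest.txt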

The -trees option specifies the number of trees in the forest. Default 100. Increasing the number of trees may improve accuracy on unusually complex datasets at the expense of slower execution times for training and classification. In my experience, 100 is enough for typical 16S experiments. You can check by comparing the training accuracy with different numbers of trees.
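One way to run that comparison is to train several forests that differ only in the number of trees. The sketch below is an assumption-laden illustration, not a recommended protocol: the function name train_with_trees, the output file naming scheme (forest_N.txt) and the fixed seed of 1 are all hypothetical choices, and usearch is assumed to be on your PATH. Fixing the seed means the runs differ only in the -trees value.

```shell
# Hypothetical helper (not part of usearch): train one forest per tree count.
# Usage: train_with_trees OTUTAB META TREES [TREES...]
train_with_trees() {
  otutab=$1
  meta=$2
  shift 2   # the remaining arguments are the tree counts to try
  for trees in "$@"; do
    # Each run writes its forest to its own file, e.g. forest_200.txt.
    usearch -otutab_forest_train "$otutab" -meta "$meta" \
      -trees "$trees" -randseed 1 -forestout "forest_${trees}.txt"
  done
}
```

For example: train_with_trees otutab.txt meta.txt 50 100 200. The tree counts here are arbitrary. Compare the training accuracy of each run, and if it does not improve beyond 100 trees, the default is sufficient for your data.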

A forest is trained and the forest parameters are saved to the file specified by the -forestout option. This forest can be used to predict categories for novel data using the forest_classify command or otutab_forest_classify command.


usearch -otutab_forest_train otutab.txt -meta meta.txt -forestout forest.txt