See also
Random forest classifiers
OTU importance
forest_train command
otutab_forest_train command
A random forest is trained using the forest_train command or the otutab_forest_train command, either of which generates a parameter file. Random forest parameter files are tab-separated text files.
Comment lines start with a hash character (#) and are used to report accuracy metrics. Reported metrics include:
#err = error rate.
#meanpe = mean probability of error.
#mse = mean squared error.
#oob_err = out-of-bag error rate.
#oob_meanpe = out-of-bag mean probability of error.
#oob_mse = out-of-bag mean squared error.
Metrics ending with _w are calculated with category weighting. For example, mse_w is the mean squared error with each observation weighted by 1/n, where n is the number of test observations in that observation's category.
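The effect of category weighting can be illustrated with a small sketch. The input file sq_err.tsv and its two-column layout (category, squared error) are hypothetical and are not produced by any command described here. Dividing by the sum of the weights, which equals the number of categories, is also an assumption about the normalization.

```shell
# Hypothetical input: one test observation per line, "category<TAB>squared error".
printf 'A\t0.10\nA\t0.30\nA\t0.20\nB\t0.80\n' > sq_err.tsv

# Unweighted mean squared error: every observation counts equally.
mse=$(awk -F'\t' '{ s += $2 } END { printf "%.4f\n", s / NR }' sq_err.tsv)

# Category-weighted version: each observation is weighted by 1/n, where n is
# the number of observations in its category, so each category contributes
# its own mean. Dividing by the number of categories (the sum of the weights)
# is an assumed normalization.
mse_w=$(awk -F'\t' '
  { n[$1]++; sum[$1] += $2 }
  END {
    for (c in n) { total += sum[c] / n[c]; cats++ }
    printf "%.4f\n", total / cats
  }' sq_err.tsv)

echo "mse=$mse mse_w=$mse_w"
```

In this toy data the rare category B has a large error, so the weighted value is higher than the unweighted one: weighting keeps a large category from dominating the metric.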
Lines starting with var are features (also called variables). For example:
var 6 Otu123 0.00821
Fields are:
1. var
2. index of the variable (0, 1, 2...)
3. name of the variable (typically, OTU name)
4. importance of the variable
To extract the OTU importance values from a forest parameter file and sort
them in order of decreasing importance, you could use:
grep -w "^var" forest.txt | cut -f3,4 | sort -rgk2
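As a way to try this without a trained forest, the following sketch builds a small demonstration file and ranks its features. The file name forest_demo.txt, the OTU names, the importance values and the comment line are all made up for illustration. The sketch assumes the four-field var layout listed above and a sort that supports the -g (general numeric) option.

```shell
# Hypothetical demo file; real parameter files are written by forest_train
# or otutab_forest_train. The comment line content is a placeholder.
printf '# demo comment line\n' > forest_demo.txt
printf 'var\t0\tOtu1\t0.00310\n' >> forest_demo.txt
printf 'var\t1\tOtu7\t0.01200\n' >> forest_demo.txt
printf 'var\t2\tOtu123\t0.00821\n' >> forest_demo.txt

# Keep only lines whose first field is exactly "var", print the name
# (field 3) and importance (field 4), then sort by decreasing importance.
ranked=$(awk -F'\t' '$1 == "var" { print $3 "\t" $4 }' forest_demo.txt | sort -k2,2gr)
printf '%s\n' "$ranked"

# Name of the most important feature.
top=$(printf '%s\n' "$ranked" | head -n 1 | cut -f1)
echo "top feature: $top"
```

Matching on the first field in awk skips comment lines and any other line types without relying on a regular expression. Piping the ranked list through head -n N gives the N most important features.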