Random forest parameter file
See also
Random forest classifiers
OTU importance
forest_train command
otutab_forest_train command
A random forest is trained using the forest_train or otutab_forest_train command, which writes a parameter file. Random forest parameter files are tab-separated text files.
Comment lines start with a hash character (#) and are used to report accuracy metrics. Reported metrics include:
# err = error rate.
# meanpe = mean probability of error.
# mse = mean squared error.
# oob_err = out-of-bag error rate.
# oob_meanpe = out-of-bag mean probability of error.
# oob_mse = out-of-bag mean squared error.
Metrics ending in _w are calculated with category weighting. For example, mse_w is the mean squared error in which each observation is weighted by 1/n, where n is the number of test observations in that observation's category.
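To make the 1/n category weighting concrete, here is a minimal shell/awk sketch that applies it to the error rate. It is an illustration of the weighting idea only, not the code USEARCH uses. The input format (one test observation per line, true category and predicted category separated by a tab), the function name weighted_error_rate, and the choice to divide the weighted error count by the sum of the weights are all assumptions.

```shell
# Sketch only: category-weighted error rate, illustrating the 1/n weighting.
# ASSUMPTION: input is a hypothetical tab-separated file, one test observation
# per line: true_category<TAB>predicted_category.
# ASSUMPTION: the weighted error count is divided by the sum of the weights.
weighted_error_rate() {
  awk -F'\t' '
    {
      n[$1]++                      # number of test observations in this category
      if ($1 != $2) wrong[$1]++    # number of misclassified observations
    }
    END {
      num = 0; den = 0
      for (c in n) {
        w = 1 / n[c]               # weight of each observation in category c
        num += w * wrong[c]        # weighted count of errors
        den += w * n[c]            # total weight (1 per category)
      }
      if (den == 0) exit 1         # empty input: nothing to report
      printf "%.4f\n", num / den
    }' "$1"
}
```

With four observations of category A (one misclassified) and one of category B (misclassified), the unweighted error rate is 2/5 = 0.4, while this sketch prints 0.6250, the mean of the per-category error rates 0.25 and 1.0. Under this assumption, weighting keeps a large category from dominating the metric.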
Lines starting with var describe features (also called variables). For example:
var 6 Otu123 0.00821
The fields are:
1. The string var.
2. Index of the variable (0, 1, 2, ...).
3. Name of the variable (typically an OTU name).
4. Importance of the variable.
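Because the fields are tab-separated and in a fixed order, a single var line can be looked up with awk. This is a minimal sketch assuming the four-field layout above; var_importance is a hypothetical helper name, not a USEARCH command.

```shell
# Sketch: print the index and importance of one variable (OTU) from the
# var lines of a forest parameter file. Assumes the tab-separated layout
# $1 = "var", $2 = index, $3 = name, $4 = importance.
# var_importance is a hypothetical helper, not a USEARCH command.
var_importance() {
  local file=$1 name=$2
  awk -F'\t' -v name="$name" '
    $1 == "var" && $3 == name { print $2 "\t" $4; found = 1 }
    END { exit (found ? 0 : 1) }   # non-zero exit status if the name is absent
  ' "$file"
}
```

For the example line above, `var_importance forest.txt Otu123` would print 6 and 0.00821 separated by a tab, and the function returns a non-zero exit status if no var line has that name.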
To extract the OTU importance values from a forest parameter file and sort them in order of decreasing importance, you can use:
grep -w "^var" forest.txt | cut -f3,4 | sort -rgk2
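If you only want the most important OTUs, a variant of the same pipeline can select the var lines with awk, sort explicitly on the tab-separated importance field, and keep the top N. This is a sketch: top_vars is a hypothetical helper name, the default of 10 is arbitrary, and sort -g (general numeric sort, which handles values such as 1e-05) is not part of POSIX but is supported by GNU coreutils.

```shell
# Sketch: list the N most important variables as name<TAB>importance.
# Same idea as the grep/cut/sort pipeline above, but the sort key is
# restricted to the tab-separated importance field.
# top_vars is a hypothetical helper name; N defaults to 10 (arbitrary).
top_vars() {
  local file=$1 n=${2:-10}
  local tab
  tab=$(printf '\t')
  awk -F'\t' '$1 == "var" { print $3 "\t" $4 }' "$file" |
    sort -t "$tab" -k2,2gr |
    head -n "$n"
}
```

For example, `top_vars forest.txt 20` prints the 20 OTUs with the highest importance values, most important first.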