Video talks on 16S data analysis posted.

URMAP ultra-fast read mapper posted (paper).

~20% of taxonomy annotations in SILVA and Greengenes are wrong (paper).

Taxonomy prediction is <50% accurate for 16S V4 sequences (paper).

97% OTU threshold is wrong for species; it should be 99% for full-length 16S and 100% for V4 (paper).

forest_kfold command

Performs k-fold cross-validation of random forest classifiers on a feature table with known categories.

In each of K iterations, the data is split into a test set and a training set, a classifier is trained on the training set, and its accuracy is measured on the test set.
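The train-then-test loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not USEARCH's implementation. A trivial majority-class classifier (the hypothetical `train_majority`) stands in for the random forest so the sketch needs only the standard library. The split is the classic k-fold scheme of K disjoint folds, which may differ from how USEARCH draws its random test subsets.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def train_majority(labels: Sequence[str]) -> Callable[[object], str]:
    """Hypothetical stand-in for a random forest: always predicts the most
    common label seen in the training set."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda _features: majority


def kfold_accuracies(features: Sequence[object], labels: Sequence[str],
                     k: int = 6, seed: int = 1) -> List[float]:
    """Train and test a classifier K times; return one accuracy per iteration.

    Assumption: classic k-fold, i.e. the shuffled observations are dealt
    into K disjoint folds and each fold serves once as the test set.
    """
    n = len(labels)
    if n < k:
        raise ValueError("need at least k observations")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # K disjoint folds of ~n/K each
    accuracies = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in idx if i not in test_set]
        model = train_majority([labels[i] for i in train_idx])
        correct = sum(model(features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies
```

For example, `kfold_accuracies(list(range(12)), ["a"] * 12)` runs six iterations with two test observations each; every prediction is correct, so each accuracy is 1.0. Averaging the K accuracies gives a single cross-validated estimate.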

The number of iterations, K, is set by the -tries option (default 6).

By default, the test set is a random subset of 1/K of the observations, and the training set is the remaining (K - 1)/K. The -testpct option overrides this by giving the size of the test set as a percentage of the observations. For example, -tries 5 -testpct 10 performs five iterations, each with a test set of 1/10 of the observations.

The -tabbedout option specifies a tab-separated output file for the k-fold validation results.

Example

usearch -forest_kfold feature_table.txt -tabbedout results.txt
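To run the command from a Python pipeline, a thin wrapper can assemble the same command line. This sketch uses only the options documented on this page and assumes a `usearch` executable on the PATH. The helper names `forest_kfold_cmd` and `run_forest_kfold` are hypothetical.

```python
import subprocess
from typing import List, Optional


def forest_kfold_cmd(table: str, out: str, tries: Optional[int] = None,
                     testpct: Optional[float] = None,
                     exe: str = "usearch") -> List[str]:
    """Build the argument list; unset options fall back to USEARCH defaults."""
    cmd = [exe, "-forest_kfold", table, "-tabbedout", out]
    if tries is not None:
        cmd += ["-tries", str(tries)]
    if testpct is not None:
        cmd += ["-testpct", str(testpct)]
    return cmd


def run_forest_kfold(table: str, out: str, **options) -> None:
    """Run the command, raising CalledProcessError on a nonzero exit status."""
    subprocess.run(forest_kfold_cmd(table, out, **options), check=True)
```

For example, `forest_kfold_cmd("feature_table.txt", "results.txt")` reproduces the command above. Passing `tries=5, testpct=10` appends `-tries 5 -testpct 10`.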