nbc_tax command
See also
sintax command
Taxonomy
Which taxonomy database should I use?
Naive Bayesian Classifier algorithm
makeudb_sintax command
sintax_summary command
Cross-validation by identity
The nbc_tax command is an implementation of the RDP Naive Bayesian Classifier algorithm (Wang et al. 2007). Predictions by nbc_tax are very similar to those of the Java implementation provided by RDP; the differences are consistent with the randomness of bootstrapping, so I believe that the results are equivalent for all practical purposes.
Taxonomy predictions are written to the -tabbedout file. The first three fields are (1) query sequence label, (2) prediction with bootstrap values and (3) strand. If the -sintax_cutoff option is given then predictions are written a second time after applying the confidence threshold, keeping only ranks with high enough confidence. The sintax_summary command can be used to generate a summary report.
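As an illustration of the three-field layout described above, here is a minimal Python sketch for reading a -tabbedout file. The text above does not spell out the exact syntax of the prediction field, so the sketch assumes a comma-separated list such as d:Bacteria(1.00),p:Firmicutes(0.95), with the bootstrap value in parentheses after each name. That syntax, the function names, and the 0.8 cutoff in the usage note are all assumptions made for illustration, not part of usearch. The apply_cutoff helper only approximates what -sintax_cutoff is described as doing: it assumes the prediction is truncated at the first rank whose bootstrap value falls below the threshold.

```python
import re

# ASSUMPTION: each rank looks like "g:Bacillus(0.40)", i.e. a one-letter rank
# code, a colon, the taxon name, and the bootstrap value in parentheses.
RANK_RE = re.compile(r"^([A-Za-z]):(.+)\(([0-9]*\.?[0-9]+)\)$")


def parse_prediction(pred):
    """Split a prediction string into (rank, name, bootstrap) tuples."""
    ranks = []
    for item in pred.split(","):
        m = RANK_RE.match(item.strip())
        if m is None:
            continue  # skip anything that does not match the assumed syntax
        ranks.append((m.group(1), m.group(2), float(m.group(3))))
    return ranks


def parse_line(line):
    """Parse one tabbed line into (label, ranks, strand).

    Only the first three fields are read: query label, prediction, strand.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least three tab-separated fields")
    return fields[0], parse_prediction(fields[1]), fields[2]


def apply_cutoff(ranks, cutoff):
    """Keep ranks from the top down until one falls below the cutoff.

    ASSUMPTION: truncating at the first low-confidence rank mirrors the
    thresholding described for -sintax_cutoff.
    """
    kept = []
    for rank, name, boot in ranks:
        if boot < cutoff:
            break
        kept.append((rank, name, boot))
    return kept
```

For example, calling parse_line on each line of tax.txt and then apply_cutoff(ranks, 0.8) on the result would keep only the leading ranks whose bootstrap value is at least 0.8, provided the file follows the assumed syntax.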
The -rdpout output file is compatible with the output generated by the RDP command-line version. This option is provided for the convenience of scripts already written for that program.
The -strand option must be specified.
Multithreading is supported.
Example
usearch -nbc_tax otus.fa -db ref16s.fa -strand plus -tabbedout tax.txt
Reference
Wang, Q. et al. (2007) Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. AEM 73, 5261-7.