See also
Microbial taxonomy
Sequence databases with taxonomy classifications
Taxonomy annotation errors in large databases
Taxonomy database downloads
Use a small database with authoritative classifications
I recommend using authoritatively classified sequences, e.g. for 16S the
most recent RDP training set or LTP release.
Taxonomy annotations in large databases are unreliable predictions
The taxonomy annotations in the large 16S databases (SILVA, Greengenes, or the
full RDP database) are mostly computational predictions from 16S sequences.
Roughly one in five of these predictions is wrong, probably because the guide
trees have pervasive branching order errors. Therefore, using annotations from
large databases adds a substantial error rate in the reference dataset on top
of the intrinsic error rate of a prediction algorithm such as SINTAX or the
Naive Bayesian Classifier. With these considerations in mind, I believe it is
best to use a database of type strain and isolate sequences rather than
Greengenes, SILVA or RDP.
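
To illustrate how reference errors stack on top of algorithm errors, here is a minimal sketch in Python. It assumes, as a simplification, that the two error sources are independent and that a prediction is right only when both the reference annotation and the algorithm are right. The function name combined_error_rate and the 10% algorithm error rate are hypothetical values chosen for illustration; only the roughly one-in-five reference error rate comes from the text above.

```python
def combined_error_rate(reference_error: float, classifier_error: float) -> float:
    """Estimate the overall error rate of taxonomy predictions.

    Simplifying assumptions (not measured properties of any real database):
    the two error sources are independent, and a prediction counts as
    correct only if both the reference annotation and the classifier
    are correct.
    """
    for rate in (reference_error, classifier_error):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("error rates must be between 0 and 1")
    # Probability that both the annotation and the algorithm are correct.
    both_correct = (1.0 - reference_error) * (1.0 - classifier_error)
    return 1.0 - both_correct


# Roughly one in five reference annotations wrong (from the text above),
# combined with a hypothetical 10% intrinsic algorithm error rate.
large_db = combined_error_rate(0.20, 0.10)

# The same hypothetical algorithm with an idealized error-free reference.
small_db = combined_error_rate(0.0, 0.10)

print(f"large database: {large_db:.2f}, authoritative reference: {small_db:.2f}")
```

Under these assumptions, the reference errors alone nearly triple the overall error rate, which is the reason for preferring a smaller set of authoritatively classified sequences.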