Can traditional numerical taxonomy methods be used for 16S reads?

OTUs built by clustering at a 97% identity threshold do not map cleanly onto species, for three reasons: (i) some species have genes that are >97% similar, giving merged OTUs that contain multiple species; (ii) a single species may have paralogs that are <97% similar, causing the species to be split across two or more OTUs; and (iii) some clusters, possibly even a majority, may be spurious due to artifacts including read errors and chimeras. Traditional methods, including rarefaction curves to assess species richness and alpha and beta diversity estimators, implicitly assume that OTUs are observations of organisms with negligible error, and that the number of observations (reads) correlates well with the total number of individuals present in the community. I believe that these methods must be modified in cases where OTUs do not reliably correspond to species or monophyletic groups, especially if OTUs with lower abundance are more likely to be artifacts. Similar considerations apply to inferences based on the RDP Classifier, which may report a chimera as a novel genus, and to methods that require building a phylogenetic tree, e.g. UniFrac, where the tree topology will be disrupted by chimeras. If a majority of OTUs are experimental artifacts, then traditional species richness estimates are not valid, and measures of between-sample variation will tend to reflect differences in artifact frequencies rather than biological differences.
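To make the richness argument concrete, the following minimal sketch shows how low-abundance artifacts can distort one commonly used estimator. The choice of the bias-corrected Chao1 formula, the function name `chao1`, and all read counts are assumptions made for illustration; none of them come from the analysis described here.

```python
def chao1(abundances):
    """Bias-corrected Chao1 richness estimate from per-OTU read counts.

    Chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are
    the numbers of OTUs observed exactly once and exactly twice.
    """
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)
    f1 = sum(1 for a in counts if a == 1)  # singleton OTUs
    f2 = sum(1 for a in counts if a == 2)  # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical community: seven genuine OTUs with made-up read counts.
genuine = [500, 300, 120, 40, 10, 2, 1]

# Hypothetical artifacts: 20 singleton OTUs from chimeras or read errors.
spurious = [1] * 20

clean_estimate = chao1(genuine)
inflated_estimate = chao1(genuine + spurious)

print(f"Chao1 without artifacts: {clean_estimate:.1f}")
print(f"Chao1 with artifacts:    {inflated_estimate:.1f}")
```

In this toy case the estimate rises from 7 to 132, even though the spurious OTUs account for only 20 of 993 reads. Because the estimator extrapolates from the singleton count, artifacts that concentrate at low abundance have a disproportionate effect on the result.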