Operational Taxonomic Units (OTUs)

 
See also
  SSU metagenomics
  16S OTUs

In traditional numerical taxonomy (Sokal and Sneath, 1963; Sneath and Sokal, 1973), an Operational Taxonomic Unit (OTU) means "the thing(s) being studied". The definition is intentionally vague. The "thing(s)" could be an individual organism, a named taxonomic group such as a species or genus, or a group of organisms with undetermined evolutionary relationships that share a given set of observed characters. It is up to the scientist to specify and justify the definition of OTUs used in a particular study.

Can traditional numerical taxonomy methods be used for 16S reads?
Methods from numerical taxonomy are often applied to next-generation marker gene sequencing studies, where organisms are not directly observed. An OTU is typically defined as a cluster of reads with 97% similarity, motivated by the expectation that such clusters correspond approximately to species. This is reasonable provided that downstream analysis takes into account that the correspondence between OTUs and species may fail because:

(i) some species have genes that are >97% similar, giving merged OTUs containing multiple species,

(ii) a single species may have paralogs that are <97% similar, causing the species to be split across two or more OTUs, and

(iii) some clusters, even a majority, may be spurious due to artifacts including read errors and chimeras.
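Failure modes (i) and (ii) can be illustrated with a toy example. The following Python code is a minimal sketch, not the algorithm used by any particular tool: it assumes equal-length sequences compared position by position without alignment, and the names `identity`, `greedy_cluster`, `species_a`, `species_b` and `paralog_a` are hypothetical. Each sequence joins the first cluster whose centroid it matches at 97% identity or better; otherwise it founds a new cluster.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences.

    Assumption: no alignment is performed; real tools align reads first.
    """
    if len(a) != len(b):
        raise ValueError("toy identity requires equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes a new centroid.
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            # No centroid was close enough: start a new cluster.
            centroids.append(s)
            clusters.append([s])
    return clusters


# Hypothetical 100-base marker gene sequences.
species_a = "ACGT" * 25
# A different species whose gene differs at only 2 of 100 positions (98%).
species_b = species_a[:98] + "AA"
# A paralog from species A that differs at 7 of 100 positions (93%).
paralog_a = species_a[:90] + "T" * 10

clusters = greedy_cluster([species_a, species_b, paralog_a])
for n, members in enumerate(clusters, start=1):
    print(f"OTU {n}: {len(members)} sequence(s)")
```

In this toy data the two species fall into one OTU (case i), while the paralog from species A lands in a separate OTU (case ii), so the number of OTUs happens to equal the number of species even though neither OTU corresponds to a single species.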

Traditional methods, including rarefaction curves to assess species richness and alpha and beta diversity estimators, implicitly assume that OTUs are observations of organisms with negligible error, and that the number of observations (reads) correlates well with the total number of individuals present in the community. I believe that these methods must be modified in cases where OTUs do not reliably correspond to species or monophyletic groups, especially if OTUs with lower abundance are more likely to be artifacts. Similar considerations apply to inferences based on the RDP Classifier, which may report a chimera as a novel genus, or methods that require building a phylogenetic tree, e.g. for UniFrac, where the tree topology will be disrupted by chimeras. If a majority of OTUs are experimental artifacts, then traditional species richness estimates are not valid, and measures of between-sample variation will tend to reflect differences in artifact frequencies rather than biological differences.
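The effect of low-abundance artifacts on richness estimates can be sketched numerically. The example below is an illustration under stated assumptions, not a method described above: it uses the bias-corrected Chao1 estimator as one example of a traditional richness estimator, the function names are hypothetical, and the OTU counts are invented toy data in which every added singleton and doubleton is an artifact.

```python
def observed_richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1_bias_corrected(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 is the number of singleton OTUs and F2 the number of doubleton OTUs.
    """
    s_obs = observed_richness(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy community: read counts for six OTUs assumed to be real species.
true_counts = [500, 300, 120, 40, 10, 2]

# Same community plus assumed artifacts (e.g. chimeras, read errors):
# eight spurious singletons and two spurious doubletons.
noisy_counts = true_counts + [1] * 8 + [2] * 2

print("true  observed:", observed_richness(true_counts))
print("true  Chao1   :", chao1_bias_corrected(true_counts))
print("noisy observed:", observed_richness(noisy_counts))
print("noisy Chao1   :", chao1_bias_corrected(noisy_counts))
```

Because estimators of this kind extrapolate from the counts of the rarest OTUs, artifacts concentrated at low abundance inflate not only the observed richness but also the estimated number of unobserved species.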

References
Sokal, RR and Sneath, PHA (1963), Principles of Numerical Taxonomy, San Francisco: W.H. Freeman.
Sneath, PHA and Sokal, RR (1973), Numerical Taxonomy, San Francisco: W.H. Freeman.