See also
OTU clustering
SSU metagenomics
What is an OTU?
Finding species in SSU reads
Species abundance estimates
Constructing OTUs by database matching
Ideally, given a set of 16S reads, we would like to identify all known and
novel species. In practice, this is very challenging owing to
several complications. In particular, many reads typically
do not match any sequence in a reference database closely enough to allow a species assignment.
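As a minimal sketch of database matching, the Python below compares a read with every reference and assigns the species of the best hit only if that hit reaches the conventional 97% identity. It assumes, for illustration only, that reads and references are already aligned to a common length with `-` as the gap character. Real tools use proper alignment and fast database search. The names `identity`, `assign_species` and `SPECIES_THRESHOLD` are hypothetical.

```python
from typing import Dict, Optional

SPECIES_THRESHOLD = 0.97  # conventional, approximate 16S species cutoff


def identity(a: str, b: str) -> float:
    """Fraction of identical columns in two pre-aligned sequences.

    Columns where both sequences have a gap ("-") are ignored.
    Assumes the sequences are already aligned to equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    return sum(1 for x, y in cols if x == y) / len(cols)


def assign_species(read: str, references: Dict[str, str]) -> Optional[str]:
    """Return the species of the best-matching reference, or None when
    even the best match falls below the species threshold."""
    best_name: Optional[str] = None
    best_id = 0.0
    for name, ref in references.items():
        pid = identity(read, ref)
        if pid > best_id:
            best_name, best_id = name, pid
    return best_name if best_id >= SPECIES_THRESHOLD else None
```

For example, with `refs = {"Species_A": "ACGT" * 25}`, a read that differs from Species_A at one of 100 positions (99% identity) is assigned to Species_A, while a read that matches at only 85 of 100 positions returns `None`, i.e. no species assignment.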
Identity threshold for species assignment
Traditionally, a 97% match has been considered sufficient for species
assignment in 16S sequences, though it should be noted that this is only
approximate: sometimes two different species have identical 16S sequences, and
conversely a single species may have two copies of the 16S gene that are less
than 97% identical to each other. With shorter reads, the 97% cutoff becomes an
even poorer approximation.
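One simple way to see that the cutoff becomes coarser on short reads is to count how many mismatches a 97% threshold tolerates at different alignment lengths. The sketch below uses integer arithmetic to avoid floating-point rounding. The lengths are illustrative (a full-length 16S gene is roughly 1,500 bp), and `max_mismatches` is a hypothetical helper. It captures only the granularity of the cutoff, not how variable different regions of the gene are.

```python
def max_mismatches(length: int, pct: int = 97) -> int:
    """Largest number of differing columns that still leaves identity
    >= pct percent over an alignment of `length` columns.

    identity = (length - d) / length >= pct / 100
    =>  d <= length * (100 - pct) / 100
    """
    return length * (100 - pct) // 100


for length in (1500, 250, 100):
    step = 100 / length  # identity lost per mismatch, in percentage points
    print(f"{length:5d} bp: up to {max_mismatches(length)} mismatches, "
          f"{step:.2f} points per mismatch")
```

At 1,500 bp the threshold allows 45 mismatches, at 250 bp only 7, and at 100 bp only 3. On a 100 bp read a single extra difference moves identity by a full percentage point, so one base can decide whether a read passes or fails the 97% cutoff.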
Constructing OTUs by de novo clustering
Usually, the best we can do with unmatched reads is to cluster them at a 97%
identity threshold. For consistency, database matching is often done after
clustering, so that some OTUs are assigned to species and others are flagged as
novel or unknown. Some of these clusters may contain reads from PCR artifacts such
as undetected chimeras, and others may arise from divergent 16S copies created by
gene duplications in known or novel species.
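The sketch below illustrates the cluster-then-match order described above. It uses a simple greedy, abundance-ordered centroid scheme: each unique sequence joins the first centroid it matches at 97% or better, otherwise it starts a new OTU. This is a hypothetical illustration, not necessarily the algorithm any particular tool uses. As before, sequences are assumed to be pre-aligned to equal length, and the names `greedy_cluster`, `label_otus` and `OTU_THRESHOLD` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

OTU_THRESHOLD = 0.97  # conventional, approximate 97% identity


def identity(a: str, b: str) -> float:
    """Fraction of identical columns in two pre-aligned sequences,
    ignoring columns where both have a gap ("-")."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    return sum(1 for x, y in cols if x == y) / len(cols)


def greedy_cluster(reads: List[str],
                   threshold: float = OTU_THRESHOLD) -> List[Tuple[str, List[str]]]:
    """Greedy centroid clustering.

    Unique sequences are visited in order of decreasing abundance. Each one
    joins the first existing centroid it matches at >= threshold; otherwise
    it becomes the centroid of a new cluster (OTU).
    """
    counts: Dict[str, int] = {}
    for r in reads:
        counts[r] = counts.get(r, 0) + 1
    # sorted() is stable, so ties keep first-seen order
    ordered = sorted(counts, key=lambda s: -counts[s])

    clusters: List[Tuple[str, List[str]]] = []
    for seq in ordered:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.extend([seq] * counts[seq])
                break
        else:  # no centroid matched: start a new OTU
            clusters.append((seq, [seq] * counts[seq]))
    return clusters


def label_otus(clusters: List[Tuple[str, List[str]]],
               references: Dict[str, str]) -> List[Tuple[str, int]]:
    """After clustering, match each OTU centroid against the references.

    Returns (label, read count) pairs; centroids whose best match is
    below the threshold are flagged as "novel/unknown".
    """
    labels: List[Tuple[str, int]] = []
    for centroid, members in clusters:
        best_name: Optional[str] = None
        best_id = 0.0
        for name, ref in references.items():
            pid = identity(centroid, ref)
            if pid > best_id:
                best_name, best_id = name, pid
        if best_name is not None and best_id >= OTU_THRESHOLD:
            labels.append((best_name, len(members)))
        else:
            labels.append(("novel/unknown", len(members)))
    return labels
```

Because this sketch does no chimera filtering, a chimeric read that falls below 97% identity to every existing centroid would simply become its own OTU, which is one way spurious OTUs can arise.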
Do not expect a one-to-one correspondence between OTUs and species
Due to the complications discussed above, we cannot expect a 1:1
correspondence between OTUs and species. At best, we can aim for a 1:1
correspondence between OTUs and unique copies of the 16S gene, though this ideal
is undermined by experimental error that is hard or impossible to eliminate,
including sequencing errors and PCR artifacts.