See also
fasta_diversity command
Multiple-sample diversity metrics (beta diversity)
Single-sample (alpha) diversity
A single-sample diversity metric attempts to capture the intuitive notion of
"diversity" by calculating a single number from one set of observations of
individuals. The individuals must be assigned to a group, e.g., species or OTU.
Here, I will call the groups "OTUs" and individuals "reads". The number of reads
assigned to a given OTU is the abundance of the OTU.
Diversity index
A diversity index is a metric that characterizes the OTUs that were observed,
without extrapolating to account for rare OTUs that were missed due to
sampling. The simplest example is richness, which is the number of OTUs that
were observed. More sophisticated diversity metrics consider abundances so that
high-abundance OTUs are weighted differently from low-abundance OTUs.
Diversity estimator
A diversity estimator is a metric that attempts to extrapolate to account
for rare OTUs that were missed due to sampling. Estimators make mathematical
assumptions about the shape of the tail of the abundance distribution.
Richness index
Richness (Wikipedia)
is the simplest diversity index; it is just the number of OTUs.
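A minimal Python sketch of the definitions above, assuming each read has already been assigned an OTU label. The labels and the `richness` helper are hypothetical and are not part of the fasta_diversity command or any other tool. The abundance of an OTU is the number of reads assigned to it, and richness counts the OTUs with at least one read.

```python
from collections import Counter

# Hypothetical read-to-OTU assignments for one sample (one label per read).
read_otus = ["Otu1", "Otu1", "Otu1", "Otu2", "Otu2", "Otu3"]

# Abundance of an OTU = number of reads assigned to it.
abundances = Counter(read_otus)  # {'Otu1': 3, 'Otu2': 2, 'Otu3': 1}

def richness(counts):
    """Richness: the number of OTUs observed with at least one read."""
    return sum(1 for n in counts if n > 0)

print(richness(abundances.values()))  # 3
```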
Simpson index
The Simpson index (Wikipedia)
is the probability that two individuals taken at random from the sample belong
to the same OTU.
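With OTU frequencies p_i = n_i / N, this probability is the sum of p_i^2 if the two reads are drawn with replacement, or the sum of n_i (n_i - 1) / (N (N - 1)) if they are drawn without replacement. Which variant a particular tool reports is an assumption worth checking; the hypothetical `simpson` helper below computes both.

```python
def simpson(counts, replacement=True):
    """Probability that two reads drawn at random belong to the same OTU.

    replacement=True:  sum_i p_i^2, with p_i = n_i / N.
    replacement=False: sum_i n_i (n_i - 1) / (N (N - 1)), two distinct reads.
    """
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    if replacement:
        return sum((n / total) ** 2 for n in counts)
    if total < 2:
        raise ValueError("need at least two reads to draw without replacement")
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))

counts = [5, 3, 2]  # hypothetical abundances, N = 10
print(simpson(counts))                     # ~0.38
print(simpson(counts, replacement=False))  # ~0.311 (28 / 90)
```

Because this is the probability of picking the same OTU twice, larger values mean lower diversity. For this reason it is often reported as 1 - D (the Gini-Simpson index) or 1/D (the inverse Simpson index).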
Shannon index
The Shannon index (Wikipedia)
is also known as Shannon entropy, the Shannon-Wiener index and the
Shannon-Weaver index. It is a fundamental quantity in information theory that
can be interpreted as the amount of uncertainty inherent in the abundance
distribution. The entropy is high when there are many OTUs with similar
abundances, because it is hard to predict which OTU you would find by randomly
picking a read; for a fixed number of OTUs, it is maximized when all abundances
are equal. On the other hand, if all the reads belong to one OTU, the entropy is
minimized (it is zero) because there is no uncertainty about which OTU you will
pick.
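A short sketch of H = -Σ p_i log(p_i), with p_i = n_i / N. The log base (e, 2 or 10) only changes the units, so here it is a parameter; the `shannon` helper and the example abundances are hypothetical. The usage lines illustrate the two extremes described above.

```python
import math

def shannon(counts, base=math.e):
    """Shannon entropy H = -sum_i p_i log(p_i), where p_i = n_i / N.

    Written as sum_i p_i log(1 / p_i), which is the same quantity.
    OTUs with zero reads contribute nothing (0 log 0 is taken as 0).
    """
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    return sum((n / total) * math.log(total / n, base) for n in counts)

print(shannon([25, 25, 25, 25], base=2))  # ~2.0 bits: four equally abundant OTUs
print(shannon([97, 1, 1, 1], base=2))     # much lower: one OTU dominates
print(shannon([100], base=2))             # 0.0: one OTU, no uncertainty
```

With k OTUs of equal abundance, H = log(k), which is the largest value possible for k OTUs; four equal OTUs therefore give 2 bits in base 2.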
Jost index (effective number of species)
The Jost index calculates an effective number
of OTUs. The index has a parameter (q) which determines how abundance is
weighted.
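The text above does not give a formula, so this sketch assumes the standard effective-number form (Hill numbers) that Jost used: D_q = (Σ p_i^q)^(1/(1 - q)), whose limit as q → 1 is exp(H), with H the Shannon entropy in natural log. Under this assumption q = 0 gives richness, q = 1 gives exp(Shannon) and q = 2 gives the inverse of the Simpson index (with-replacement form); larger q gives more weight to abundant OTUs. The `hill_number` helper is hypothetical.

```python
import math

def hill_number(counts, q):
    """Effective number of OTUs of order q (Hill number).

    D_q = (sum_i p_i^q) ** (1 / (1 - q))   for q != 1
    D_1 = exp(H), H = Shannon entropy (natural log), the limit as q -> 1
    """
    counts = [n for n in counts if n > 0]
    total = sum(counts)
    ps = [n / total for n in counts]
    if q == 1:
        return math.exp(sum(p * math.log(1 / p) for p in ps))
    return sum(p ** q for p in ps) ** (1 / (1 - q))

counts = [5, 3, 2]  # hypothetical abundances
for q in (0, 1, 2):
    print(q, hill_number(counts, q))  # 3.0, then ~2.80, then ~2.63 (1 / 0.38)
```

If all OTUs have equal abundances, every order q gives the same value, the number of OTUs, which is why the result can be read as an "effective" number of OTUs.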
Chao1 estimator
The Chao1 estimator is popular, but in my opinion it should not be used with
OTUs obtained by clustering NGS reads. It is calculated as
$\mathrm{Chao1} = N + S^2/(2D)$, where N = number of OTUs, S = number of
singletons (OTUs with abundance 1) and D = number of doublets (OTUs with
abundance 2). The problem with this metric is
that spurious OTUs due to sequencing and PCR errors are strongly biased towards
low abundance, so we expect S and D to be overestimated, but we
don't know by how much. See discussion of singletons.
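A sketch of the Chao1 formula above, using a hypothetical `chao1` helper. The classic formula divides by zero when D = 0; in that case the sketch falls back to the bias-corrected form N + S(S - 1)/(2(D + 1)), a commonly used variant that is not part of the text above. The made-up abundances show the problem just described: adding ten spurious singletons increases the observed OTU count by 10 but the estimate by 50.

```python
from collections import Counter

def chao1(counts):
    """Chao1 = N + S^2 / (2 D).

    N = observed OTUs, S = singletons (abundance 1), D = doublets (abundance 2).
    If D = 0 the classic formula divides by zero, so this sketch falls back to
    the bias-corrected form N + S (S - 1) / (2 (D + 1)).
    """
    counts = [n for n in counts if n > 0]
    freq = Counter(counts)  # abundance -> number of OTUs with that abundance
    n_otus, s, d = len(counts), freq[1], freq[2]
    if d == 0:
        return n_otus + s * (s - 1) / (2 * (d + 1))
    return n_otus + s * s / (2 * d)

# Hypothetical abundances for one sample.
sample = [120, 80, 40, 10, 2, 2, 1, 1, 1]
print(chao1(sample))             # 11.25  (N=9, S=3, D=2)
# Same sample plus ten spurious singleton OTUs (e.g. from PCR/sequencing errors).
print(chao1(sample + [1] * 10))  # 61.25  (N=19, S=13, D=2)
```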