See also
alpha_div command
beta_div command
Alpha diversity metrics are calculated using the alpha_div command. It is more accurate to say that alpha_div calculates single-sample metrics because several of the metrics are not diversity metrics.
Some metrics consider only the presence or absence of an OTU, e.g. richness, but most are based on OTU frequencies. Interpreting frequencies is difficult because amplification bias causes the number of reads to correlate poorly with the number of cells; for example, the OTU with the highest frequency in the reads is often not the most abundant species. Because of cross-talk, even the presence or absence of a given OTU in a given sample cannot be reliably established when the OTU has low abundance. These issues make it difficult to interpret diversity metrics from traditional numerical ecology when they are applied to next-generation marker gene sequencing.
Chao-1 attempts to estimate the total number of OTUs in the community, including those that were not observed. In my opinion, estimators have little value in amplicon sequencing experiments because low-abundance OTUs are often spurious, which makes reliable extrapolation impossible.
Confusingly, some metrics use different units and so cannot be compared with each other. For example, the popular Shannon index is a measure of entropy whose unit is bits of information if the logarithms are base 2, but people sometimes use natural logarithms (base e) or base 10. None of these variants of the Shannon index has an obvious connection to the number of OTUs, and people often do not say which variant they used, so the numerical values are difficult to interpret. Metrics using unfamiliar units can be interpreted by converting to an effective number of OTUs. The effective number of OTUs for the Shannon index is the Jost index of order 1.
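To make the unit problem concrete, here is a minimal sketch of the Shannon index in the three logarithm bases and its conversion to an effective number of OTUs. It assumes the standard entropy definition, -sum(f * log f) over OTU frequencies f. The function names and the example counts are hypothetical and are not part of alpha_div.

```python
import math

def shannon(counts, base=2.0):
    """Shannon entropy of per-OTU read counts, using logs to the given base."""
    total = sum(counts)
    # OTUs with zero reads contribute nothing to the entropy.
    freqs = [c / total for c in counts if c > 0]
    return -sum(f * math.log(f, base) for f in freqs)

def effective_otus(entropy, base=2.0):
    """Convert an entropy back to an effective number of OTUs.

    The base must match the one used to compute the entropy.
    """
    return base ** entropy

# Hypothetical sample: four equally abundant OTUs.
counts = [25, 25, 25, 25]
h_bits = shannon(counts, 2)        # entropy in bits
h_nats = shannon(counts, math.e)   # entropy in nats
h_dits = shannon(counts, 10)       # entropy in dits

# The three entropies differ numerically, but each converts to the same
# effective number of OTUs when raised with its own base.
n_from_bits = effective_otus(h_bits, 2)
n_from_nats = effective_otus(h_nats, math.e)
n_from_dits = effective_otus(h_dits, 10)
```

For a perfectly even sample the effective number equals the number of OTUs, which is what makes it easier to interpret than the raw entropy.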
Diversity metrics
Name | Units | Description |
richness | OTUs | Number of OTUs with at least one read for the sample. |
chao1 | OTUs | Chao-1 estimator, calculated as N + S^2 / (2 D^2) where N is the number of OTUs, S is the number of singleton OTUs and D is the number of doublet OTUs, i.e. OTUs with abundance 2. |
shannon_2 | bits | Shannon index (logs to base 2). |
shannon_e | nats | Shannon index (logs to base e). |
shannon_10 | dits | Shannon index (logs to base 10). |
jost | OTUs | Jost index of order q where q is specified by the -jostq command-line option, default 1.5. |
jost1 | OTUs | Jost index of order 1, the effective number of species given by the Shannon index. |
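The Jost index rows can be illustrated with a short sketch. It assumes the Jost index is the usual "true diversity" (Hill number) of order q, (sum of f^q)^(1/(1-q)), with the q = 1 case taken as the limit exp(H) where H is the Shannon index in nats. This page does not state the formula, so treat the function below as an illustration, not as the alpha_div implementation. The name jost_index is hypothetical; the default of 1.5 mirrors the -jostq default given in the table.

```python
import math

def jost_index(counts, q=1.5):
    """Effective number of OTUs of order q (assumed Hill-number definition)."""
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1; use its limit,
        # the exponential of the Shannon entropy in nats.
        return math.exp(-sum(f * math.log(f) for f in freqs))
    return sum(f ** q for f in freqs) ** (1.0 / (1.0 - q))

# Hypothetical sample with an uneven abundance distribution.
counts = [50, 30, 20]
order0 = jost_index(counts, 0)   # order 0 reduces to richness
order1 = jost_index(counts, 1)   # effective number from the Shannon index
order2 = jost_index(counts, 2)   # reciprocal of the sum of squared frequencies
```

Higher orders give more weight to abundant OTUs, so for an uneven sample the value falls as q increases, while for a perfectly even sample every order returns the number of OTUs.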
Name | Units | Description |
simpson | Probability | Simpson index, calculated as the sum over OTUs of f^2 where f is the frequency of the OTU. It is the probability that two randomly selected reads will belong to the same OTU. A value close to 1 indicates that a single large OTU dominates the sample, small values indicate that the reads are distributed over many OTUs. |
dominance | Probability | Probability that two randomly selected reads will belong to different OTUs. Calculated as 1 - simpson. |
equitability | ? | Entropy (Shannon index) divided by the logarithm of the number of OTUs. Value of 1 indicates perfectly even (equal abundances), small values indicate a highly skewed abundance distribution. |
robbins | Frequency | Robbins index, calculated as S / (N + 1) where S is the number of singleton OTUs and N is the total number of OTUs. |
berger_parker | Frequency | Berger-Parker index. Frequency of the most abundant OTU. A value close to 1 indicates that a single large OTU dominates the sample, small values indicate that the reads are distributed over many OTUs. |
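The formulas in the table above can be sketched directly from a list of per-OTU read counts. This is a minimal illustration, not the alpha_div code. The function name is hypothetical, singletons are taken to be OTUs with exactly one read, and equitability is returned as None for a single-OTU sample (where the logarithm of the number of OTUs is zero), which is an assumption of this sketch.

```python
import math

def single_sample_metrics(counts):
    """Compute the table's metrics from per-OTU read counts for one sample."""
    counts = [c for c in counts if c > 0]   # keep only OTUs that were observed
    total = sum(counts)
    n_otus = len(counts)
    freqs = [c / total for c in counts]

    simpson = sum(f * f for f in freqs)               # sum over OTUs of f^2
    entropy = -sum(f * math.log(f) for f in freqs)    # Shannon index in nats
    singletons = sum(1 for c in counts if c == 1)     # OTUs with exactly one read

    return {
        "simpson": simpson,
        "dominance": 1.0 - simpson,
        # The log base cancels as long as entropy and log(N) use the same base.
        "equitability": entropy / math.log(n_otus) if n_otus > 1 else None,
        "robbins": singletons / (n_otus + 1),
        "berger_parker": max(freqs),
    }

# Hypothetical sample: one large OTU, one medium OTU and two singletons.
skewed = single_sample_metrics([5, 3, 1, 1])
even = single_sample_metrics([10, 10, 10, 10])
```

Comparing the skewed and even samples shows the behaviour described in the table: simpson and berger_parker rise when one OTU dominates, while equitability reaches 1 only when all abundances are equal.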
Name | Units | Description |
reads | Reads | Total number of reads for the sample. |