See also: Diversity analysis, Beta diversity, beta_div command
A beta diversity metric compares the OTU abundances (counts or frequencies) in two samples by calculating a single number designed to indicate how similar or different the samples are. Beta diversity metrics are calculated using the beta_div command.
Metric names that end with _binary are calculated based on presence or absence alone. The numerical value of the abundance is not considered; the calculation is the same except that abundance is considered to be one if the OTU is present, zero otherwise. If the name does not end with _binary then the abundance is the count (number of reads). Note that because of cross-talk, presence or absence cannot be reliably established for low-abundance OTUs, so binary metrics are generally not recommended.
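The short Python sketch below shows how the _binary transformation can change a result. It uses two hypothetical samples that share their abundant OTUs and differ only in OTUs with one or two reads, which is exactly the kind of signal cross-talk can produce. The abundance-weighted Jaccard formula used here (1 - sum of minima / sum of maxima) is an assumption made for illustration, not necessarily the exact formula usearch implements.

```python
def to_binary(counts):
    """Presence/absence: 1 if the OTU has at least one read, else 0."""
    return [1 if c > 0 else 0 for c in counts]


def jaccard(a, b):
    """Assumed abundance-weighted Jaccard dissimilarity: 1 - sum(min) / sum(max).
    On 0/1 vectors this reduces to the usual 1 - |A and B| / |A or B|."""
    den = sum(max(x, y) for x, y in zip(a, b))
    return 1.0 - sum(min(x, y) for x, y in zip(a, b)) / den if den else 0.0


# Hypothetical OTU counts, same OTU order in both samples.
sample1 = [500, 30, 1, 0]
sample2 = [480, 25, 0, 2]

print(jaccard(sample1, sample2))                        # ≈ 0.0525, counts
print(jaccard(to_binary(sample1), to_binary(sample2)))  # 0.5, presence/absence
```

With counts, the two samples look almost identical; after the binary transformation, the two low-abundance OTUs alone push the dissimilarity to 0.5.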
For consistency and simplicity, all the supported metrics are dissimilarity measures, meaning that they are zero when the samples are identical and take larger values the more the samples differ. This type of measure is sometimes called a distance metric, but not all of these are distances in the strict mathematical sense; Bray-Curtis, for example, does not satisfy the triangle inequality.
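To see why Bray-Curtis is not a distance in the strict sense, here is a small counterexample in Python. It uses the standard textbook definition of Bray-Curtis dissimilarity (sum of absolute differences divided by the sum of all counts in both samples) on three hypothetical two-OTU samples; it is a sketch of the definition, not the usearch code.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    den = sum(a) + sum(b)
    if den == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / den


# Hypothetical samples with two OTUs each.
x = [1, 0]
y = [1, 1]
z = [0, 1]

d_xy = bray_curtis(x, y)  # 1/3
d_yz = bray_curtis(y, z)  # 1/3
d_xz = bray_curtis(x, z)  # 1.0

# A true distance would need d_xz <= d_xy + d_yz, i.e. 1.0 <= 2/3.
print(d_xz <= d_xy + d_yz)  # False
```

The detour through y looks "shorter" than going directly from x to z, which is what the triangle inequality forbids.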
In usearch, beta diversities are always dissimilarity measures, not similarity measures, so increasing values indicate lower similarity and increasing distance. For a dissimilarity D that ranges between zero and one, there is always an equivalent similarity measure S defined by S = 1 - D; for example, (Jaccard similarity) = 1 - (Jaccard distance). You can easily convert between dissimilarity and similarity measures in a spreadsheet program such as Excel.
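The same S = 1 - D conversion is a one-liner in Python. The sample names and values below are hypothetical, standing in for pairs read from a beta_div output table; the range check reflects the point that the conversion only makes sense for metrics bounded by 0 and 1.

```python
def to_similarity(d):
    """Convert a dissimilarity in [0, 1] to a similarity via S = 1 - D."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("S = 1 - D only applies to metrics bounded by 0 and 1")
    return 1.0 - d


# Hypothetical pairwise dissimilarities between three samples.
distances = {("S1", "S2"): 0.25, ("S1", "S3"): 0.8, ("S2", "S3"): 0.6}
similarities = {pair: to_similarity(d) for pair, d in distances.items()}

print(similarities[("S1", "S2")])  # 0.75
```

Applying this to an unbounded metric such as Euclidean distance raises an error instead of silently producing negative "similarities".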
Metrics with a range of 0 to 1 can be used for sample clustering, i.e. to generate a tree in which the leaves are samples and more similar samples are closer together. The Euclidean and Manhattan distance metrics can take arbitrarily large values, so they are not appropriate for clustering.
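A quick way to see why the Euclidean and Manhattan metrics are unbounded is to scale up the read counts of two samples that share no OTUs. In this Python sketch (standard textbook formulas, hypothetical counts), Bray-Curtis stays at its maximum of 1 while the other two grow in proportion to sequencing depth.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def bray_curtis(a, b):
    den = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / den if den else 0.0


# Two hypothetical samples with no OTUs in common.
a, b = [10, 0], [0, 10]
for scale in (1, 10, 100):
    sa = [scale * v for v in a]
    sb = [scale * v for v in b]
    # Euclidean and Manhattan grow with scale; Bray-Curtis stays at 1.0.
    print(scale, euclidean(sa, sb), manhattan(sa, sb), bray_curtis(sa, sb))
```

Because the unbounded values depend on read depth as well as composition, their magnitudes are not comparable on a fixed 0 to 1 scale, which is what tree building from a distance matrix relies on here.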
Metric | Max value | Suitable for clustering | Description |
bray_curtis | 1 | Y | Bray-Curtis |
bray_curtis_binary | 1 | Y | Bray-Curtis (presence/absence) |
euclidean | (no maximum) | N | Euclidean distance |
jaccard | 1 | Y | Jaccard distance |
jaccard_binary | 1 | Y | Jaccard distance (presence/absence) |
manhatten | (no maximum) | N | Manhattan distance |
unifrac | 1 | Y | UniFrac |
unifrac_binary | 1 | Y | UniFrac (presence/absence) |
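As a rough reference for the table, the Python sketch below maps the non-UniFrac metric names to textbook formulas, with each _binary variant obtained by converting counts to presence/absence first. It is a hypothetical illustration, not the usearch implementation: the abundance-weighted Jaccard formula is an assumption, and UniFrac is omitted because it also needs a phylogenetic tree of the OTU sequences.

```python
import math
from typing import Callable, Dict, Sequence

Counts = Sequence[float]


def bray_curtis(a: Counts, b: Counts) -> float:
    # sum |a_i - b_i| / sum (a_i + b_i); 0.0 for two empty samples
    den = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / den if den else 0.0


def jaccard(a: Counts, b: Counts) -> float:
    # Assumed abundance-weighted form: 1 - sum(min) / sum(max)
    den = sum(max(x, y) for x, y in zip(a, b))
    return 1.0 - sum(min(x, y) for x, y in zip(a, b)) / den if den else 0.0


def euclidean(a: Counts, b: Counts) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Counts, b: Counts) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def binary(metric: Callable[[Counts, Counts], float]) -> Callable[[Counts, Counts], float]:
    """Wrap a metric so it sees presence (1) or absence (0) instead of counts."""
    def wrapped(a: Counts, b: Counts) -> float:
        return metric([1 if x > 0 else 0 for x in a], [1 if y > 0 else 0 for y in b])
    return wrapped


# Keys mirror the metric names in the table above (including "manhatten").
METRICS: Dict[str, Callable[[Counts, Counts], float]] = {
    "bray_curtis": bray_curtis,
    "bray_curtis_binary": binary(bray_curtis),
    "euclidean": euclidean,
    "jaccard": jaccard,
    "jaccard_binary": binary(jaccard),
    "manhatten": manhattan,
}

# Hypothetical counts for two samples over three OTUs.
s1, s2 = [5, 0, 3], [2, 4, 0]
for name, fn in METRICS.items():
    print(f"{name}\t{fn(s1, s2):.4f}")
```

Every metric returns 0 for identical samples; only euclidean and manhatten can exceed 1.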