USEARCH manual

OTU table

An OTU table describes a set of observations which are assigned to samples and OTUs. Typically, an "observation" is a read for a given sample which is assigned to an OTU. A value in the table is a count (integer) or frequency (floating point value between 0.0 and 1.0).

Row-column representation
An OTU table is typically represented as a tabbed text file in which OTUs are rows and samples are columns. A notable exception is the mothur "shared" file format, which does the opposite: rows are samples and columns are OTUs.
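The two orientations can be converted into each other by transposing the table. The following is a minimal sketch, assuming a tabbed text file whose first line is a header of sample names and whose remaining lines each start with an OTU name. The function names, the "#OTU ID" header label and the example data are illustrative assumptions, not part of any particular tool.

```python
import csv
import io


def read_otu_table(text):
    """Parse tabbed text with OTUs as rows and samples as columns.

    Returns (sample_names, counts) where counts maps OTU name -> list of
    integer counts, one per sample, in header order.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    sample_names = rows[0][1:]  # first header field labels the OTU column
    counts = {row[0]: [int(v) for v in row[1:]] for row in rows[1:] if row}
    return sample_names, counts


def transpose(sample_names, counts):
    """Return the samples-as-rows view: sample name -> {OTU name: count}."""
    return {
        sample: {otu: values[i] for otu, values in counts.items()}
        for i, sample in enumerate(sample_names)
    }


# Hypothetical example data (two OTUs, two samples).
EXAMPLE = "#OTU ID\tSampleA\tSampleB\nOtu1\t10\t0\nOtu2\t3\t7\n"

samples, counts = read_otu_table(EXAMPLE)
by_sample = transpose(samples, counts)
print(by_sample["SampleB"])
```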

Sparse representation
If the number of samples is very large, many OTUs will have no observations for a given sample. A table with many zero entries is called sparse. To save disk space and processing time, a sparse matrix can be represented by giving a list of the non-zero entries only, e.g. a tabbed text file in which each line has three fields: OTU_name, Sample_name and Count.
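As an illustration of the three-field format, the sketch below converts a small in-memory table into a list of non-zero entries. The data layout (a dictionary from OTU name to per-sample counts) and the function name are assumptions made for this example.

```python
def to_sparse(sample_names, counts):
    """Return (OTU_name, Sample_name, Count) triples for non-zero entries only."""
    triples = []
    for otu, values in counts.items():
        for sample, n in zip(sample_names, values):
            if n != 0:  # zero entries are simply omitted
                triples.append((otu, sample, n))
    return triples


# Hypothetical example data.
sample_names = ["SampleA", "SampleB"]
counts = {"Otu1": [10, 0], "Otu2": [3, 7]}

sparse_triples = to_sparse(sample_names, counts)
for otu, sample, n in sparse_triples:
    print(f"{otu}\t{sample}\t{n}")
```

Here the dense table has four cells but only three lines are written, because the Otu1 count for SampleB is zero.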

OTU table values
The values in a table can be defined in different ways.

Raw count
A raw count is the number of reads for one sample that were assigned to a given OTU. "Raw" implies the full set of original reads before any corrections or transformations such as normalization or rarefaction.

Normalized count
Typically, different samples have different numbers of reads. The raw count for a sample may therefore be higher or lower simply because that sample had more or fewer reads, not because the species in that OTU are more or less abundant. Normalization attempts to correct for this. For example, if sample A has twice as many reads as sample B, the raw counts for A might be divided by two. Simple methods for normalizing are (1) taking random subsets of the same number of reads for each sample, or (2) calculating frequencies. More sophisticated methods have been described but in my opinion are not worth the trouble.

Rarefied count
See rarefaction. An OTU table can be rarefied by taking a random subsample. For example, if there are 10,000 reads in the table, a rarefied table could be constructed by choosing a random subset of 1,000 reads. Rarefaction can also be used for normalizing, by taking a random subset of the same size for every sample.
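A random subsample of one sample's reads might be drawn as in the sketch below. It assumes sampling without replacement and represents a sample as a dictionary from OTU name to raw count. The function name, the depth parameter and the example counts are hypothetical, and this is not necessarily how any particular tool implements rarefaction.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=None):
    """Randomly keep `depth` reads from one sample, without replacement.

    otu_counts maps OTU name -> raw count for a single sample.
    """
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError(f"cannot draw {depth} reads from a sample with {total}")
    rng = random.Random(seed)
    # Expand the counts into one entry per read, labelled by its OTU.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    kept = Counter(rng.sample(reads, depth))
    # Keep every OTU in the result, including those that dropped to zero.
    return {otu: kept.get(otu, 0) for otu in otu_counts}


# Hypothetical sample with 1,000 reads, rarefied to 100 reads.
sample_counts = {"Otu1": 700, "Otu2": 250, "Otu3": 50}
rarefied = rarefy(sample_counts, depth=100, seed=1)
print(rarefied, sum(rarefied.values()))
```

Calling the function with the same depth for every sample gives each sample the same total count, which is the normalization use described above.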

Frequency
The frequency (f) of an integer count (n) is calculated by dividing by the total count for a sample (N), i.e. f = n / N.
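The formula f = n / N can be applied to one sample as in the following sketch. The function name and the example counts are illustrative. Returning 0.0 for a sample with no reads is an assumption made here to avoid division by zero.

```python
def frequencies(otu_counts):
    """Convert one sample's counts (OTU name -> n) to frequencies f = n / N."""
    total = sum(otu_counts.values())  # N, the total count for the sample
    if total == 0:
        # Assumption: an empty sample gets frequency 0.0 for every OTU.
        return {otu: 0.0 for otu in otu_counts}
    return {otu: n / total for otu, n in otu_counts.items()}


# Hypothetical sample with N = 40 reads.
sample_a = {"Otu1": 10, "Otu2": 30, "Otu3": 0}
freqs = frequencies(sample_a)
print(freqs)  # {'Otu1': 0.25, 'Otu2': 0.75, 'Otu3': 0.0}
```

The frequencies for a non-empty sample sum to 1.0, so samples with different numbers of reads become directly comparable.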