See also
Making an OTU table
An OTU table describes a set of observations which are assigned to samples and OTUs. Typically, an "observation" is a read for a given sample which is assigned to an OTU. A value in the table is a count (integer) or frequency (floating point value between 0.0 and 1.0).
Row-column representation
An OTU table is typically
represented as a tabbed text file in which OTUs are rows and samples
are columns. A notable exception is the
mothur "shared" file format, which does the opposite: rows are samples
and columns are OTUs.
Sparse representation
If the number of samples is very
large, many OTUs will have no observations for a given sample. A table with many zero entries is called sparse. To save disk space and processing time, a sparse matrix can
be represented by giving a list of the non-zero entries only, e.g. a tabbed
text file in which each line has three fields: OTU_name, Sample_name and Count.
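As a rough illustration of the three-field idea, here is a minimal Python sketch that converts a small dense table to a list of non-zero entries and back. The OTU names, sample names, counts and function names are made up for the example and are not part of any tool's file format.

```python
# Hypothetical dense table: OTU name -> {sample name -> count}.
dense = {
    "Otu1": {"SampleA": 12, "SampleB": 0, "SampleC": 3},
    "Otu2": {"SampleA": 0, "SampleB": 0, "SampleC": 7},
}

def to_sparse(table):
    """Return (otu, sample, count) triples for the non-zero entries only."""
    return [
        (otu, sample, count)
        for otu, row in table.items()
        for sample, count in row.items()
        if count != 0
    ]

def to_dense(triples, otus, samples):
    """Rebuild the full table; entries missing from the triples are zero."""
    table = {otu: {sample: 0 for sample in samples} for otu in otus}
    for otu, sample, count in triples:
        table[otu][sample] = count
    return table

sparse = to_sparse(dense)

# One tabbed line per non-zero entry: OTU_name, Sample_name, Count.
sparse_lines = ["\t".join((otu, sample, str(count))) for otu, sample, count in sparse]
```

Here the dense table has six entries but only three are non-zero, so the sparse form needs three lines. The saving grows as the table gets sparser.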
OTU table values
The values in
an OTU table can be defined in different ways.
Raw count
A raw count is the number of reads
for one sample that were assigned to a given OTU. "Raw" implies the
full set of original reads before any corrections or transformations such as
normalization or rarefaction.
Normalized count
Typically, different samples
have different numbers of reads. The raw count for a sample may therefore be
higher or lower simply because that sample had more or fewer reads, not
because the species in that OTU are more or less abundant. Normalization
attempts to correct for this. For example, if sample A has twice as many
reads as sample B, the raw counts for A might be divided by two. Simple
methods for normalizing are (1) taking random subsets of the same number of
reads for each sample or (2) calculating frequencies. More sophisticated
methods have been described but in my opinion are not worth the trouble.
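The "divide by two" example can be sketched in Python as below. This is only one possible way to do the scaling: it assumes every sample has at least one read and rescales each sample to the total of the smallest sample. The sample names, counts and the function name scale_to_smallest are invented for illustration.

```python
# Hypothetical raw counts per sample: sample name -> {OTU name -> count}.
raw = {
    "A": {"Otu1": 1200, "Otu2": 800},  # 2,000 reads in total
    "B": {"Otu1": 600, "Otu2": 400},   # 1,000 reads in total
}

def scale_to_smallest(counts_by_sample):
    """Scale each sample's counts so every sample totals the smallest depth.

    Assumes every sample has at least one read.
    """
    totals = {s: sum(c.values()) for s, c in counts_by_sample.items()}
    target = min(totals.values())
    return {
        s: {otu: n * target / totals[s] for otu, n in c.items()}
        for s, c in counts_by_sample.items()
    }

scaled = scale_to_smallest(raw)
```

Sample A has twice as many reads as sample B, so its counts are halved. After scaling, both samples have the same values, which shows that the difference in raw counts was due to read depth alone. Note that scaled values are generally not integers.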
Rarefied count
See
rarefaction. An OTU table can be rarefied by taking a random subsample.
For example, if there are 10,000 reads in the table, a rarefied table could
be constructed by choosing a random subset of 1,000 reads. Another method
for normalizing is to take a random subset of the same size for all samples.
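A minimal sketch of random subsampling for one sample, assuming reads are drawn without replacement and using only the Python standard library. The function name rarefy, the seed parameter and the example counts are assumptions made for illustration, not a description of how any particular program does it.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement from one sample's counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"cannot draw {depth} reads from a sample with {total}")
    # Expand the counts into one entry per read, then draw a random subset.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    subsampled = Counter(picked)
    # Keep OTUs that dropped to zero so the table retains its shape.
    return {otu: subsampled.get(otu, 0) for otu in counts}

sample = {"Otu1": 7000, "Otu2": 2500, "Otu3": 500}  # 10,000 reads
rarefied = rarefy(sample, 1000, seed=1)
```

To normalize a whole table this way, the same depth would be used for every sample, typically no larger than the read count of the smallest sample to be kept.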
Frequency
The frequency (f) of an integer
count (n) is calculated by dividing by the total count for a
sample (N), i.e. f = n / N.
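The formula f = n / N can be applied to all OTUs in one sample as in the short Python sketch below. The function name to_frequencies and the counts are made up for the example, and a sample with no reads is treated as an error because N would be zero.

```python
def to_frequencies(counts):
    """Convert one sample's integer counts (n) to frequencies f = n / N."""
    total = sum(counts.values())  # N, the total count for the sample
    if total == 0:
        raise ValueError("sample has no reads, so frequencies are undefined")
    return {otu: n / total for otu, n in counts.items()}

freqs = to_frequencies({"Otu1": 30, "Otu2": 50, "Otu3": 20})
```

With these counts N is 100, so the frequencies are 0.3, 0.5 and 0.2. Frequencies for a sample always sum to 1.0, up to floating point rounding.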