Making an OTU table (mapping reads to OTUs)

See also
  OTU clustering
  UPARSE pipeline
  cluster_otus command

Output from cluster_otus is a FASTA file containing OTU representative sequences. Further analysis often requires an OTU table, which is built by assigning reads to OTUs.

I recommend creating OTUs from pooled samples, i.e. by concatenating reads for all samples that were sequenced in the same run. This is important for getting the best detection of chimeras and cross-talk, and for getting the best sensitivity to low-abundance sequences that could be lost if individual samples or subsets of samples are clustered separately.
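Pooling here just means concatenating the per-sample FASTA files into one file. A minimal sketch is shown below; the function name pool_fasta and the file names are hypothetical, and the sketch assumes the reads already carry whatever labels you need, since it does not add sample identifiers.

```python
from pathlib import Path


def pool_fasta(sample_paths, pooled_path):
    """Concatenate per-sample FASTA files into a single pooled FASTA file."""
    with open(pooled_path, "w") as out:
        for path in sample_paths:
            text = Path(path).read_text()
            out.write(text)
            # Make sure the next file's first label starts on a new line.
            if text and not text.endswith("\n"):
                out.write("\n")


# Hypothetical usage:
# pool_fasta(["sample1.fa", "sample2.fa"], "pooled.fa")
```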

One method for assigning a read to an OTU is to find the OTU representative sequence with the highest identity to the read, noting that there may be ties, in which case the assignment is ambiguous. This is a database search task: reads are query sequences and the OTU representative sequences are the database to be searched. A threshold of 97% is typically used. Reads that do not match any OTU at or above this identity are discarded.
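The best-hit rule can be illustrated with a toy sketch. This is not how usearch_global computes identity: to keep the example short it assumes the read and the OTU sequences have the same length and are compared position by position without gaps. The function names and the OTU labels are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions (assumes equal-length, ungapped sequences)."""
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def assign_read(read, otus, min_id=0.97):
    """Return (otu_id, identity, is_tie) for the best OTU, or None if discarded.

    otus maps an OTU identifier to its representative sequence.
    """
    best_id = -1.0
    best = []
    for otu_id, seq in otus.items():
        pid = identity(read, seq)
        if pid > best_id:
            best_id, best = pid, [otu_id]
        elif pid == best_id:
            best.append(otu_id)  # same identity as the current best: a tie
    if best_id < min_id:
        return None  # no OTU reaches the threshold, so the read is discarded
    return best[0], best_id, len(best) > 1


otus = {"Otu1": "A" * 100, "Otu2": "A" * 50 + "C" * 50}
print(assign_read("A" * 99 + "G", otus))  # ('Otu1', 0.99, False)
```

When is_tie is true, the sketch simply reports the first of the tied OTUs; this choice is arbitrary and is only meant to show that the assignment is ambiguous.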

The usearch_global command supports generating OTU tables using the options described below.

Read labels in the input set must contain sample identifiers, and sequence labels in the database must contain OTU identifiers, as explained later on this page. This means that you cannot use the input file to cluster_otus for this step: the same unique sequence is often found in several samples, so a dereplicated (unique) sequence label either has no sample identifier or has a misleading one. The usual way to deal with this is to go back to the "raw" reads after merging or truncating to a fixed length. See sample identifiers for ways to add sample identifiers to the read labels.

-otutabout filename
    QIIME classic tabbed text format.

-biomout filename
    BIOM v1.0 format (JSON). The biom utility can be used to convert to BIOM v2.1 format (HDF5).

-mothur_shared_out filename
    Mothur "shared" file.

The OTU sequences must have OTU identifiers in the labels
See OTU identifiers for details.

Reads must have sample identifiers in the labels
See sample identifiers for details.
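To show why both kinds of identifier are needed, here is a rough sketch of how an OTU table is tallied once each read has been assigned to an OTU. The label format is an assumption made for illustration: the sketch expects a sample=NAME; annotation in each read label, and the "#OTU ID" header follows the layout commonly used for classic tabbed OTU tables. See sample identifiers for the formats that are actually supported.

```python
import re
from collections import Counter

# Assumed label annotation for this sketch, e.g. ">read1;sample=A;"
SAMPLE_RE = re.compile(r"sample=([^;]+);")


def sample_of(label):
    """Extract the sample name from a read label, or None if it is missing."""
    m = SAMPLE_RE.search(label)
    return m.group(1) if m else None


def build_otu_table(assignments):
    """Build tabbed text from an iterable of (read_label, otu_id) pairs."""
    counts = Counter()
    for label, otu in assignments:
        sample = sample_of(label)
        if sample is None:
            raise ValueError("no sample identifier in label: " + label)
        counts[(otu, sample)] += 1
    samples = sorted({s for _, s in counts})
    otus = sorted({o for o, _ in counts})
    # One row per OTU, one column per sample; missing pairs count as 0.
    lines = ["#OTU ID\t" + "\t".join(samples)]
    for otu in otus:
        row = [str(counts[(otu, s)]) for s in samples]
        lines.append(otu + "\t" + "\t".join(row))
    return "\n".join(lines) + "\n"
```

A read without a sample identifier cannot be placed in any column, which is why the sketch raises an error instead of guessing.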

Singletons and low-quality reads
You can (probably should) include singletons and reads that did not pass the quality filter. If they are at least 97% identical to an OTU sequence, they are probably good enough to count even if they do contain some sequencer or PCR errors.

Reads should be trimmed
The reads should be trimmed in the same way (if any) as the input sequences you used for cluster_otus.

Typical command to generate an OTU table
With correctly formatted labels, the OTU table is generated using a command like this.

usearch -usearch_global reads.fa -db otus.fa -strand plus -id 0.97 -otutabout otu_table.txt \
  -biomout otu_table.json