cluster_otus command

See also
  OTU / denoising pipeline
  Tutorials with data, scripts, and exercises with solutions
  Defining and interpreting OTUs
  OTU benchmark results
  Making an OTU table (otutab command)
  Should I use UPARSE or UNOISE?

The cluster_otus command performs 97% OTU clustering using the UPARSE-OTU algorithm.

Chimeras are filtered by this command. This filtering is much better than UCHIME, so I do not recommend reference-based chimera filtering as a post-processing step (except perhaps for manual review), because reference-based filtering commonly produces false positives.

For most purposes, I consider 97% OTU clustering obsolete. It is better to use the unoise command to recover the full set of biological sequences in the reads. These are also valid OTUs; I call them "ZOTUs" for zero-radius OTUs, to emphasize this. See defining and interpreting OTUs and the UNOISE paper for further discussions.

Input to cluster_otus is a FASTA file containing quality filtered, globally trimmed and dereplicated reads from a marker gene amplicon sequencing experiment, e.g. 16S or ITS. It is generally recommended that singleton reads should be discarded.

See OTU / denoising pipeline for discussion of how to prepare reads before clustering. It is strongly recommended that you follow the pipeline recommendations, otherwise the accuracy of the OTUs will probably be compromised.

Input sequence labels must have size annotations giving the abundance of the unique sequence. Size annotations are generated by the -sizeout option of clustering commands; typically fastx_uniques is used.
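As a quick sanity check before clustering, the sketch below counts labels that lack a size annotation. It assumes the usual USEARCH label form (for example >Uniq1;size=12;) and uses only awk. The function name count_missing_size is hypothetical and is not part of usearch.

```shell
# Hypothetical helper (not a usearch command): count FASTA labels that
# have no ;size=N annotation. Reads FASTA on standard input and prints
# the number of labels missing the annotation.
count_missing_size() {
    awk '/^>/ && $0 !~ /;size=[0-9]+/ { n++ } END { print n + 0 }'
}

# Example: the second label has no size annotation, so this prints 1.
printf '>Uniq1;size=12;\nACGT\n>Uniq2\nACGA\n' | count_missing_size
```

On real data this would be run as count_missing_size < uniques.fa; a result of 0 means every label carries a size annotation.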

The -minsize option can be used to specify a minimum abundance. The default value is 2, which discards singleton unique sequences.
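To see how many unique sequences would pass a given abundance threshold, the size annotations can be tallied directly. This is a rough sketch that only mimics the threshold as described here; it does not reproduce what usearch does internally. The function name count_at_least is hypothetical.

```shell
# Hypothetical helper (not a usearch command): count unique sequences whose
# size annotation is at least the given minimum. Labels without a size
# annotation are ignored. Reads FASTA on standard input.
count_at_least() {
    awk -v min="$1" '
        /^>/ && match($0, /;size=[0-9]+/) {
            # Skip the 6 characters of ";size=" to get the number.
            size = substr($0, RSTART + 6, RLENGTH - 6) + 0
            if (size >= min) kept++
        }
        END { print kept + 0 }'
}

# Example: sizes are 12, 1 and 2, so with a minimum of 2 this prints 2.
printf '>Uniq1;size=12;\nACGT\n>Uniq2;size=1;\nACGA\n>Uniq3;size=2;\nAAAA\n' | count_at_least 2
```

On real data, count_at_least 2 < uniques.fa gives the number of unique sequences that are not singletons.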

The identity threshold is fixed at 97%. See defining and interpreting OTUs for discussion. See UPARSE OTU radius for making OTUs at different identities.

The -otus option specifies a FASTA output file for the OTU representative sequences. By default, OTU labels are taken from the input file, with size annotations stripped.

The -relabel option specifies a string that is used to re-label OTUs. If -relabel xxx is specified, then the labels are xxx followed by 1, 2 ... up to the number of OTUs. OTU identifiers in the labels are required for making an OTU table with the otutab command.
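Before building an OTU table, it can be worth confirming that every label in the OTU file has the expected prefix-plus-number form. The sketch below assumes labels begin with the -relabel string followed by an integer and allows anything after that. The function name count_bad_labels is hypothetical.

```shell
# Hypothetical helper (not a usearch command): count FASTA labels that do
# not start with the given prefix followed by an integer, e.g. Otu1, Otu2.
# Reads FASTA on standard input.
count_bad_labels() {
    awk -v p="$1" '/^>/ && $0 !~ ("^>" p "[0-9]+") { n++ } END { print n + 0 }'
}

# Example: the third label does not start with Otu<number>, so this prints 1.
printf '>Otu1\nACGT\n>Otu2\nACGA\n>Uniq7;size=3;\nAAAA\n' | count_bad_labels Otu
```

On real data, count_bad_labels Otu < otus.fa should print 0 if the file was written with -relabel Otu.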

The -uparseout option specifies a tabbed text output file documenting how the input sequences were classified.

The -uparsealnout option specifies a text file containing a human-readable alignment of each query sequence to its UPARSE-REF model.

Parsimony score options are supported.

Alignment parameters and heuristics are supported.


usearch -cluster_otus uniques.fa -otus otus.fa -uparseout uparse.txt -relabel Otu