OTU identifiers in sequence labels

OTU sequences must have OTU identifiers
An OTU table is generated by the otutab command. The database sequences, i.e. the OTU sequences, must have OTU identifiers in their sequence labels.

OTU identifier syntax
The OTU identifier must start with the three letters OTU (case-insensitive) and continue up to the first character that is not a letter, digit or underscore. The identifier may appear anywhere in the label; it does not have to be the first field. As a special case, if the identifier starts with otu=, these first four characters are deleted. This means that you can use otu=xxx; annotations, where xxx is the OTU identifier, which can then be any string of characters except a semicolon. The following labels all have the OTU identifier Otu123.

>Otu123
>Otu123;size=14;
>FA87888ZZQ;Otu123;size=14;
>FA87888ZZQ;otu=Otu123;size=14;
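The identifier rules above can be sketched in Python as follows. This is a hypothetical helper (otu_id_from_label is not part of USEARCH) and only an approximation of the rules, not USEARCH's own parser. In particular, it assumes the label is split into semicolon-separated fields and uses the first field that begins with OTU (any case) or with otu=.

```python
import re
from typing import Optional

# "otu" in any case, followed by letters, digits or underscores (ASCII only).
_OTU_PREFIX = re.compile(r"otu[A-Za-z0-9_]*", re.IGNORECASE)


def otu_id_from_label(label: str) -> Optional[str]:
    """Return the OTU identifier in a FASTA label, or None if there is none.

    Assumption: the label is split on ';' and the first matching field wins.
    """
    label = label.lstrip(">")
    for field in label.split(";"):
        if field.startswith("otu="):
            # Special case: drop "otu=" and keep the rest of the field as-is,
            # so the identifier may contain any character except ';'.
            return field[4:]
        m = _OTU_PREFIX.match(field)
        if m:
            # Stops at the first character that is not alphanumeric or '_'.
            return m.group(0)
    return None
```

Each of the four example labels above returns Otu123 under this sketch, while a label such as >read1;size=3; returns None.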

How to get OTU identifiers in your labels
The simplest method is to use the option -relabel Otu when you run cluster_otus or unoise3. Alternatively, you can write your own script to relabel an existing FASTA file.
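If you write your own script, a minimal Python sketch might look like the following. The function names read_fasta, relabel_fasta and relabel_fasta_file are hypothetical. The sketch assumes each label should simply be replaced by Otu1, Otu2, ... in file order, discarding the old label (including any size= annotation); it is not a reimplementation of the -relabel option.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs; labels are returned without the '>'."""
    label: Optional[str] = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(seq_parts)
            label, seq_parts = line[1:], []
        elif line:
            # Multi-line sequences are joined into a single string.
            seq_parts.append(line.strip())
    if label is not None:
        yield label, "".join(seq_parts)


def relabel_fasta(lines: Iterable[str], prefix: str = "Otu") -> List[str]:
    """Replace each label with prefix + running number (1, 2, 3, ...)."""
    out: List[str] = []
    for i, (_old_label, seq) in enumerate(read_fasta(lines), start=1):
        out.append(f">{prefix}{i}")
        out.append(seq)
    return out


def relabel_fasta_file(in_path: str, out_path: str, prefix: str = "Otu") -> None:
    """Write a relabeled copy of in_path to out_path."""
    with open(in_path) as fin:
        new_lines = relabel_fasta(fin, prefix)
    with open(out_path, "w") as fout:
        fout.write("\n".join(new_lines) + "\n")
```

For example, relabel_fasta_file("otus_in.fa", "otus.fa") would write the relabeled copy; both file names are placeholders. If you need the old labels later, save the old-to-new mapping before discarding them.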

WARNING -- QIIME doesn't like underscores in OTU identifiers
Some of my older examples use OTU identifiers like OTU_123. Underscores in OTU identifiers can cause problems with QIIME. The cause is apparently that the Newick tree file standard uses an underscore to mean a blank space, since the problem only seems to occur when a tree file is used. Some USEARCH commands only allow letters, digits and underscores in OTU identifiers, so you can't replace the underscore with another punctuation symbol (e.g., a period). The safest choice is an identifier like Otu123.
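If you have existing files with old-style identifiers like OTU_123, the hypothetical helpers below sketch one way to check and convert them. They assume old identifiers follow the exact pattern OTU_ followed by digits, and they refuse to guess otherwise, because simply deleting underscores could make two identifiers collide (e.g. OTU_1_2 and OTU_12 would both become OTU12).

```python
import re

# Old-style identifiers: "OTU" (any case), one underscore, then digits.
_OLD_STYLE = re.compile(r"OTU_(\d+)", re.IGNORECASE)
# Letters and digits only: no underscores (the QIIME/Newick issue) and no
# other punctuation (rejected by some USEARCH commands).
_ALNUM = re.compile(r"[A-Za-z0-9]+")


def is_safe_otu_id(otu_id: str) -> bool:
    """True if the identifier starts with 'otu' (any case) and is alphanumeric."""
    return otu_id.lower().startswith("otu") and _ALNUM.fullmatch(otu_id) is not None


def to_safe_otu_id(otu_id: str) -> str:
    """Convert e.g. 'OTU_123' to 'Otu123'; return already-safe ids unchanged."""
    if is_safe_otu_id(otu_id):
        return otu_id
    m = _OLD_STYLE.fullmatch(otu_id)
    if m is None:
        raise ValueError(f"don't know how to convert OTU identifier {otu_id!r}")
    return "Otu" + m.group(1)
```

Whatever conversion you use, apply it to every file that refers to the OTUs (FASTA, OTU table, tree) so the identifiers stay consistent.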