OTU identifiers in sequence labels
Making an OTU table
An OTU table is made by running the usearch_global command with an
appropriate output file option, e.g. otutabout. See Mapping reads to OTUs
for details.
OTU sequences must have OTU identifiers
When you run usearch_global to make the OTU table, the FASTA file
with the OTU sequences must have OTU identifiers in the
sequence labels.
OTU identifier syntax
The OTU identifier must start with the three letters OTU (case-insensitive)
and continues to the first character that is not alphanumeric or an
underscore. The identifier may appear anywhere in the label; it does not have
to be the first field. As a special case, if the identifier starts with otu=,
the first four characters are deleted. This means that you can use otu=xxx;
annotations, where xxx is the OTU identifier, which can then be any string of
characters (except a semicolon). The following labels all have the OTU
identifier Otu123.
>Otu123
>Otu123;size=14;
>FA87888ZZQ;Otu123;size=14;
>FA87888ZZQ;otu=Otu123;size=14;
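The syntax rules above can be illustrated with a short Python sketch. This is not USEARCH's own parser, only one reading of the rules as written. The function name extract_otu_id is hypothetical, and the sketch assumes that fields are separated by semicolons, that the identifier begins at the start of a field, and that an otu= annotation takes precedence over a plain identifier.

```python
import re

# Plain identifier: "OTU" in any case, then letters, digits or underscores.
_PLAIN_OTU = re.compile(r"otu[A-Za-z0-9_]*", re.IGNORECASE)


def extract_otu_id(label):
    """Return the OTU identifier found in a FASTA label, or None.

    Assumptions (not confirmed by the USEARCH documentation): fields are
    separated by ';' and the identifier starts at the beginning of a field.
    """
    fields = [f for f in label.lstrip(">").strip().split(";") if f]

    # Special case: otu=xxx; the "otu=" prefix is dropped and xxx may be
    # any string that does not contain a semicolon.
    for field in fields:
        if field[:4].lower() == "otu=":
            return field[4:] or None

    # General case: the identifier runs up to the first character that is
    # not a letter, digit or underscore.
    for field in fields:
        match = _PLAIN_OTU.match(field)
        if match:
            return match.group(0)
    return None


if __name__ == "__main__":
    for example in (">Otu123", ">Otu123;size=14;",
                    ">FA87888ZZQ;Otu123;size=14;",
                    ">FA87888ZZQ;otu=Otu123;size=14;"):
        print(example, "->", extract_otu_id(example))
```

Checking your own labels this way before running usearch_global can catch records that lack an identifier, but the result may differ from USEARCH's behavior in edge cases that the rules above do not cover.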
How to get OTU identifiers in your labels
The simplest method is to use the option -relabel Otu when you run
cluster_otus. Alternatively, you can write your own script to relabel an
existing FASTA file.
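As an illustration of the do-it-yourself route, here is a minimal Python sketch of a relabeling script. It is only one possible approach: the function name relabel_fasta and the file names in the usage note are hypothetical, and the sketch assumes that anything after the first semicolon in a label (for example size=14;) is an annotation worth keeping.

```python
def relabel_fasta(lines, prefix="Otu"):
    """Yield FASTA lines with each record relabeled Otu1, Otu2, ...

    Assumption: the text before the first ';' in a label is the old name,
    and everything after it is an annotation that should be preserved.
    """
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            # Split off the old name; keep the separator and annotations.
            _, sep, annotations = line[1:].partition(";")
            yield f">{prefix}{count}{sep}{annotations}"
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    import sys

    # Reads FASTA on standard input, writes relabeled FASTA to standard output.
    for out_line in relabel_fasta(sys.stdin):
        print(out_line)
```

If the script is saved as relabel.py, it could be run as python relabel.py < otus_old.fa > otus.fa. Identifiers are numbered in the order the records appear in the input file.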
WARNING -- QIIME doesn't like underscores in OTU identifiers
Some of my older examples use OTU identifiers like OTU_123.
Underscores in OTU identifiers can cause problems with QIIME, apparently
because the Newick tree file standard uses an underscore to mean a blank
space (the problem only seems to occur when a tree file is used). Some
USEARCH commands only allow letters, digits and underscores in OTU
identifiers, so you can't use another punctuation symbol (e.g., a period)
instead. The safest choice is to use identifiers like Otu123.
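A quick way to check identifiers against this advice is sketched below in Python. The function name is_safe_otu_id is hypothetical, and the rule it applies (the letters OTU in any case followed by at least one letter or digit, with no underscore or other punctuation) is a conservative reading of the recommendation above rather than a rule enforced by USEARCH or QIIME.

```python
import re

# "OTU" in any case, followed only by letters and digits.
_SAFE_OTU_ID = re.compile(r"otu[A-Za-z0-9]+", re.IGNORECASE)


def is_safe_otu_id(otu_id):
    """Return True if the identifier has no underscores or punctuation."""
    return _SAFE_OTU_ID.fullmatch(otu_id) is not None


if __name__ == "__main__":
    for candidate in ("Otu123", "OTU_123", "Otu.123"):
        print(candidate, is_safe_otu_id(candidate))
```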