OTUs with radius >3%
The cluster_otus command has an
otu_radius_pct option for specifying a radius different from the default of 3%.
However, please note that it is not
recommended to use larger values. This is because chimera detection
is an integral part of the clustering algorithm. Each input sequence is run
through UPARSE-REF using the current set of OTUs
as a reference database. If the optimal model is chimeric, the sequence is
discarded. If an OTU radius > 3% is used, then chimera detection becomes more
difficult because more true biological sequences fall within the radius of an
existing OTU and so do not create new OTUs. The set of OTU sequences becomes sparser, and the
correct parents of a chimera will more often be missing from the OTU database.
Chimeras can still be detected when there are OTUs which are sufficiently close
to their parents, but the false negative rate will tend to increase. I therefore
recommend a different procedure rather than using the otu_radius_pct option.
Recommended procedure for larger OTU radius
I have not tested OTU pipelines with OTU radius different from 3%, so these
ideas are preliminary. If this is important to you, then I would welcome a
discussion and will be glad to work with you to help you analyze your particular
data -- you're welcome to email me.
The basic idea is to make a set of OTUs using cluster_otus at a small radius, then use UCLUST to re-cluster at a higher radius. For example, if you want OTUs at a radius of 4%, you could run cluster_otus at 2%, then cluster_smallmem with -id 0.96 (corresponding to a radius of 4%):
usearch -cluster_otus sorted.fa -otu_radius_pct 2 -otus otus98.fa
usearch -cluster_smallmem otus98.fa -id 0.96 -centroids otus.fa
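The -id value in the second command follows from the target radius as id = 1 - radius/100, so a 4% radius gives 0.96. As a minimal sketch, the Python helper below does that conversion and prints the two commands for a chosen target radius. The function names are hypothetical and not part of usearch; the file names and options are simply copied from the example above, and the two-step procedure itself remains untested as noted earlier.

```python
def radius_pct_to_id(radius_pct):
    """Convert an OTU radius in percent to a fractional identity for -id."""
    if not 0 <= radius_pct < 100:
        raise ValueError("radius_pct must be in the range [0, 100)")
    # Round to avoid floating-point noise such as 0.9599999999999999.
    return round(1.0 - radius_pct / 100.0, 4)


def two_step_commands(target_radius_pct, initial_radius_pct=2.0):
    """Build the two command lines for the two-step procedure.

    File names (sorted.fa, otus98.fa, otus.fa) are taken from the example
    in the text and are placeholders for your own files.
    """
    if initial_radius_pct >= target_radius_pct:
        raise ValueError("initial radius must be smaller than the target radius")
    identity = radius_pct_to_id(target_radius_pct)
    return [
        f"usearch -cluster_otus sorted.fa -otu_radius_pct {initial_radius_pct:g} -otus otus98.fa",
        f"usearch -cluster_smallmem otus98.fa -id {identity:g} -centroids otus.fa",
    ]


if __name__ == "__main__":
    for command in two_step_commands(4):
        print(command)
```

Calling two_step_commands(4) reproduces the two command lines shown above; a different target, such as 5, changes only the -id value.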