A maximum expected error threshold
of 1 means that the most probable number of errors is zero, regardless of the
read length: intuitively, if the number of errors in a read is roughly Poisson
with mean at most 1, then zero errors is at least as likely as any other count.
I would recommend using this threshold unless you have a good
reason to change it. A common objection is that too many reads are discarded,
but assuming you are doing OTU analysis, you should find that most of the
discarded reads are recovered when you map the unfiltered reads to your OTUs
using the otutab command.
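For example, the filtering and mapping steps might look like this minimal
sketch, where the file names are placeholders and the OTU clustering step
between them is omitted:

   usearch -fastq_filter reads.fastq -fastq_maxee 1.0 -fastaout filtered.fa   # quality filter
   usearch -otutab reads.fastq -otus otus.fa -otutabout otutab.txt            # map ALL reads to OTUs

Note that the OTU table is built from reads.fastq, the unfiltered reads, so
reads that failed the expected error filter can still be counted in the table.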
If this doesn't happen, you may need to consider other strategies such as
truncating the reads to reduce the error rate.
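Quality often degrades towards the 3' end of a read, so truncating to a fixed
length before applying the expected error test can bring many reads under the
threshold. A sketch using the -fastq_trunclen option of fastq_filter, where
200 is an illustrative length rather than a recommendation:

   usearch -fastq_filter reads.fastq -fastq_trunclen 200 -fastq_maxee 1.0 -fastaout filtered.fa   # truncate, then filter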
Another question to consider is whether you follow my recommendation to
discard singletons
before OTU clustering. If you do discard singletons, this should take care of a
large majority of the "harmful" reads in the tail of the distribution, i.e.
those with >3% errors. In that case, you could try a higher expected error
threshold, as in the sketch below.
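This is a minimal sketch of that variation, with illustrative file names and
an illustrative relaxed threshold of 2.0:

   usearch -fastq_filter reads.fastq -fastq_maxee 2.0 -fastaout filtered.fa   # relaxed filter
   usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout           # dereplicate with abundances
   usearch -sortbysize uniques.fa -fastaout nosingle.fa -minsize 2            # discard singletons
   usearch -cluster_otus nosingle.fa -otus otus.fa -relabel Otu               # cluster into OTUs

The -minsize 2 option discards unique sequences seen only once, i.e. the
singletons.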
Suppose you get more OTUs. This could be a good thing (higher
sensitivity) or a bad thing (most of the new OTUs are spurious). How could you
distinguish these two situations? If you have a lot of spurious OTUs, how would
this impact the biological questions you are trying to answer? See
OTU quality control for more discussion.