A maximum expected error threshold
of 1 means that the most probable number of errors is zero, regardless of the
read length. I would recommend using this threshold unless you have a good
reason to change it. A common objection is that too many reads are discarded,
but assuming you are doing OTU analysis, you should find that most of the
discarded reads are recovered when you map the unfiltered reads to your OTUs.
If this doesn't happen, you may need to consider other strategies such as
truncating the reads to reduce the error rate.
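
To make the threshold concrete, here is a minimal Python sketch of how expected errors can be computed from a read's Phred quality scores: each base contributes an error probability P = 10^(-Q/10), and the expected number of errors E is the sum of those probabilities over the read. When E <= 1, a Poisson approximation of the error-count distribution has its mode at zero, which is the reasoning behind the statement above. The quality string, the Phred+33 encoding and the truncation length below are illustrative choices, not taken from any particular dataset.

```python
# Sketch: per-read expected errors from Phred quality scores.
# Assumes Phred+33 ASCII encoding; the example read is made up.

def expected_errors(quals: str) -> float:
    """Sum of per-base error probabilities P = 10^(-Q/10)."""
    return sum(10 ** (-(ord(c) - 33) / 10) for c in quals)

# A hypothetical read with a decaying 3' tail, as is typical of Illumina data.
quals = "IIIIIIIIIIIIIIIIIIIIHHHHGGGGFFFF????<<<<####"

print("full read      EE = %.2f" % expected_errors(quals))
print("truncated read EE = %.2f" % expected_errors(quals[:32]))

# With a maximum expected error threshold of 1, a read passes only if EE <= 1.
# Under a Poisson approximation with mean EE <= 1, zero errors is the most
# probable outcome, whatever the read length.
```

In this example, trimming the low-quality tail drops the expected errors from well above 1 to well below it, which is why truncation can rescue reads that a straight expected-error filter would discard.
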
Another question to consider is whether you follow my recommendation to
discard singletons before OTU clustering. If you do discard singletons, this should take care of a
large majority of the "harmful" reads in the tail of the distribution, i.e.
those with >3% errors. In that case, you could try using higher expected error
thresholds. Suppose you get more OTUs. This could be a good thing (higher
sensitivity) or a bad thing (most of the new OTUs are spurious). How could you
distinguish these two situations? If you have a lot of spurious OTUs, how would
this impact the biological questions you are trying to answer?
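
To spell out the singleton step mentioned above, the sketch below dereplicates a set of filtered reads and keeps only unique sequences with abundance of at least two before clustering. The reads and the minimum-abundance cutoff are toy values; the point is only that abundance-one uniques, which are enriched for reads with errors, never reach the clustering step.

```python
# Sketch: dereplicate filtered reads and drop singletons (abundance 1)
# before OTU clustering. The sequence strings stand in for real reads.
from collections import Counter

reads = [
    "ACGTACGTACGT",  # appears 3x
    "ACGTACGTACGT",
    "ACGTACGTACGT",
    "ACGTACGTACGA",  # singleton, likely an error of the sequence above
    "TTGGCCAATTGG",  # appears 2x
    "TTGGCCAATTGG",
]

abundance = Counter(reads)

# Keep unique sequences seen at least twice; these go on to clustering.
min_abundance = 2
uniques = {seq: n for seq, n in abundance.items() if n >= min_abundance}

for seq, n in sorted(uniques.items(), key=lambda kv: -kv[1]):
    print(f"size={n}\t{seq}")
```

Reads dropped here can still be recovered when the unfiltered reads are mapped back to the OTUs, just as with reads discarded by the expected-error filter.
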
The best way to check error rates is to use a control sample such as a mock
community, but most people don't sequence a control sample, so this check may not
be available. In that case, I prefer to be conservative because most analysis
pipelines produce large numbers of spurious OTUs.
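
For a rough illustration of what that check looks like when a mock sample is available, the sketch below estimates an observed per-base error rate by comparing reads against the known reference sequences they were assigned to. Everything here is made up, including the assumption that the reads are already assigned and aligned to their references; with real data each read would first be globally aligned to its best-matching mock sequence.

```python
# Sketch: estimate the per-base error rate from a mock community control.
# Toy data: each read is compared to the known reference it came from.
# The strings are equal length here so positions line up directly.

mock_refs = {
    "ref1": "ACGTACGTACGTACGTACGT",
    "ref2": "TTGGCCAATTGGCCAATTGG",
}

# (read sequence, reference it was assigned to) -- made-up examples.
assigned_reads = [
    ("ACGTACGTACGTACGTACGT", "ref1"),  # perfect
    ("ACGTACGAACGTACGTACGT", "ref1"),  # 1 mismatch
    ("TTGGCCAATTGGCCAATAGG", "ref2"),  # 1 mismatch
]

errors = 0
bases = 0
for read, ref_name in assigned_reads:
    ref = mock_refs[ref_name]
    errors += sum(1 for a, b in zip(read, ref) if a != b)
    bases += len(read)

print(f"observed error rate = {errors / bases:.3%} ({errors} errors in {bases} bases)")
```
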