FAQ: Should I increase the expected error threshold for long reads?

A maximum expected error threshold of 1 means that the most probable number of errors is zero, regardless of the read length. I would recommend using this threshold unless you have a good reason to change it. A common objection is that too many reads are discarded, but assuming you are doing OTU analysis, you should find that most of the discarded reads are recovered when you map the unfiltered reads to your OTUs using the otutab command.
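For readers who want to see what the threshold is measuring, here is a minimal Python sketch. It assumes the usual definition of expected errors as the sum of the per-base error probabilities implied by the Phred scores, P = 10^(-Q/10), and it assumes Phred+33 ASCII encoding of the quality string. The function names are hypothetical and this is an illustration only, not the filtering code used by the software.

```python
def expected_errors(quality, offset=33):
    """Expected number of errors in a read: the sum of the per-base
    error probabilities P = 10^(-Q/10) taken from its quality string.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(quality, max_ee=1.0, offset=33):
    """True if the read's expected errors do not exceed max_ee."""
    return expected_errors(quality, offset) <= max_ee


if __name__ == "__main__":
    good = "I" * 250   # 250 bases at Q40
    poor = "+" * 20    # 20 bases at Q10
    for name, qual in (("good", good), ("poor", poor)):
        print(name, round(expected_errors(qual), 3), passes_filter(qual))
```

Note that the long, high-quality read passes while the short, low-quality read fails: the threshold is applied to the total for the read, not to its length.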

If most of the discarded reads are not recovered at the otutab step, you may need to consider other strategies, such as truncating the reads to reduce the error rate.
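To illustrate why truncation helps, the sketch below accumulates expected errors along a single read and reports the longest prefix that stays within the threshold. It makes the same assumptions as before (P = 10^(-Q/10), Phred+33 encoding), and the helper names are hypothetical. In practice you would normally truncate all reads at one fixed length rather than choose a length per read, so treat this only as a way to see how a low-quality tail drives up the total.

```python
from itertools import accumulate


def cumulative_expected_errors(quality, offset=33):
    """Running total of expected errors after each base of the read."""
    probs = (10 ** (-(ord(ch) - offset) / 10) for ch in quality)
    return list(accumulate(probs))


def longest_passing_prefix(quality, max_ee=1.0, offset=33):
    """Length of the longest prefix whose expected errors are <= max_ee."""
    length = 0
    for i, ee in enumerate(cumulative_expected_errors(quality, offset), start=1):
        if ee > max_ee:
            break  # the running total never decreases, so we can stop here
        length = i
    return length


if __name__ == "__main__":
    # 200 bases at Q40 followed by a 50-base tail at Q10.
    read_quality = "I" * 200 + "+" * 50
    total = cumulative_expected_errors(read_quality)[-1]
    print("full-length expected errors:", round(total, 2))
    print("longest prefix within threshold:", longest_passing_prefix(read_quality))
```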

Another question to consider is whether you follow my recommendation to discard singletons before OTU clustering. If you do discard singletons, this should take care of a large majority of the "harmful" reads in the tail of the distribution, i.e. those with >3% errors. In that case, you could try using higher expected error thresholds. Suppose you get more OTUs. This could be a good thing (higher sensitivity) or a bad thing (most of the new OTUs are spurious). How could you distinguish these two situations? If you have a lot of spurious OTUs, how would this impact the biological questions you are trying to answer? See OTU quality control for more discussion.
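For reference, discarding singletons here means dropping unique sequences that occur only once in the pooled reads before clustering. A minimal Python sketch of that idea follows. The function name and the in-memory list of reads are assumptions made for the example, and it is not how the software implements dereplication.

```python
from collections import Counter


def discard_singletons(sequences, min_size=2):
    """Count identical sequences and keep those seen at least min_size times.

    Returns a dict mapping each retained unique sequence to its abundance.
    """
    counts = Counter(sequences)
    return {seq: n for seq, n in counts.items() if n >= min_size}


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACGA", "TTGA", "TTGA", "TTGA"]
    # "ACGA" occurs once, so it is dropped before clustering.
    print(discard_singletons(reads))
```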