Reads with singleton sequences are likely to have errors, so singletons should be discarded; in this sense, discarding singletons is a form of quality filtering. In practice, mock community tests show that both expected error filtering and discarding singletons are necessary to achieve reasonably low rates of spurious OTUs.
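For example, expected error filtering is typically done with the fastq_filter command before dereplication (filenames here are placeholders; -fastq_maxee 1.0 is a commonly used threshold, not a required value):

  usearch -fastq_filter reads.fastq -fastq_maxee 1.0 -fastaout filtered.fa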
Usually, most singletons will map to an OTU when the
OTU table is constructed, so the data is not lost.
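For example, a sketch of building the OTU table by mapping the reads, including singletons, back to the OTU sequences (filenames are placeholders; read labels are assumed to carry sample names):

  usearch -otutab reads.fastq -otus otus.fa -otutabout otutab.txt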
With
denoising, a higher abundance threshold is needed because the signal for a correct read is an abundant sequence surrounded by a "cloud" of much lower-abundance sequences (see
UNOISE algorithm). Correct sequences with very low abundance will have fragmentary or missing clouds and therefore cannot be reliably recognized. This is one reason why it is recommended to
pool reads for all samples together, which will tend to give much higher abundances for correct sequences.
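For example, a minimal sketch of pooling is to combine the per-sample filtered reads and dereplicate them together, so that abundances are summed across samples (filenames are placeholders; read labels are assumed to retain their sample names so the OTU table can be built later):

  cat Sample*_filtered.fa > pooled.fa
  usearch -fastx_uniques pooled.fa -fastaout uniques.fa -sizeout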
In a typical pipeline, it is not necessary to explicitly discard singletons or low-abundance sequences because the
cluster_otus and
unoise3 commands have their own abundance thresholds, specified by the -minsize option. This defaults to 2 for cluster_otus and 8 for unoise3.
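For example (filenames are placeholders; the -minsize values shown are the defaults, so they could be omitted or raised):

  usearch -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu
  usearch -unoise3 uniques.fa -minsize 8 -zotus zotus.fa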
If you want to explicitly discard low-abundance sequences, you can use the -minuniquesize option of
fastx_uniques or the -minsize option of
sortbysize.
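For example (filenames are placeholders; a threshold of 2 discards singletons, and higher values discard correspondingly rarer sequences):

  usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout -minuniquesize 2
  usearch -sortbysize uniques.fa -fastaout seqs2.fa -minsize 2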