Reads with singleton sequences are likely to have errors, so
singletons should be discarded. Discarding singletons is therefore
closely related to quality filtering. In practice, mock
community tests show that both expected error filtering and discarding
singletons are necessary to achieve reasonably low rates of spurious OTUs.
Usually, most singletons will map to an OTU when the
OTU table is constructed, so the data is not lost.
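For example, an OTU table can be constructed by mapping all reads,
including previously discarded singletons, to the OTU sequences
(a sketch with illustrative file names, assuming usearch v11 syntax):

  usearch -otutab reads.fastq -otus otus.fa -otutabout otutab.txt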
With
denoising, a higher abundance threshold is needed because the signal
for a correct read is an abundant sequence surrounded by a "cloud" of
much lower-abundance sequences (see
the UNOISE algorithm). Correct
sequences with very low abundance will have fragmentary or missing clouds and
therefore cannot be reliably recognized. This is one reason why it is
recommended to
pool reads for all samples
together, which will tend to give much higher abundances for correct
sequences.
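For example, per-sample FASTQ files can be pooled by simple
concatenation before finding unique sequences (illustrative file
names; assumes the read labels already identify the sample):

  cat Sample*.fastq > pooled.fastq
  usearch -fastx_uniques pooled.fastq -fastaout uniques.fa -sizeout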
In a typical pipeline, it is
not necessary to explicitly discard singletons or low-abundance sequences
because the
cluster_otus and
unoise3 commands have their own abundance
thresholds, specified by the -minsize option. This defaults to 2 for
cluster_otus and 8 for unoise3.
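For example, the following commands apply the default thresholds
implicitly, then override the unoise3 default by setting -minsize
explicitly (illustrative file names):

  usearch -cluster_otus uniques.fa -otus otus.fa -relabel Otu
  usearch -unoise3 uniques.fa -zotus zotus.fa -minsize 4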
If you want to explicitly
discard low-abundance sequences, you can use the -minuniquesize option of
fastx_uniques or the -minsize option of
sortbysize.
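For example (illustrative file names):

  usearch -fastx_uniques reads.fastq -fastaout uniques.fa -sizeout -minuniquesize 2
  usearch -sortbysize uniques.fa -fastaout seqs2.fa -minsize 2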