Reads with singleton sequences are likely to have errors, so singletons should be discarded; in this sense, discarding singletons is a form of quality filtering. In practice, mock community tests show that both expected error filtering and discarding singletons are necessary to achieve reasonably low rates of spurious OTUs.
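For example, expected error filtering is typically done with the fastq_filter command before dereplication (filenames here are placeholders; -fastq_maxee 1.0 is a commonly used threshold, not a required value):

  usearch -fastq_filter reads.fastq -fastq_maxee 1.0 -fastaout filtered.fa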
Usually, most singletons will map to an OTU when the
OTU table is constructed, so the data is not lost.
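For example, a sketch of building the OTU table by mapping the reads, including singletons, back to the OTU sequences (filenames are placeholders; read labels are assumed to carry sample names):

  usearch -otutab reads.fastq -otus otus.fa -otutabout otutab.txt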
With
denoising, a higher abundance threshold is needed because the signal for a correct read is an abundant sequence surrounded by a "cloud" of much lower-abundance sequences (see
UNOISE algorithm). Correct sequences with very low abundance will have fragmentary or missing clouds and therefore cannot be reliably recognized. This is one reason why it is recommended to
pool reads for all samples together, which will tend to give much higher abundances for correct sequences.
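For example, a minimal sketch of pooling is to combine the per-sample filtered reads and dereplicate them together, so that abundances are summed across samples (filenames are placeholders; read labels are assumed to retain their sample names so the OTU table can be built later):

  cat Sample*_filtered.fa > pooled.fa
  usearch -fastx_uniques pooled.fa -fastaout uniques.fa -sizeout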
In a typical pipeline, it is not necessary to explicitly discard singletons or low-abundance sequences because the
cluster_otus and
unoise3 commands have their own abundance thresholds, specified by the -minsize option. This defaults to 2 for cluster_otus and 8 for unoise3.
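For example (filenames are placeholders; the -minsize values shown are the defaults, so they could be omitted or raised):

  usearch -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu
  usearch -unoise3 uniques.fa -minsize 8 -zotus zotus.fa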
If you want to explicitly discard low-abundance sequences, you can use the -minuniquesize option of
fastx_uniques or the -minsize option of
sortbysize.
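For example (filenames are placeholders; a threshold of 2 discards singletons, and higher values discard correspondingly rarer sequences):

  usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout -minuniquesize 2
  usearch -sortbysize uniques.fa -fastaout seqs2.fa -minsize 2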