usearch manual
Sample pooling

I usually recommend pooling samples for OTU clustering and denoising, for the following reasons.

Cross-talk detection
To detect cross-talk using the UNCROSS algorithm, it is important to include all reads from one sequencer run, even if the samples are from different environments.

OTU clustering and denoising
For OTU clustering and denoising, it is usually better to include reads from all related samples, e.g. all samples from a given environment, even if they were sequenced in more than one run or were sequenced together with other environments (e.g., sequenced by a shared sequencing center for a different user).

Comparing samples
Creating a single set of OTUs is the most natural and intuitive basis for sample comparison, e.g. using a beta diversity metric. If you create separate OTUs for each sample, they are not directly comparable.

Improved amplicon abundance estimation and singleton detection
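If samples are pooled, then a sequence that appears as a singleton in one sample may also appear in another sample, so its pooled abundance is two or more and it is no longer a singleton. If singletons are discarded after pooling (as usually recommended in order to reduce spurious OTUs), then more low-abundance species will be retained than if singletons were discarded separately in each sample.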
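The effect on singletons can be illustrated with a minimal Python sketch. The sample names and reads are made-up toy data, and the helper non_singletons is hypothetical, not part of usearch.

```python
from collections import Counter

# Toy reads per sample (hypothetical data for illustration only).
samples = {
    "S1": ["ACGT", "ACGT", "ACGT", "GGCC"],
    "S2": ["ACGT", "ACGT", "GGCC", "TTAA"],
}

def non_singletons(reads):
    """Return the unique sequences that occur in at least two reads."""
    counts = Counter(reads)
    return {seq for seq, n in counts.items() if n >= 2}

# Discarding singletons sample by sample: GGCC is a singleton in both
# samples, so it is lost everywhere.
kept_per_sample = set()
for reads in samples.values():
    kept_per_sample |= non_singletons(reads)

# Pooling first: GGCC now has abundance 2 and is retained.
pooled = [read for reads in samples.values() for read in reads]
kept_pooled = non_singletons(pooled)

print(sorted(kept_per_sample))  # ['ACGT']
print(sorted(kept_pooled))      # ['ACGT', 'GGCC']
```

TTAA is seen only once across both samples, so it is discarded either way.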

Chimera detection
The UPARSE-OTU and UNOISE algorithms both require that a chimera has lower read abundance than its parents. Chimeras are not detected if a parent has the same number or fewer reads. This most often happens with low-abundance parents, e.g. when a chimera and one of its parents are both present in exactly two reads. If samples are pooled, parent abundances usually increase because they are found in multiple samples, while chimeras are only rarely reproduced so will usually be found only in a single sample. Even if chimeras are reproduced, pooling will tend to increase both chimera and parent abundances, leading to a more accurate reflection of amplicon abundance so that parent abundances become greater than their chimeras. Conversely, pooling is highly unlikely to increase the abundance of a chimera relative to its parents. Pooling is therefore effective in reducing the number of spurious OTUs due to chimeras.
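The abundance requirement can be illustrated with a small Python sketch. It models only the rule stated above (a chimera must have fewer reads than each parent), not the actual chimera detection in UPARSE-OTU or UNOISE. The read counts and the function name are hypothetical.

```python
def abundance_rule_allows_detection(chimera_size, parent_sizes):
    """True if the chimera has strictly fewer reads than every parent."""
    return all(chimera_size < p for p in parent_sizes)

# Hypothetical read counts per sample: (parent A, parent B, chimera).
per_sample = {
    "S1": (50, 2, 2),  # chimera ties with low-abundance parent B
    "S2": (40, 3, 0),  # chimera not reproduced in this sample
    "S3": (30, 2, 0),
}

# Within S1 alone, parent B has the same abundance as the chimera,
# so the chimera cannot be detected.
a, b, c = per_sample["S1"]
detectable_in_s1 = abundance_rule_allows_detection(c, (a, b))

# After pooling, parent abundances add up across samples while the
# chimera stays at 2 reads.
pooled_a = sum(v[0] for v in per_sample.values())
pooled_b = sum(v[1] for v in per_sample.values())
pooled_c = sum(v[2] for v in per_sample.values())
detectable_pooled = abundance_rule_allows_detection(pooled_c, (pooled_a, pooled_b))

print(detectable_in_s1)   # False
print(detectable_pooled)  # True
```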

Error detection
The UNOISE algorithm uses unique sequence abundances to detect bad reads. If a read (R) with low abundance is very similar to a read with much higher abundance (H), then R is probably a bad read with correct sequence H. This is most effective when all samples are pooled together to give the highest possible abundances for correct reads.
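The idea can be sketched in Python as follows. This is a simplified illustration, not the UNOISE implementation: the thresholds max_diffs and max_skew are hypothetical values chosen for the example, and similarity is measured with a plain Hamming distance on equal-length sequences.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def looks_like_bad_read(seq_r, size_r, seq_h, size_h,
                        max_diffs=1, max_skew=1 / 16):
    """Flag R as a probable bad read of H.

    R is flagged if it differs from H at only a few positions and its
    abundance is a small fraction of H's abundance. Both thresholds are
    assumptions made for this sketch.
    """
    d = hamming(seq_r, seq_h)
    return 0 < d <= max_diffs and size_r / size_h <= max_skew

H = "ACGTACGTAC"
R = "ACGTACGTAA"  # one difference from H

# In a single sample H has only 20 reads, so the skew 2/20 is too high
# to call R an error with confidence.
single = looks_like_bad_read(R, 2, H, 20)

# Pooled over all samples, H has 200 reads and R has 3; the skew 3/200
# is now well below the threshold.
pooled = looks_like_bad_read(R, 3, H, 200)

print(single, pooled)  # False True
```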

When to pool in your pipeline
Samples should be combined after non-biological sequences such as barcodes have been stripped from the reads, and before dereplication. This is required so that dereplication reflects the abundances of unique biological sequences across all samples.
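The order of operations can be sketched in Python. This is a toy illustration under stated assumptions: the barcode is taken to be a fixed-length prefix of each read (BARCODE_LEN is hypothetical), and dereplication is modeled as counting identical sequences. It is not a substitute for the usearch commands.

```python
from collections import Counter

BARCODE_LEN = 4  # hypothetical fixed barcode length for this sketch

# Toy raw reads: a sample-specific barcode followed by the biological sequence.
raw = {
    "S1": ["AAAAACGTAC", "AAAAACGTAC", "AAAAGGCCTT"],
    "S2": ["CCCCACGTAC", "CCCCGGCCTT", "CCCCTTAAGG"],
}

def strip_barcode(read, n=BARCODE_LEN):
    """Remove the leading non-biological barcode from a read."""
    return read[n:]

# 1. Strip barcodes, 2. pool all samples, 3. dereplicate the pool.
pooled = [strip_barcode(r) for reads in raw.values() for r in reads]
uniques = Counter(pooled).most_common()  # sorted by decreasing abundance

# Pooling without stripping would keep barcodes in the sequences, so the
# same biological sequence from two samples would count as two uniques.
unstripped_uniques = Counter(r for reads in raw.values() for r in reads)

print(uniques)                  # [('ACGTAC', 3), ('GGCCTT', 2), ('TTAAGG', 1)]
print(len(unstripped_uniques))  # 5
```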