USEARCH manual

UNOISE algorithm

The UNOISE algorithm performs error-correction (denoising) on amplicon reads. It is implemented in the unoise2 command.

The algorithm is designed for Illumina reads; it does not work as well on 454, Ion Torrent or PacBio reads.

Correct biological sequences are recovered from the reads. Distinct sequences are usually resolved even when they differ by a single base, and almost always when they differ by two or more.

Errors are corrected as follows:

- Reads with sequencing error are identified and removed.
- Abundances are corrected (when the OTU table is generated).
- Chimeras are removed.
- PhiX sequences are removed.

Denoising is better than OTU clustering.
I generally consider denoising to be superior to traditional OTU clustering at 97% identity because OTUs may merge different species (or more generally, different phenotypes) with distinct sequences while denoising gives the best possible resolution.
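To make the lumping problem concrete, here is a minimal Python sketch. The two sequences are made-up 100-nt amplicons that differ at two positions, and percent_identity is a hypothetical helper that assumes the sequences are already aligned and of equal length. It is not how USEARCH computes identity; it only illustrates why two distinct sequences can fall inside one 97% OTU while remaining distinct at 100%.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (already aligned)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two hypothetical 100-nt amplicons that differ at positions 0 and 50.
seq_a = "ACGT" * 25
seq_b = "T" + seq_a[1:50] + "A" + seq_a[51:]

identity = percent_identity(seq_a, seq_b)

# At a 97% threshold the pair is close enough to be merged into one OTU.
within_97 = identity >= 97.0
# At 100% (denoised sequences) the two remain separate.
within_100 = identity >= 100.0

print(f"identity = {identity:.1f}%, within 97%: {within_97}, within 100%: {within_100}")
```

Whether two such sequences actually end up in the same 97% OTU also depends on which sequences are chosen as OTU centroids, so the threshold test above is only a necessary condition for merging.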

Using denoised sequences as OTUs has two possible drawbacks. First, a single species may be split into two OTUs due to different strains or paralogs. Second, sensitivity is slightly lower, because UPARSE can make robust OTUs from unique sequences with abundance as low as 2 while the minimum abundance for UNOISE is around 4.

I consider splitting of strains to be a good thing, because they may have different phenotypes and hence different ecological roles. Splitting due to paralogs is relatively benign (what does it matter?), and is not solved by clustering at 97% identity because paralogs have identities <97% in some cases. Splitting or lumping is unavoidable regardless of whether the clustering identity is 97% or 100%, so I would argue that it is better to resolve as many distinct biological sequences as possible. Sensitivity to unique sequences with abundance <4 (summed over all samples) is rarely important in practice, so the sensitivity of UNOISE to low-abundance sequences is not really worse than UPARSE for most purposes.
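The sensitivity difference can be illustrated with a small Python sketch. The unique-sequence names and per-sample counts are invented for illustration, and the two thresholds are simply the approximate values quoted above (2 for UPARSE, around 4 for UNOISE), applied to the abundance summed over all samples. This is not the logic of either algorithm, only the abundance cutoff.

```python
# Approximate minimum abundances quoted in the text (assumed here as hard cutoffs).
UPARSE_MIN_SIZE = 2
UNOISE_MIN_SIZE = 4

# Hypothetical unique sequences with read counts per sample.
unique_counts = {
    "uniq1": {"S1": 120, "S2": 95},
    "uniq2": {"S1": 2, "S2": 1},
    "uniq3": {"S1": 1, "S2": 1},
    "uniq4": {"S1": 1},
}


def passing(counts, min_size):
    """Names of uniques whose abundance, summed over all samples, is >= min_size."""
    return sorted(
        name
        for name, by_sample in counts.items()
        if sum(by_sample.values()) >= min_size
    )


uparse_candidates = passing(unique_counts, UPARSE_MIN_SIZE)
unoise_candidates = passing(unique_counts, UNOISE_MIN_SIZE)

print("Eligible with minimum abundance 2:", uparse_candidates)
print("Eligible with minimum abundance 4:", unoise_candidates)
```

In this toy data set only uniques with total abundance 2 or 3 are affected by the higher cutoff, which is the narrow range where the two methods differ in sensitivity.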

Denoised sequences are valid OTUs (the clustering identity is 100%, if you like) and can be used to generate an OTU table in just the same way as 97% OTUs.
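As a rough illustration of how reads can be counted against denoised sequences, here is a toy Python sketch. Everything in it is an assumption made for the example: the sequences, the sample names, the build_otu_table helper, and the rule of assigning each read to the closest denoised sequence of the same length if it has at most one mismatch. USEARCH uses its own search and alignment method to build the table, so this only shows the shape of the result and how a read with an error contributes to the abundance of the correct sequence.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_otu_table(reads_by_sample, otus, max_mismatches=1):
    """Count reads per (OTU, sample); reads with no close match are left unassigned.

    Ties are broken by the order of entries in `otus` (first closest wins).
    """
    samples = sorted(reads_by_sample)
    table = {name: {s: 0 for s in samples} for name in otus}
    for sample, reads in reads_by_sample.items():
        for read in reads:
            best_name, best_dist = None, None
            for name, seq in otus.items():
                if len(seq) != len(read):
                    continue  # toy rule: only compare sequences of equal length
                dist = hamming(read, seq)
                if best_dist is None or dist < best_dist:
                    best_name, best_dist = name, dist
            if best_name is not None and best_dist <= max_mismatches:
                table[best_name][sample] += 1
    return table


# Two hypothetical denoised sequences that differ by a single base.
otus = {"Otu1": "ACGTACGTAC", "Otu2": "ACGTTCGTAC"}

reads_by_sample = {
    # The last read in S1 has one error relative to Otu1 and is counted as Otu1.
    "S1": ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGTACGTAA"],
    # The last read in S2 matches nothing closely and is left out of the table.
    "S2": ["ACGTTCGTAC", "ACGTTCGTAC", "GGGGGGGGGG"],
}

table = build_otu_table(reads_by_sample, otus)
for name, counts in table.items():
    print(name, counts)
```

The resulting table has one row per denoised sequence and one column per sample, which is the same layout as a table built from 97% OTUs.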