
FAQ: Should you use UPARSE or UNOISE?

There are two different ways to make OTUs: 97% clustering and denoising.

The UPARSE algorithm makes 97% OTUs.

The UNOISE algorithm does denoising, i.e. error-correction.

If UPARSE works perfectly, it will give you a subset of the correct biological sequences in your reads such that no two sequences are >97% identical to each other. It is implemented in the cluster_otus command.

If UNOISE works perfectly, it will give you all the correct biological sequences in the reads. It is implemented in the unoise3 command.

(Of course, no algorithm is perfect, so you should expect some mistakes.)
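To make the two "perfect" outcomes concrete, here is a toy sketch in Python. It is not the UPARSE or UNOISE algorithm: it assumes equal-length sequences, measures identity by counting matching positions, and picks OTUs greedily in order of abundance. The sequences and the function names (identity, ideal_97_otus) are made up for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def ideal_97_otus(seqs_by_abundance, threshold=0.97):
    """Keep a subset in which no two sequences are > threshold identical.

    Sequences are visited most abundant first; a sequence is kept only if
    it is not too similar to any sequence already kept.
    """
    kept = []
    for seq in seqs_by_abundance:
        if all(identity(seq, k) <= threshold for k in kept):
            kept.append(seq)
    return kept


# Three hypothetical correct biological sequences, most abundant first.
base = "ACGT" * 25              # 100 bases
near = "T" + base[1:]           # 1 mismatch with base (99% identical)
far = "T" * 10 + base[10:]      # 8 mismatches with base (92% identical)

true_sequences = [base, near, far]

# Perfect 97% clustering reports a subset: near is absorbed by base.
otus_97 = ideal_97_otus(true_sequences)

# Perfect denoising reports every correct sequence.
denoised = list(true_sequences)
```

Here otus_97 contains base and far only, while denoised contains all three, which is the difference in resolution between the two approaches.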

The pipeline for running UPARSE and UNOISE is essentially the same; the only difference is whether you run cluster_otus or unoise3 as the clustering step.
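As a sketch of how little changes between the two pipelines, the Python helper below builds the command line for the clustering step only. The helper name otu_command, the file names (uniques.fa, otus.fa, zotus.fa) and the assumption that the binary is called usearch and is on your PATH are all hypothetical. The -otus, -relabel and -zotus options are written as I understand them from the USEARCH manual, so check the command documentation before relying on them.

```python
from typing import List


def otu_command(method: str, uniques: str = "uniques.fa") -> List[str]:
    """Return the clustering-step command for the chosen method.

    Everything before this step (merging, filtering, dereplication) and
    everything after it (making the OTU table) is assumed to be identical.
    """
    if method == "uparse":
        # 97% OTUs
        return ["usearch", "-cluster_otus", uniques,
                "-otus", "otus.fa", "-relabel", "Otu"]
    if method == "unoise":
        # Denoised sequences
        return ["usearch", "-unoise3", uniques, "-zotus", "zotus.fa"]
    raise ValueError("method must be 'uparse' or 'unoise'")


uparse_cmd = otu_command("uparse")
unoise_cmd = otu_command("unoise")
```

Either list could be passed to subprocess.run(cmd, check=True) on a machine where the program is installed; the code above only builds the arguments and does not run anything.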

Once you have made an OTU table, you can proceed with diversity analysis etc. in the same way, regardless of whether you used UPARSE or UNOISE.

Which should you choose? I suggest you try both. If a biological conclusion is different, then you should worry that neither result is trustworthy and try to understand why this happens. If both methods agree, that tends to confirm the result.

Pros and cons
As of the time of writing in 2017, most published papers use 97% clustering, so this will be easier to explain to your PI and to referees. The main disadvantage of 97% clustering is that you discard some correct biological sequences that are present in your reads. If these represent strains or species with a different phenotype, then you lose relevant information and the corresponding reads will be lumped together into one OTU that contains multiple phenotypes.

The main disadvantage of denoising is that species often have variations between individuals, and paralogs that are not 100% identical, so one species can be split into two or more OTUs. Another disadvantage is that more low-abundance sequences are lost: with UPARSE, singleton uniques are discarded, but with UNOISE, uniques with abundance <8 are discarded. For typical studies, this shouldn't make much difference because samples should be pooled, so a sequence with abundance <8 will probably be a singleton in a few samples. Singletons in the OTU table should not be considered significant because they could be spurious with any method.
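The effect of the two abundance cutoffs can be sketched in a few lines of Python. The sequence labels and counts below are invented, and keep_by_min_abundance is a hypothetical helper, not part of any tool; it only mimics the rule described above (singletons dropped for UPARSE, abundance <8 dropped for UNOISE).

```python
def keep_by_min_abundance(uniques: dict, min_size: int) -> dict:
    """Keep unique sequences whose pooled abundance is at least min_size."""
    return {seq: n for seq, n in uniques.items() if n >= min_size}


# Hypothetical pooled abundances of unique sequences across all samples.
uniques = {"seqA": 120, "seqB": 9, "seqC": 7, "seqD": 2, "seqE": 1}

# UPARSE rule from the text: only singletons are discarded.
uparse_input = keep_by_min_abundance(uniques, 2)

# UNOISE rule from the text: abundance < 8 is discarded.
unoise_input = keep_by_min_abundance(uniques, 8)

# Sequences that survive the UPARSE cutoff but not the UNOISE cutoff.
lost_only_with_unoise = sorted(set(uparse_input) - set(unoise_input))
```

In this made-up example, seqC and seqD are the extra low-abundance sequences that denoising gives up.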

The main advantage of denoising is that you get better resolution by keeping all the biological sequences. If you have intra-species variations in the region that you sequenced, then you will get two or more OTUs for one species. For most purposes, this really doesn't matter -- it might even be better if this enables you to detect strains with different phenotypes -- so if I have to recommend one method, then I would recommend denoising.