See also
UNOISE paper
Uses the UNOISE algorithm to perform denoising (error-correction) of amplicon reads.
Note: in version 9.1, the unoise command was replaced by the unoise2 command. This reflects that unoise uses the older UNOISE algorithm (Edgar & Flyvbjerg 2015) while unoise2 uses the improved UNOISE2 algorithm (Edgar 2016).
Input is a set of quality-filtered unique read sequences with size=nnn; abundance annotations. See UNOISE pipeline for details of how reads should be pre-processed. The input should be a complete set of reads without any clustering (except for finding uniques), so for example you should not use 97% OTUs as input. It is ok to run unoise on the FASTQs for a single sample, though I generally recommend pooling samples before denoising.
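Before running the command it can be useful to confirm that every input sequence carries a size=nnn; annotation. The following is a minimal sketch, not part of usearch: the function names parse_size and labels_missing_size and the example labels are hypothetical, and it assumes the annotation appears in the FASTA label as a semicolon-delimited size=nnn field.

```python
import re

# Matches an abundance annotation such as "Uniq1;size=1234;" (assumed label layout).
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def parse_size(label):
    """Return the abundance from a label, or None if there is no size= annotation."""
    match = SIZE_RE.search(label)
    return int(match.group(1)) if match else None


def labels_missing_size(fasta_lines):
    """Return the labels of FASTA records that lack a size= annotation."""
    missing = []
    for line in fasta_lines:
        if line.startswith(">"):
            label = line[1:].strip()
            if parse_size(label) is None:
                missing.append(label)
    return missing


# Hypothetical input: the second record has no abundance annotation.
example = [">Uniq1;size=1234;", "ACGT", ">Uniq2", "ACGA"]
print(labels_missing_size(example))
```

Any label reported by this check would need to be annotated (for example by re-running the step that finds uniques) before the file is used as input.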
See Tutorials for example scripts & data.
Errors are corrected as follows:
- Reads with sequencing error are identified and removed.
- Chimeras are removed.
- PhiX sequences are removed.
- Low-complexity sequences due to Illumina artifacts are removed.
The algorithm is designed for Illumina reads; it does not work as well on 454, Ion Torrent or PacBio reads.
Corrected amplicon sequences are written to the -fastaout file.
The -relabel prefix option specifies a prefix for sequence labels in the output file. An integer 1, 2, 3... is appended to the prefix (requires v9.0.2140 or later).
The -minampsize option specifies the minimum abundance (size= annotation) for an error-corrected amplicon. Default is 4 in v9.0.2159 and later (it was 8 in previous versions).
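The effect of the -relabel and -minampsize options can be pictured with a small sketch. This is an illustration of the abundance cutoff and the numbering only, not the denoising algorithm itself; the function filter_and_relabel, the prefix "Amp" and the example data are hypothetical, and the exact label format written by usearch may differ.

```python
def filter_and_relabel(amplicons, minampsize=4, prefix="Amp"):
    """Keep amplicons at or above the abundance cutoff and number them.

    amplicons: list of (sequence, abundance) pairs.
    Returns a list of (label, sequence, abundance) where label is
    prefix followed by 1, 2, 3... in the order the amplicons were given.
    """
    kept = [(seq, size) for seq, size in amplicons if size >= minampsize]
    return [(f"{prefix}{i}", seq, size) for i, (seq, size) in enumerate(kept, start=1)]


# Hypothetical amplicons: the second one falls below the default cutoff of 4.
amplicons = [("ACGT", 120), ("ACGA", 3), ("TTGA", 4)]
for label, seq, size in filter_and_relabel(amplicons):
    print(label, seq, size)
```

With the default cutoff of 4, the amplicon with abundance 3 is dropped; with the older default of 8, the amplicon with abundance 4 would be dropped as well.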
An OTU table can be generated using the usearch_global command. Reads must have sample identifiers in the labels. I suggest using 97% identity for matching reads to denoised amplicon sequences (this is not a clustering identity; rather, using 97% allows for up to 3% read errors).
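To make the 3% allowance concrete, the sketch below computes how many differences a read of a given length may have and still match at 97% identity. It is a simplified illustration that assumes substitutions only, so that the alignment length equals the read length; the function name max_differences is hypothetical.

```python
def max_differences(read_length, identity_pct=97):
    """Largest number of mismatches that still gives at least identity_pct identity.

    Uses integer arithmetic to avoid floating-point rounding, and assumes
    the alignment has no gaps (alignment length == read_length).
    """
    return read_length * (100 - identity_pct) // 100


# A 250 nt read may differ at up to 7 positions (243/250 = 97.2%).
print(max_differences(250))
```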
Example
usearch -unoise uniques.fa -tabbedout out.txt -fastaout denoised.fa