unoise3 command

Uses the UNOISE algorithm to perform denoising (error-correction) of amplicon reads.

Errors are corrected as follows:
- Reads with sequencing error are identified and corrected.
- Chimeras are removed.

Input is a set of quality-filtered unique read sequences with size=nnn; abundance annotations. See OTU / denoising pipeline for details of how reads should be pre-processed and how other types of errors and artifacts can be removed.

The input file must be sorted by decreasing abundance, i.e. by decreasing value of the size=nnn annotation. This can be done using the sortbysize command.
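
The ordering requirement can be checked before running the command. The following is a minimal sketch, not part of usearch: the function names and the example labels (Uniq1, Uniq2, ...) are hypothetical, and it assumes each FASTA label carries a size=nnn; annotation separated from the rest of the label by semicolons.

```python
import re

# Matches size=nnn at the start of a label or after a semicolon,
# so a field such as uniqsize= is not picked up by mistake.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def read_sizes(fasta_lines):
    """Yield the size= value of each FASTA header line, in file order."""
    for line in fasta_lines:
        if line.startswith(">"):
            label = line[1:].strip()
            match = SIZE_RE.search(label)
            if match is None:
                raise ValueError(f"no size= annotation in label: {label}")
            yield int(match.group(1))


def is_sorted_by_decreasing_size(fasta_lines):
    """Return True if abundances never increase from one record to the next."""
    sizes = list(read_sizes(fasta_lines))
    return all(a >= b for a, b in zip(sizes, sizes[1:]))


# Hypothetical input records, for illustration only.
example = [
    ">Uniq1;size=120;", "ACGT",
    ">Uniq2;size=45;", "ACGA",
    ">Uniq3;size=45;", "ACGC",
]
print(is_sorted_by_decreasing_size(example))
```

For a real file, pass an open file handle in place of the list, e.g. is_sorted_by_decreasing_size(open("uniques.fa")).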

The algorithm is designed for Illumina reads; it does not work as well on 454, Ion Torrent or PacBio reads.

Predicted correct biological sequences are written to the -zotus file in FASTA format. Labels are formatted as Zotunnn where nnn is 1, 2, 3...

Predicted correct amplicon sequences are written to the -ampout file in FASTA format. These include chimeras, so this output file is not generally needed in a production pipeline. Labels are formatted as Ampnnn;uniq=Uniqlabel;uniqsize=u;size=s; where nnn is 1, 2, 3..., Uniqlabel is the label in the input file, truncated at the first semicolon, u is the size= annotation from the input file and s is the total size of reads derived from this amplicon.
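
If the -ampout labels need to be processed downstream, the fields can be split out with a small parser. This is a sketch under the assumption that the label contains no whitespace and follows the layout described above exactly; the function name and the example label are hypothetical and not taken from real usearch output.

```python
import re

# Layout assumed from the description above:
#   Ampnnn;uniq=Uniqlabel;uniqsize=u;size=s;
AMP_RE = re.compile(
    r"^(?P<amp>Amp\d+);uniq=(?P<uniq>[^;]*);"
    r"uniqsize=(?P<uniqsize>\d+);size=(?P<size>\d+);?$"
)


def parse_amp_label(label):
    """Split an -ampout label into its four fields."""
    match = AMP_RE.match(label.lstrip(">").strip())
    if match is None:
        raise ValueError(f"label does not match the assumed layout: {label}")
    return {
        "amp": match.group("amp"),                  # Ampnnn identifier
        "uniq": match.group("uniq"),                # input label, cut at first semicolon
        "uniqsize": int(match.group("uniqsize")),   # size= of the input sequence
        "size": int(match.group("size")),           # total reads derived from the amplicon
    }


# Hypothetical label, for illustration only.
fields = parse_amp_label(">Amp3;uniq=Uniq7;uniqsize=210;size=254;")
print(fields)
```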

An OTU table can be generated using the otutab command. See OTU / denoising pipeline.

The -minsize option specifies the minimum abundance (size= annotation). Default is 8. Input sequences with lower abundances are discarded. Most low-abundance sequences are noisy and will be mapped to a ZOTU by the otutab command. For higher sensitivity, reducing minsize to 4 is reasonable, especially if samples are denoised individually rather than pooled together (pooling is what I would usually recommend). With smaller minsize, there tend to be more errors in the predicted low-abundance biological sequences.
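
To get a feel for what a given minsize discards, the size= annotations of the input can be tallied before denoising. This is a minimal sketch, not part of usearch: the function name and example labels are hypothetical, and it assumes sequences with abundance greater than or equal to minsize are the ones retained.

```python
import re

SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def summarize_minsize(labels, minsize=8):
    """Count sequences and reads kept or dropped at a given minsize.

    Assumes a sequence is kept when its size= value is >= minsize.
    """
    summary = {"kept_seqs": 0, "kept_reads": 0, "dropped_seqs": 0, "dropped_reads": 0}
    for label in labels:
        match = SIZE_RE.search(label.lstrip(">"))
        if match is None:
            raise ValueError(f"no size= annotation in label: {label}")
        size = int(match.group(1))
        prefix = "kept" if size >= minsize else "dropped"
        summary[prefix + "_seqs"] += 1
        summary[prefix + "_reads"] += size
    return summary


# Hypothetical labels, for illustration only.
labels = [
    ">Uniq1;size=500;", ">Uniq2;size=12;", ">Uniq3;size=8;",
    ">Uniq4;size=5;", ">Uniq5;size=4;", ">Uniq6;size=1;",
]
print(summarize_minsize(labels, minsize=8))
print(summarize_minsize(labels, minsize=4))
```

In this made-up example, lowering minsize from 8 to 4 keeps two more sequences but adds only nine reads, which illustrates why dropped low-abundance sequences usually account for a small share of the data.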

The -tabbedout option specifies a tabbed text filename which reports the processing done for each sequence, e.g. if it is classified as noisy or chimeric.

The -unoise_alpha option specifies the alpha parameter (see UNOISE2 paper for definition). Default is 2.0.
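
For orientation, the sketch below shows how alpha enters the abundance-skew threshold as it is defined, to my understanding, in the UNOISE2 paper: a sequence with d differences from a more abundant sequence is treated as a likely error if its abundance ratio is at most beta(d) = 1 / 2^(alpha*d + 1). The function names are hypothetical, and this illustrates the formula only; it is not the usearch implementation and should be checked against the paper.

```python
def max_skew(d, alpha=2.0):
    """Threshold beta(d) = 1 / 2**(alpha*d + 1) for d differences."""
    return 1.0 / 2 ** (alpha * d + 1)


def looks_like_error(candidate_size, centroid_size, d, alpha=2.0):
    """True if the abundance ratio is small enough to be explained as noise."""
    if d <= 0:
        return False  # identical sequences are not compared this way
    return candidate_size / centroid_size <= max_skew(d, alpha)


# With the default alpha=2.0, one difference gives a threshold of 1/8.
print(max_skew(1))                    # 0.125
print(looks_like_error(10, 100, 1))   # ratio 0.10 is within the threshold
print(looks_like_error(20, 100, 1))   # ratio 0.20 exceeds the threshold
```

Under this formula a larger alpha shrinks the threshold, so fewer low-abundance sequences are absorbed as errors.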

Example

usearch -unoise3 uniques.fa -zotus zotus.fa -tabbedout unoise3.txt