USEARCH manual

UNOISE pipeline

A UNOISE pipeline recovers biological sequences from an amplicon sequencing experiment by performing error-correction (denoising) of Illumina reads. UNOISE is not designed for other sequencing technologies, e.g. 454 pyrosequencing reads. The UNOISE algorithm is implemented in the unoise command.
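This page does not describe the internals of the unoise command. To illustrate what denoising by abundance means, the Python sketch below applies an abundance-skew rule modeled on the one described in the UNOISE2 paper. A unique sequence is treated as an error variant of a more abundant sequence when the ratio of their abundances (the skew) is at most beta(d) = 1/2^(alpha*d + 1), where d is the number of differences between them. Several details are assumptions of this sketch and do not describe the unoise implementation: the function names, the defaults alpha = 2 and minsize = 8, the use of plain Levenshtein distance, and the omission of chimera filtering.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def beta(d: int, alpha: float = 2.0) -> float:
    """Largest abundance skew at which a sequence d differences away is called an error."""
    return 1.0 / 2 ** (alpha * d + 1)


def denoise(uniques: List[Tuple[str, int]],
            alpha: float = 2.0, minsize: int = 8) -> Dict[str, List[str]]:
    """Toy abundance-skew denoiser (assumed defaults, not the unoise command).

    uniques: (sequence, abundance) pairs.
    Returns {correct sequence: [sequences assigned to it, itself first]}.
    """
    # Discard rare uniques, then visit the rest from most to least abundant.
    ordered = sorted((u for u in uniques if u[1] >= minsize), key=lambda u: -u[1])
    centroids: List[Tuple[str, int]] = []
    members: Dict[str, List[str]] = {}
    for seq, size in ordered:
        for cseq, csize in centroids:
            skew = size / csize
            if skew <= beta(edit_distance(seq, cseq), alpha):
                members[cseq].append(seq)  # error variant of a more abundant sequence
                break
        else:
            # Not explained by any accepted sequence: keep it as a correct sequence.
            centroids.append((seq, size))
            members[seq] = [seq]
    return members


uniques = [("ACGTACGT", 1000), ("ACGTACGA", 50), ("TTTTACGT", 400),
           ("ACGTACGC", 100), ("GGGG", 5)]
print(denoise(uniques))
# prints: {'ACGTACGT': ['ACGTACGT', 'ACGTACGC', 'ACGTACGA'], 'TTTTACGT': ['TTTTACGT']}
```

The sketch visits sequences in decreasing order of abundance, so each candidate is compared only with sequences that have already been accepted as correct. A genuine low-abundance sequence therefore survives unless it is a close, much rarer neighbor of an abundant one. For example, TTTTACGT is kept because a skew of 0.4 is far above beta(3).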

See Tutorials for example scripts & data.

Reads in FASTQ format
I strongly recommend starting from "raw" reads, i.e. the reads originally provided by the sequencing machine's base-calling software. You should do quality filtering with USEARCH rather than using reads that have already been filtered by third-party software.

Reads in FASTA format
The unoise command also accepts reads in FASTA format. You may need to use FASTA input if your reads have already been quality filtered by some other method and you don't have access to the original FASTQ reads.

Sample pooling
I recommend combining reads from as many samples as possible. See sample pooling for discussion.

Read quality filtering
Quality filtering of the reads should be done with USEARCH because the maximum expected error filtering method is much more effective at suppressing reads with high error rates than other filters, e.g. those based on average Q scores. A maximum expected error threshold of 1.0 is a good default choice (the -fastq_maxee 1.0 option of fastq_filter or the -fastq_merge_maxee 1.0 option of fastq_mergepairs). You can use fastx_learn to estimate the error rate after filtering.
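The expected number of errors in a read is the sum of the per-base error probabilities implied by its Q scores, P = 10^(-Q/10). The minimal Python sketch below computes this value and applies a 1.0 threshold, and it shows why average Q is a weak filter: a read with a high mean Q can still contain a few very poor bases. It assumes Phred+33 quality encoding. The function names are hypothetical and do not reproduce USEARCH code.

```python
def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of errors in a read: the sum over bases of P = 10^(-Q/10).

    qual is a FASTQ quality string; offset=33 assumes Phred+33 encoding.
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def passes_maxee(qual: str, maxee: float = 1.0) -> bool:
    """Keep the read only if its expected errors do not exceed maxee."""
    return expected_errors(qual) <= maxee


# 95 bases at Q40 ('I') followed by 5 bases at Q2 ('#').
qual = "I" * 95 + "#" * 5
mean_q = sum(ord(c) - 33 for c in qual) / len(qual)
print(round(mean_q, 1), round(expected_errors(qual), 2), passes_maxee(qual))
# prints: 38.1 3.16 False
```

A mean Q of 38.1 looks excellent. However, the five Q2 bases alone contribute more than three expected errors, so an expected-error filter rejects this read while an average-Q filter would probably keep it.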

Global trimming
You should trim reads to a fixed length unless the sequences are contigs generated by a paired read assembler, in which case it may not be necessary. You should also trim any primer-binding sequences at the ends of the reads. See global trimming for discussion.
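Below is a minimal Python sketch of global trimming. It assumes, hypothetically, that the primer occupies a fixed number of bases at the start of every read, so primer removal reduces to cutting a fixed-length prefix. Reads too short to reach the target length are discarded rather than kept at a different length. The name global_trim and its parameters are illustrative only.

```python
from typing import Iterable, Iterator, Tuple


def global_trim(reads: Iterable[Tuple[str, str]], length: int,
                left_trim: int = 0) -> Iterator[Tuple[str, str]]:
    """Trim every read to the same fixed length.

    left_trim removes a fixed number of leading bases (e.g. a primer of known
    length at a fixed position); reads that are then shorter than `length`
    are discarded rather than kept at a different length.
    """
    for label, seq in reads:
        seq = seq[left_trim:]
        if len(seq) < length:
            continue
        yield label, seq[:length]


reads = [("r1", "GTGCACGTACGTAA"), ("r2", "GTGCACGTAC"), ("r3", "GTGCACGTACGTTTTT")]
print(list(global_trim(reads, length=8, left_trim=4)))
# prints: [('r1', 'ACGTACGT'), ('r3', 'ACGTACGT')]
```

Before trimming, r1 and r3 differ only in length. Afterwards they are identical, so the next step counts them as a single unique sequence.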

Unique sequences
Get the set of unique sequences with abundances using the fastx_uniques command with the -sizeout option. This will be the input file for the unoise command.
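Dereplication collapses identical sequences and records how many reads each one represents. The Python sketch below shows the idea. It writes the count as a ;size=N; annotation in the label, following the convention used by USEARCH size annotations, and sorts by decreasing abundance so the most abundant sequences come first. The UniqN labels and the alphabetical tie-breaking are made up for the example and do not describe fastx_uniques output exactly.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def uniques_with_sizes(seqs: Iterable[str]) -> List[Tuple[str, str]]:
    """Dereplicate sequences and return (label, sequence) pairs.

    Labels carry a ;size=N; abundance annotation and the output is sorted
    by decreasing abundance (ties broken alphabetically for determinism).
    """
    counts = Counter(seqs)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"Uniq{i};size={n};", seq) for i, (seq, n) in enumerate(ranked, start=1)]


# Print the result as FASTA records.
for label, seq in uniques_with_sizes(["ACGT", "ACGA", "ACGT", "TTTT", "ACGT", "ACGA"]):
    print(f">{label}\n{seq}")
```

The abundances are what the denoising step relies on, so the input to unoise must carry them in this form.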

Creating an OTU table
Denoised sequences are valid OTUs (the clustering identity is 100%, if you like) and can be used to generate an OTU table in the same way as 97% OTUs. For this to work, reads must have sample identifiers in their labels. The simplest way to add them is usually the -relabel @ option of fastq_filter or fastq_mergepairs.
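In the pipeline, usearch_global matches each read to a denoised sequence. Building the table from those matches is then a matter of counting reads per OTU and per sample. The Python sketch below takes the read-to-OTU assignments as given and assumes read labels in the "SampleName.N" style, using the text before the first period as the sample identifier. The Zotu labels, the sample names, and the "#OTU ID" header layout are illustrative assumptions, not a specification of the -otutabout format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sample_of(read_label: str) -> str:
    """Assumed label format "SampleName.N": the sample is the text before the first period."""
    return read_label.split(".", 1)[0]


def otu_table(assignments: Iterable[Tuple[str, str]]) -> str:
    """Count reads per (OTU, sample) and format a tab-separated table.

    assignments: one (read_label, otu_label) pair per read that matched an OTU.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    samples = set()
    for read_label, otu in assignments:
        sample = sample_of(read_label)
        samples.add(sample)
        counts[otu][sample] += 1
    columns = sorted(samples)
    lines = ["\t".join(["#OTU ID"] + columns)]
    for otu in sorted(counts):
        # Samples with no reads for this OTU get a count of 0.
        lines.append("\t".join([otu] + [str(counts[otu][s]) for s in columns]))
    return "\n".join(lines)


hits = [("Mock.1", "Zotu1"), ("Mock.2", "Zotu1"), ("Soil.1", "Zotu2"), ("Soil.2", "Zotu1")]
print(otu_table(hits))
```

The counting only works if the sample identifier survives in every read label, which is why the reads need to be relabeled before merging or filtering.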

Example commands
These commands are for typical Illumina reads with one pair of FASTQ files (R1 and R2) per sample.

usearch -fastq_mergepairs *_R1*.fastq -relabel @ -fastqout reads.fq

usearch -fastq_filter reads.fq -fastq_maxee 1.0 -fastaout filtered.fa

usearch -fastx_uniques filtered.fa -fastaout uniques.fa -sizeout

usearch -unoise uniques.fa -tabbedout out.txt -fastaout denoised.fa

usearch -usearch_global reads.fq -db denoised.fa -strand plus -id 0.97 -otutabout otu_table.txt