USEARCH manual

Illumina unpaired reads

See also
UPARSE home page
UPARSE pipeline home page

This page gives an example UPARSE pipeline for Illumina unpaired reads. The commands make the following assumptions, which are usually, but not always, true in the datasets I've seen:

1. The reads contain no non-biological bases, such as adapters or barcodes.

2. Sequences complementary to PCR primers are not included in the reads.

3. The reads have been demultiplexed, i.e. split into separate FASTQ files for each sample.

4. The FASTQ filenames start with the sample name.

5. The reads are all on the same strand.
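Assumption 4 matters because `-relabel @` in the first command takes each read's sample label from its FASTQ filename. A cheap pre-flight check is to derive the sample name from every filename and look for files that share a name, since their reads would most likely be pooled into one sample in the OTU table. The Python sketch below assumes the sample name is the part of the filename before the first underscore or period, which is my understanding of the `@` convention but should be checked against the documentation for your USEARCH version; `sample_from_filename` and `check_sample_names` are hypothetical helpers, not part of USEARCH.

```python
import os
import re
from collections import defaultdict


def sample_from_filename(path: str) -> str:
    """Hypothetical helper: return the filename up to the first '_' or '.'.

    Assumption: this mirrors how -relabel @ picks the sample name.
    """
    name = os.path.basename(path)
    return re.split(r"[_.]", name, maxsplit=1)[0]


def check_sample_names(paths):
    """Group FASTQ paths by derived sample name; return (groups, collisions)."""
    groups = defaultdict(list)
    for path in paths:
        groups[sample_from_filename(path)].append(path)
    # Several files sharing a name end up in one sample: that may be intended
    # (e.g. one sample sequenced on two lanes) or may be a naming mistake.
    collisions = {s: ps for s, ps in groups.items() if len(ps) > 1}
    return dict(groups), collisions


if __name__ == "__main__":
    files = [
        "Mock_S1_L001_R1_001.fastq",
        "Soil1_S2_L001_R1_001.fastq",
        "Soil1_S9_L002_R1_001.fastq",
    ]
    groups, collisions = check_sample_names(files)
    for sample in sorted(groups):
        print(sample, len(groups[sample]))
    for sample, paths in sorted(collisions.items()):
        print(f"warning: {len(paths)} files map to sample {sample!r}")
```

Running this on the real list of `*_R1_*.fastq` files before starting the pipeline shows at a glance whether the filenames follow the expected convention.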

Commands
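# Relabel all reads with sample names taken from the filenames (no quality filtering); used to build the OTU table
usearch -fastq_filter *_R1_*.fastq -relabel @ -fastaout reads.fa

# Quality filter: discard reads with more than 1.0 expected errors
usearch -fastq_filter *_R1_*.fastq -fastq_maxee 1.0 -relabel Filt -fastaout filtered.fa

# Dereplicate: keep one copy of each unique sequence, annotated with its abundance (size=)
usearch -derep_fulllength filtered.fa -relabel Uniq -sizeout -fastaout uniques.fa

# Cluster OTUs, discarding unique sequences that occur only once (singletons)
usearch -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu

# Map all reads to the OTUs at 97% identity on the plus strand; write the OTU table as tab-separated text and BIOM
usearch -usearch_global reads.fa -db otus.fa -strand plus -id 0.97 \
  -otutabout otutab.txt -biomout otutab.json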
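The quality filtering, dereplication and `-minsize` steps can be hard to picture from the command line alone. The following Python sketch is a toy, in-memory illustration of the arithmetic behind `-fastq_maxee`, `-derep_fulllength ... -sizeout` and `-minsize`, not USEARCH's implementation. The expected number of errors in a read is the sum of its per-base error probabilities, 10^(-Q/10), where Q is the Phred quality score. The sketch assumes Phred+33 quality encoding; the function names are hypothetical, and the `Uniq1;size=2` labels only imitate the size annotations. It also leaves out everything else `-cluster_otus` does (clustering and chimera filtering) and shows only the singleton filter.

```python
from collections import Counter


def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over its bases.

    Assumes Phred+33 quality encoding.
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def quality_filter(records, maxee=1.0):
    """Keep sequences whose expected errors do not exceed maxee (cf. -fastq_maxee)."""
    return [seq for seq, qual in records if expected_errors(qual) <= maxee]


def dereplicate(seqs, prefix="Uniq"):
    """Collapse identical sequences into (label, seq, size), most abundant first.

    The 'Uniq1;size=N' label style is illustrative (cf. -relabel Uniq -sizeout).
    """
    counts = Counter(seqs)
    # Ties are broken by sequence so the output order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"{prefix}{i};size={n}", seq, n)
            for i, (seq, n) in enumerate(ordered, start=1)]


def apply_minsize(uniques, minsize=2):
    """Drop uniques seen fewer than minsize times (cf. -minsize 2)."""
    return [u for u in uniques if u[2] >= minsize]


if __name__ == "__main__":
    # Toy (sequence, quality) pairs; 'I' is Q40 and '#' is Q2 in Phred+33.
    reads = [
        ("ACGTACGT", "IIIIIIII"),
        ("ACGTACGT", "IIIIIIII"),
        ("ACGAACGT", "IIIIIIII"),
        ("ACGTACGT", "##IIIIII"),  # two Q2 bases: about 1.26 expected errors
    ]
    filtered = quality_filter(reads)
    uniques = dereplicate(filtered)
    for label, seq, _ in apply_minsize(uniques):
        print(label, seq)  # prints "Uniq1;size=2 ACGTACGT"
```

Uniques are sorted by decreasing abundance because UPARSE-OTU clustering considers the most abundant sequences first. The tie-break on sequence exists only to make the toy output deterministic.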