USEARCH manual

Illumina unpaired reads

See also
UPARSE home page
UPARSE pipeline home page
Illumina unpaired reads of variable-length amplicons

This page gives an example UPARSE pipeline for Illumina unpaired reads. These commands make the following assumptions, which are usually but not always true in the datasets I've seen:

1. There are no non-biological bases in the read such as adapters or barcodes.

2. Sequences complementary to PCR primers are not included in the reads.

3. The reads have been demultiplexed, i.e. split into separate FASTQ files for each sample.

4. The FASTQ filenames start with the sample name.

5. The reads are all on the same strand.
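Assumptions 3 and 4 can be checked before running anything. The following is a minimal sketch, not part of USEARCH: sample_name is a hypothetical helper, and it assumes the sample name is the part of the filename before the first underscore or period. If your files follow a different naming scheme, adjust the pattern.

```shell
# Hypothetical helper: guess the sample name from a FASTQ filename.
# Assumption: the sample name is everything before the first "_" or ".".
sample_name() {
    base=${1##*/}          # drop any leading directory
    base=${base%%[_.]*}    # truncate at the first "_" or "."
    printf '%s\n' "$base"
}

# List the guessed sample name next to each FASTQ file in the current directory.
for fq in *.fastq
do
    [ -e "$fq" ] || continue    # no matches: the pattern stays literal, so skip it
    printf '%s\t%s\n' "$(sample_name "$fq")" "$fq"
done
```

If two files show the same sample name, or a name looks wrong, the per-sample labels in the OTU table will probably be wrong too.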

This example uses the 'for' command of the bash shell to loop over all the FASTQ files (*.fastq) in the current directory.
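To see what such a loop will do before running it, you can print the commands instead of executing them. This is only a sketch: dry_run_labels is a hypothetical function name, and the usearch command line is copied from the first step of this pipeline.

```shell
# Dry run: print one usearch command per FASTQ file instead of running it.
# dry_run_labels is a hypothetical helper name, not part of USEARCH.
dry_run_labels() {
    for fq in *.fastq
    do
        [ -e "$fq" ] || continue    # no matches: the pattern stays literal, so skip it
        echo usearch -fastq_filter "$fq" -relabel @ -fastaout "$fq.labeled.fa"
    done
}

dry_run_labels
```

Quoting "$fq" keeps filenames containing spaces intact when the real command is run.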

Commands

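# Label each read with its sample name (no quality filtering).
for fq in *.fastq
do
  usearch -fastq_filter "$fq" -relabel @ -fastaout "$fq.labeled.fa"
done

# Discard reads with more than 1.0 expected errors.
for fq in *.fastq
do
  usearch -fastq_filter "$fq" -fastq_maxee 1.0 -relabel Filt -fastaout "$fq.filtered.fa"
done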

# Pool the per-sample files.
cat *.labeled.fa > labeled.fa
cat *.filtered.fa > filtered.fa
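A quick sanity check on the pooled files is to count FASTA records before and after pooling. This is a sketch using standard tools only; count_records is a hypothetical helper, and it assumes every record has exactly one header line starting with ">".

```shell
# Hypothetical helper: count FASTA records (header lines starting with ">")
# across all files given as arguments.
count_records() {
    cat "$@" | grep -c '^>' || true    # grep exits non-zero when the count is 0
}

# Compare the per-sample total with the pooled file, if both exist.
if [ -e labeled.fa ] && ls *.labeled.fa >/dev/null 2>&1
then
    echo "per-sample files: $(count_records *.labeled.fa)"
    echo "pooled file:      $(count_records labeled.fa)"
fi
```

The two counts should match. The same check can be run with the filtered files.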

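# Find unique sequences; -sizeout records the abundance of each one.
usearch -fastx_uniques filtered.fa -relabel Uniq -sizeout -fastaout uniques.fa

# Cluster OTUs; -minsize 2 discards singletons.
usearch -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu

# Map the labeled (unfiltered) reads to the OTUs at 97% identity to build the OTU table.
usearch -usearch_global labeled.fa -db otus.fa -strand plus -id 0.97 \
  -otutabout otutab.txt -biomout otutab.json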
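Once the OTU table exists, it can be useful to see how many reads were assigned per sample. The sketch below assumes otutab.txt is tab-separated, with one header row of sample names and the OTU name in the first column. sample_totals is a hypothetical helper, not a USEARCH command.

```shell
# Hypothetical helper: sum each sample column of a tab-separated OTU table.
# Assumption: row 1 is a header of sample names, column 1 is the OTU name.
sample_totals() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; n = NF; next }
        { for (i = 2; i <= n; i++) total[i] += $i }
        END { for (i = 2; i <= n; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}

# Print per-sample read totals if the table is present.
if [ -e otutab.txt ]
then
    sample_totals otutab.txt
fi
```

A sample with a very low total compared with its read count may indicate that one of the assumptions listed at the top of this page does not hold for that file.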