Illumina unpaired reads
Illumina unpaired reads of variable-length amplicons
This page gives an example UPARSE pipeline for Illumina unpaired reads. These commands make the following assumptions, which are usually but not always true in the datasets I've seen:
1. There are no non-biological bases in the read such as adapters or barcodes.
2. Sequences complementary to PCR primers are not included in the reads.
3. The reads have been demultiplexed, i.e. split into separate FASTQ files for each sample.
4. The FASTQ filenames start with the sample name.
5. The reads are all on the same strand.
This example uses the 'for' command of the bash shell to make a loop over all the fastq files in the current directory.
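As a rough illustration of the loop, the sketch below lists each FASTQ file in the current directory together with the sample name its filename implies. The helper sample_name is hypothetical, and the rule it applies (the sample name is everything before the first underscore or period) is an assumption of this sketch, so adjust it if your files are named differently.

```shell
# Hypothetical helper: print the sample name implied by a FASTQ filename.
# Assumption: the sample name is the part of the file name before the
# first underscore or period.
sample_name() {
  base=${1##*/}                    # drop any leading directory
  printf '%s\n' "${base%%[._]*}"   # cut at the first '.' or '_'
}

# Loop over all FASTQ files in the current directory.
for fq in *.fastq
do
  [ -e "$fq" ] || continue         # skip if the pattern matched nothing
  echo "$fq -> $(sample_name "$fq")"
done
```

Running this before the real commands is a cheap way to confirm that assumption 4 holds for every file.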
Commands
for fq in *.fastq
do
  usearch -fastq_filter $fq -relabel @ -fastaout $fq.labeled.fa
done
for fq in *.fastq
do
  usearch -fastq_filter $fq -fastq_maxee 1.0 -relabel Filt -fastaout $fq.filtered.fa
done
cat *.labeled.fa > labeled.fa
cat *.filtered.fa > filtered.fa
usearch -fastx_uniques filtered.fa -relabel Uniq -sizeout -fastaout uniques.fa
usearch -cluster_otus uniques.fa -minsize 2 -otus otus.fa -relabel Otu
usearch -usearch_global labeled.fa -db otus.fa -strand plus -id 0.97 \
-otutabout otutab.txt -biomout otutab.json
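One possible sanity check after the commands finish is to count the sequences in each FASTA file the pipeline wrote. This is only a sketch: count_records is a hypothetical helper, and it assumes every FASTA record starts with a '>' header line.

```shell
# Hypothetical helper: count FASTA records by counting '>' header lines.
count_records() {
  grep -c '^>' "$1"
}

# Report a count for each pipeline output that exists in this directory.
for fa in labeled.fa filtered.fa uniques.fa otus.fa
do
  if [ -f "$fa" ]; then
    echo "$fa: $(count_records "$fa") sequences"
  fi
done
```

The counts should shrink from labeled.fa to filtered.fa to uniques.fa to otus.fa; an empty or missing file usually points to a problem in the preceding step.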