Illumina reads are often delivered as demultiplexed FASTQ files. This happens when (1) several libraries are sequenced in a single run, (2) each library is tagged with a barcode that identifies its sample, and (3) the Illumina instrument software (or the BaseSpace website) splits the reads into separate files, one per barcode.
If you want to use the uc2otutab.py script to generate an OTU table, the read labels must contain barcodelabel=sample_name annotations. When the barcodes are still present in the reads, these annotations can be added with the fastq_strip_barcode_relabel.py script. If Illumina has already demultiplexed the FASTQ files, the barcodes have already been stripped, so fastq_strip_barcode_relabel.py does not apply.
If you have demultiplexed reads, you can add the barcodelabel=sample_name annotation by any convenient method, for example with the Linux sed command as shown in the examples below.
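Because the same sed substitution is repeated for every sample, it can be wrapped in a small shell function. The sketch below is one way to do this; the function name add_barcodelabel is a hypothetical choice and is not part of USEARCH. It uses & in the replacement to stand for the whole matched header line, which gives the same result as the \(.*\) and \1 form used in the examples. The sample name is pasted directly into the sed expression, so it should not contain / or & characters.

```shell
# Hypothetical helper (not part of USEARCH): read FASTA on stdin and append
# ";barcodelabel=SAMPLE;" to every header line, writing the result to stdout.
add_barcodelabel() {
    # "&" in the replacement is the entire matched header, e.g. ">read1"
    sed "s/^>.*/&;barcodelabel=$1;/"
}
```

For example, add_barcodelabel SamA < SamA_filtered.fa > SamA.fa produces the same output as the first sed command in Example 1.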
Example 1: Unpaired reads
Suppose there are two samples named SamA and SamB, and the FASTQ filenames from the Illumina software are:
SamA_L001_R1_001.fastq
SamB_L001_R1_001.fastq
We need to convert these to FASTA and add the barcodelabel=SAMPLE_ID; annotation. This takes three steps: use the fastq_filter command to convert to FASTA and discard low-quality reads, use sed to add the barcodelabel= annotation, then combine the reads into a single input file with cat.
usearch -fastq_filter SamA_L001_R1_001.fastq -fastaout SamA_filtered.fa -fastq_maxee 1.0
usearch -fastq_filter SamB_L001_R1_001.fastq -fastaout SamB_filtered.fa -fastq_maxee 1.0
sed "-es/^>\(.*\)/>\1;barcodelabel=SamA;/" < SamA_filtered.fa > SamA.fa
sed "-es/^>\(.*\)/>\1;barcodelabel=SamB;/" < SamB_filtered.fa > SamB.fa
cat SamA.fa SamB.fa > reads.fa
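With many samples, typing these commands once per sample is tedious. The sketch below loops over every file in the current directory matching *_L001_R1_001.fastq, takes the sample name to be the part of the filename before _L001_R1_001.fastq, and runs the same fastq_filter and sed steps, appending each sample's relabelled reads to reads.fa. The function name label_unpaired and the assumption that every file comes from lane L001 with suffix _001 are illustrative; adjust the pattern to match your filenames.

```shell
# Hypothetical helper: filter, relabel and pool unpaired reads for every
# sample in the current directory. Assumes Illumina-style names such as
# SamA_L001_R1_001.fastq and that usearch is on the PATH.
label_unpaired() {
    : > reads.fa                          # start with an empty pooled file
    for fq in *_L001_R1_001.fastq; do
        [ -e "$fq" ] || continue          # glob matched nothing
        sample=${fq%_L001_R1_001.fastq}   # SamA_L001_R1_001.fastq -> SamA
        usearch -fastq_filter "$fq" -fastaout "${sample}_filtered.fa" -fastq_maxee 1.0
        # Same substitution as the sed commands above; "&" is the whole header
        sed "s/^>.*/&;barcodelabel=${sample};/" < "${sample}_filtered.fa" >> reads.fa
    done
}
```

This writes straight into reads.fa instead of keeping SamA.fa, SamB.fa and so on, and it truncates reads.fa first, so any existing file of that name is overwritten.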
Example 2: Paired reads
Suppose there are two samples named SamA and SamB, and the FASTQ filenames from the Illumina software are:
SamA_L001_R1_001.fastq
SamA_L001_R2_001.fastq
SamB_L001_R1_001.fastq
SamB_L001_R2_001.fastq
We need to merge the read pairs, convert them to FASTA and add the barcodelabel=SAMPLE_ID; annotation. This takes four steps: use the fastq_mergepairs command to combine each read pair, use the fastq_filter command to convert to FASTA and discard low-quality reads, use sed to add the barcodelabel= annotation, then combine the reads into a single input file with cat.
usearch -fastq_mergepairs SamA_L001_R1_001.fastq -reverse SamA_L001_R2_001.fastq \
-fastqout SamA_merged.fq
usearch -fastq_mergepairs SamB_L001_R1_001.fastq -reverse SamB_L001_R2_001.fastq \
-fastqout SamB_merged.fq
usearch -fastq_filter SamA_merged.fq -fastaout SamA_filtered.fa -fastq_maxee 1.0
usearch -fastq_filter SamB_merged.fq -fastaout SamB_filtered.fa -fastq_maxee 1.0
sed "-es/^>\(.*\)/>\1;barcodelabel=SamA;/" < SamA_filtered.fa > SamA.fa
sed "-es/^>\(.*\)/>\1;barcodelabel=SamB;/" < SamB_filtered.fa > SamB.fa
cat SamA.fa SamB.fa > reads.fa
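The same kind of loop can be extended to paired reads. This sketch assumes each R1 file has an R2 partner with the same prefix, skips (with a warning on stderr) any sample whose R2 file is missing, and otherwise runs fastq_mergepairs, fastq_filter and sed with the same options as the commands above. The function name label_paired and the filename pattern are assumptions.

```shell
# Hypothetical helper: merge, filter, relabel and pool paired reads for every
# sample in the current directory. Assumes pairs named like
# SamA_L001_R1_001.fastq / SamA_L001_R2_001.fastq and usearch on the PATH.
label_paired() {
    : > reads.fa                               # start with an empty pooled file
    for r1 in *_L001_R1_001.fastq; do
        [ -e "$r1" ] || continue               # glob matched nothing
        sample=${r1%_L001_R1_001.fastq}        # SamA_L001_R1_001.fastq -> SamA
        r2=${sample}_L001_R2_001.fastq
        if [ ! -e "$r2" ]; then
            echo "missing reverse reads for $sample, skipping" >&2
            continue
        fi
        usearch -fastq_mergepairs "$r1" -reverse "$r2" -fastqout "${sample}_merged.fq"
        usearch -fastq_filter "${sample}_merged.fq" -fastaout "${sample}_filtered.fa" -fastq_maxee 1.0
        sed "s/^>.*/&;barcodelabel=${sample};/" < "${sample}_filtered.fa" >> reads.fa
    done
}
```

As with the unpaired version, reads.fa is overwritten and the intermediate *_merged.fq and *_filtered.fa files are left in the directory.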