See also
fastq_mergepairs command
fastq_mergepairs options
fastq_filter command
Sometimes, reverse reads (R2s) have substantially lower quality than forward
reads (R1s), especially with longer read lengths. This can cause a low rate of
merged reads and/or a large number of reads to be discarded by quality
filtering. The following strategies can help.
Truncate all R2s to a shorter fixed length
Typically, quality drops towards the end of the read. You can check for this
using the fastq_eestats2 command. You can then consider truncating all R2s
before merging by using the fastx_truncate command.
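To see why a fixed-length cut helps, here is a minimal Python sketch that compares the expected number of errors in a read before and after truncation. It is an illustration only, not how usearch implements fastq_eestats2 or fastx_truncate. The function names are hypothetical, Phred+33 quality encoding is assumed, and returning None for reads shorter than the cut-off is a choice made for this sketch.

```python
def phred33_to_scores(quality_line):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer Q scores."""
    return [ord(ch) - 33 for ch in quality_line]


def expected_errors(scores):
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in scores)


def truncate_record(seq, qual, length):
    """Keep the first `length` bases.

    Returns None for reads shorter than `length` (an assumption of this
    sketch, so that all surviving reads have the same length).
    """
    if len(seq) < length:
        return None
    return seq[:length], qual[:length]


# Toy R2 read whose quality collapses in the last five bases.
seq = "ACGTACGTACGTACG"
qual = "IIIIIIIIII#####"  # 'I' decodes to Q40, '#' decodes to Q2

for length in (10, 15):
    _, kept_qual = truncate_record(seq, qual, length)
    ee = expected_errors(phred33_to_scores(kept_qual))
    print(f"length {length}: expected errors = {ee:.3f}")
```

In this toy read almost all of the expected errors come from the low-quality tail, so cutting at 10 bases removes them while keeping every read the same length.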
Truncate R2s using a minimum Q score
Usually, quality filtering that gives variable-length reads is not
recommended (see global trimming for discussion). However,
if the reads will be merged then truncating the forward or reverse reads to
discard low-quality bases can be effective because the merged sequence will
be globally trimmed regardless of end trimming.
The fastq_trunctail option, default value 2, truncates both the forward and reverse reads before the first base with the given quality score. Q=2 means a probability of 63% that the base call is wrong, so this is a pretty conservative threshold. You might consider increasing it to 3, 4 or 5. Note that the same threshold is applied to both the R1 and the R2.
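The 63% figure follows from the standard Phred relationship P = 10^(-Q/10). A short Python sketch of the arithmetic for the thresholds suggested above (the function name is hypothetical):

```python
def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


for q in (2, 3, 4, 5):
    print(f"Q={q}: P(error) = {error_probability(q):.2f}")
```

This gives roughly 0.63, 0.50, 0.40 and 0.32 for Q=2, 3, 4 and 5, so even at Q=5 about one base call in three is expected to be wrong.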
There are two alternative ways to do this that allow setting different Q score thresholds for the R1s and R2s: the fastq_filter command with the fastq_truncqual option, or the fastq_mergepairs command with the fastq_minqual option. Note that the definitions of these options are different: fastq_truncqual truncates at the first Q score which is less than or equal to the given value, while fastq_minqual truncates at the first Q score which is less than the given value.
Using fastq_filter has the advantage that you can analyze the FASTQ output using commands such as fastq_eestats2. This is useful for choosing the minimum Q score: try different values and review the results. If you filter forward and reverse reads separately then you will get different sets of reads. Alternatively, you can use fastq_filter to tune the value of fastq_truncqual (Q), then set the fastq_minqual option of fastq_mergepairs to Q+1.
I suggest trying minimum Q score values between 2 and 5.
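The difference between the two definitions, and the reason for the Q+1 rule, can be sketched in Python as follows. This only illustrates the definitions as stated above and is not the usearch implementation. The function names are hypothetical, and the sketch assumes that the first failing base is itself removed along with everything after it.

```python
def truncate_le(scores, q):
    """fastq_truncqual-style rule: cut at the first score <= q."""
    for i, s in enumerate(scores):
        if s <= q:
            return scores[:i]  # assumption: the failing base is dropped too
    return scores


def truncate_lt(scores, q):
    """fastq_minqual-style rule: cut at the first score < q."""
    for i, s in enumerate(scores):
        if s < q:
            return scores[:i]  # assumption: the failing base is dropped too
    return scores


# Toy read whose Q scores decay towards the end.
scores = [38, 37, 30, 12, 5, 4, 3, 2, 2]

for q in range(2, 6):
    kept_le = truncate_le(scores, q)
    kept_lt = truncate_lt(scores, q)
    # "<= Q" and "< Q+1" select the same cut point for integer Q scores.
    assert kept_le == truncate_lt(scores, q + 1)
    print(f"Q={q}: '<=' keeps {len(kept_le)} bases, '<' keeps {len(kept_lt)} bases")
```

With the same numeric value the two rules cut at different positions whenever a base has exactly that score, which is why a threshold tuned with fastq_truncqual needs to be shifted by one before it is passed to fastq_minqual.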
Discard reverse reads and make OTUs from forward reads only
Sometimes, the reverse reads are so bad that it is better to discard them
and make OTUs from the forward reads only. This can be a difficult decision
because it is hard to throw expensive data away. One approach is to make
several sets of OTUs using different strategies (e.g., forward only, merged)
and compare the results. The trade-off is that merged reads have more bases,
which gives better phylogenetic resolution, but fewer reads survive, which
reduces sensitivity to low-abundance species. Sometimes, it is reasonable to
use different sets
of OTUs for different types of analysis.