See also
fastq_mergepairs command
fastq_mergepairs options
Reviewing a fastq_mergepairs report to check for problems
Trouble-shooting problems with fastq_mergepairs
If the merge report shows that many reads are failing to
merge for a given reason, then you can use the tabbedout file to investigate
further. For example, suppose the report says that 70% of the pairs were
discarded because of "too many diffs", i.e. mismatches in the alignments.
The simplest way to investigate is to use
the -fastqout_notmerged_fwd
and -fastqout_notmerged_rev options to get the pairs which did not merge,
then (if needed) use fastx_subsample to get a small subset for manual
investigation. See trouble-shooting
merging for details.
If reads are failing to merge for two or
more different reasons, then you can use the tabbedout file to get the
subset of reads that is failing for one of those reasons, which may be
convenient for further analysis in challenging cases.
The format of the tabbedout file is not documented in
detail (and is subject to change in future usearch builds), but is fairly
self-explanatory. Each read pair is one line in the file. The read label is
the first field (truncated at the first space). Subsequent fields are separated by tabs. Each field reports
the results of one step in the merging process, for example:
M00967:15:000000000-A2G1J:1:1101:18083:3926 aln=123-128-121 diffs=15 toomanydiffs result=notmerged
This shows that the pair failed to merge because there were too many (15) mismatches in the alignment. To get the read labels for all the reads that failed to merge for this reason, you can do this:
grep toomanydiffs tabbedout.txt | cut -f1 > toomanydiffs.labels
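Because grep matches the string anywhere on the line, including inside a read label, a stricter variant may sometimes be preferable. The following is a minimal sketch and not part of usearch. It assumes that the fields are tab-separated and that the failure reason appears as a field of its own, as in the example line above. The function name labels_for_reason is hypothetical.

```shell
# labels_for_reason REASON FILE
# Print the first field (the read label) of every line in FILE that has
# REASON as a complete tab-separated field. Assumes the tabbedout layout
# shown in the example line; that layout may change between usearch builds.
labels_for_reason() {
    awk -F '\t' -v reason="$1" '
        {
            # Field 1 is the label, so start scanning at field 2.
            for (i = 2; i <= NF; i++) {
                if ($i == reason) { print $1; break }
            }
        }
    ' "$2"
}
```

With this definition, `labels_for_reason toomanydiffs tabbedout.txt > toomanydiffs.labels` writes the same labels file as the grep command.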
Then, to get the reads:
usearch -fastx_getseqs myreads_R1.fastq -labels toomanydiffs.labels -trunclabels -fastqout fwd.fq
usearch -fastx_getseqs myreads_R2.fastq -labels toomanydiffs.labels -trunclabels -fastqout rev.fq
The -trunclabels option is needed with typical Illumina reads because
otherwise the labels will fail to match due to the suffixes 1:N:0... and
2:N:0..., which are added to the labels for the R1 and R2 reads,
respectively.
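To see what label truncation achieves, the following sketch prints the labels of a FASTQ file cut at the first space. It only mimics the effect described above and is not how usearch implements -trunclabels. It assumes that every FASTQ record spans exactly four lines, and the function name fastq_labels is hypothetical.

```shell
# fastq_labels FILE
# Print each record's label without the leading "@", cut at the first space.
# Assumes four-line FASTQ records (no wrapped sequence or quality lines).
fastq_labels() {
    awk 'NR % 4 == 1 { sub(/^@/, ""); sub(/ .*$/, ""); print }' "$1"
}
```

If the R1 and R2 files list the same pairs in the same order, running fastq_labels on each should print identical output, which is why a single labels file can select both reads of a pair.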
Now you have a test set of read pairs which you can use
to investigate further.
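When pairs fail for several different reasons, a tally of the reasons can help decide which subset to extract first. The following is a minimal sketch under a strong assumption taken from the example line: on lines containing result=notmerged, the failure reason is the only field after the label that has no = sign. Other usearch builds may format the file differently, so check the output against the merge report. The function name count_reasons is hypothetical.

```shell
# count_reasons FILE
# Count bare (no "=") fields on lines marked result=notmerged and print
# "count<TAB>reason", most frequent first. Assumes tab-separated fields
# with the read label in field 1, as in the example tabbedout line.
count_reasons() {
    awk -F '\t' '
        {
            notmerged = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "result=notmerged") { notmerged = 1 }
            }
            if (!notmerged) { next }
            for (i = 2; i <= NF; i++) {
                if ($i != "" && $i !~ /=/) { count[$i]++ }
            }
        }
        END { for (r in count) { print count[r] "\t" r } }
    ' "$1" | sort -k1,1nr
}
```

Running `count_reasons tabbedout.txt` then lists one line per reason, and any reason with a large count is a candidate for extraction with the grep command shown earlier.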