fastq_mergepairs command, the USEARCH paired-read assembler
See also
Introduction to paired read merging
fastq_mergepairs options
Reviewing a fastq_mergepairs report to check for problems
Using the tabbedout file to investigate merging problems
Validating merged reads to check for problems
Filtering artifacts by setting a merge length range
Long overlaps are not needed so 2 x 250 can do better than V4
Trouble-shooting fastq_mergepairs problems
Staggered read pairs
Quality filtering while merging (not recommended)
Strategies for dealing with low-quality reverse reads (R2s)
Common cases
2 x 250 reads with long overlap, e.g. 16S V4
2 x 300 reads with short overlap, e.g. 16S V3-V5
The fastq_mergepairs command merges (assembles) paired-end reads to create
consensus sequences and, optionally, consensus quality scores. This command
has many features and options, so I recommend spending some time browsing the
documentation to get familiar with the capabilities of fastq_mergepairs and
the issues that arise in read merging.
In the examples below, the forward read FASTQs have "R1" in the filename
and the reverse FASTQs have "R2", as this is the convention currently used by
Illumina.
Basic usage
The simplest way to use fastq_mergepairs is to specify the forward and reverse
FASTQ filenames and an output FASTQ filename.
usearch -fastq_mergepairs SampleA_R1.fastq -reverse SampleA_R2.fastq -fastqout merged.fq
Automatic R2 filename
If the -reverse option is
omitted, the reverse FASTQ filename is constructed by replacing R1 with R2.
The following command line is equivalent to the example above.
usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq
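The filename substitution described above can be sketched in shell. This is only an illustration, not USEARCH's own code: the helper name r2_name is hypothetical, and replacing only the first occurrence of R1 is an assumption, since the text does not say what happens when R1 appears more than once in a filename.

```shell
# Hypothetical helper: derive the reverse-read filename from a forward-read
# filename by replacing "R1" with "R2".
# Assumption: only the first occurrence of "R1" is replaced.
r2_name() {
    printf '%s\n' "$1" | sed 's/R1/R2/'
}
```

Under that assumption, r2_name SampleA_R1.fastq yields SampleA_R2.fastq, the same file that the explicit -reverse option names in the basic usage example.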
Merging multiple FASTQ file pairs in a single command
You can specify two or more FASTQ filenames
following -fastq_mergepairs. In the following example, SampleA
and SampleB are both merged. The R2 filenames are constructed automatically
as explained above, or can be given explicitly using the -reverse option.
usearch -fastq_mergepairs SampleA_R1.fastq SampleB_R1.fastq -fastqout merged.fq
Using shell wildcards to merge multiple FASTQ file pairs in a single command
You can use shell wildcards (* and ?) to give a pattern that matches the FASTQ
files you want to merge. For example, this will merge all R1 files in the
current directory:
usearch -fastq_mergepairs *R1*.fastq -fastqout merged.fq
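Before running a wildcard merge, it can help to see which files the pattern matches and whether each R1 file has a partner. The following is a minimal sketch of such a check, not part of USEARCH: the function name check_pairs is hypothetical, and it assumes the R2 name is the R1 name with the first R1 replaced by R2.

```shell
# Hypothetical pre-flight check: for every file matching *R1*.fastq in a
# directory, print the R1 name, the expected R2 name, and whether the R2
# file exists. Returns non-zero if any R2 file is missing.
# Assumption: the R2 name is the R1 name with the first "R1" replaced by "R2".
check_pairs() {
    dir=${1:-.}
    status=0
    for r1 in "$dir"/*R1*.fastq; do
        [ -e "$r1" ] || continue      # the pattern matched nothing
        name=${r1##*/}                # drop the directory part
        r2=$(printf '%s\n' "$name" | sed 's/R1/R2/')
        if [ -e "$dir/$r2" ]; then
            printf '%s\t%s\tok\n' "$name" "$r2"
        else
            printf '%s\t%s\tmissing\n' "$name" "$r2"
            status=1
        fi
    done
    return "$status"
}
```

Running check_pairs with no argument inspects the current directory, which is where the wildcard in the command above is expanded.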
Adding sample identifiers to read labels
If multiple samples are combined into a single file as shown in
some of the above examples, then you lose track of which read came from
which sample. This is addressed by adding a
sample identifier to each read label. The simplest method is to use the
-sample option, e.g.
usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq -sample SampleA
The string sample=SampleA; will be added at
the end of the read label.
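Once labels carry the sample= annotation, the merged file can be tallied by sample as a quick sanity check. The sketch below is an illustration only: the function name count_reads_per_sample is hypothetical, and it assumes four-line FASTQ records, so that label lines are lines 1, 5, 9 and so on.

```shell
# Hypothetical helper: tally reads per sample in a FASTQ stream (stdin) whose
# labels carry a "sample=NAME;" annotation. Output is "NAME<TAB>count",
# sorted by name.
# Assumption: four-line FASTQ records, so labels are lines 1, 5, 9, ...
count_reads_per_sample() {
    awk 'NR % 4 == 1 && match($0, /sample=[^;]*;/) {
             # Skip the leading "sample=" (7 chars) and drop the trailing ";".
             n[substr($0, RSTART + 7, RLENGTH - 8)]++
         }
         END { for (s in n) printf "%s\t%d\n", s, n[s] }' | sort
}
```

A typical call would be count_reads_per_sample < merged.fq, which prints one line per sample found in the labels.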
Getting the sample identifier from the FASTQ filename
FASTQ filenames are often
based on the sample identifier, e.g. SampleA_R1.fastq. If you specify
-relabel @ then fastq_mergepairs gets the sample identifier from the FASTQ
file name by truncating at the first underscore (_) or period (.). A period
and the read number are added after the sample identifier to make the new
read label, which replaces the original label. This differs from the -sample
option, which adds the sample= annotation at the end of the label. The
usearch_global command understands both of these methods for putting sample
identifiers into read labels.
usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq -relabel @
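The truncation rule can be previewed in shell to see which sample identifier a given filename would produce. This is a sketch of the rule as described, not USEARCH's implementation: the helper names sample_from_filename and relabel are hypothetical, and stripping any leading directory before truncating is an assumption.

```shell
# Hypothetical helper: derive a sample identifier from a FASTQ filename by
# truncating at the first "_" or ".".
# Assumption: any leading directory part is dropped first.
sample_from_filename() {
    name=${1##*/}                     # strip leading directories
    printf '%s\n' "${name%%[_.]*}"    # cut from the first "_" or "." onward
}

# Hypothetical helper: build a new read label from a sample identifier and a
# read number, joined by a period.
relabel() {
    printf '%s.%s\n' "$1" "$2"
}
```

For example, sample_from_filename SampleA_R1.fastq gives SampleA, and relabel SampleA 7 gives SampleA.7. Filenames whose sample identifier itself contains an underscore or period would be cut short under this rule, which is worth checking before relying on -relabel @.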
Merging multiple files with sample identifiers
By using wildcards and the -relabel @ option
you can merge multiple files and add sample identifiers to the read labels, for example:
usearch -fastq_mergepairs *R1*.fastq -fastqout merged.fq -relabel @