Performs merging of paired reads. (This is sometimes called 'assembly' of paired reads, but I find this term confusing because assembly usually refers to making longer contigs, so I prefer to call it merging.)

Typical usage:

  usearch -fastq_mergepairs *_R1_*.fastq -fastqout merged.fq -relabel @

The -fastq_merge_maxee option can be used to set an expected errors threshold, but for OTU clustering, quality filtering is generally performed as a post-processing step:

  usearch -fastq_filter merged.fq -fastq_maxee 1.0 -fastqout filtered.fq

This is because filtered reads are used to construct OTUs but unfiltered reads are used to construct the OTU table, so the merged reads are needed both before and after filtering. See the UPARSE tutorials for some examples.

Merged reads are written to -fastqout (for FASTQ) and/or -fastaout (for FASTA). Reads that failed to merge are written to -fastqout_notmerged_fwd, -fastqout_notmerged_rev, -fastaout_notmerged_fwd and -fastaout_notmerged_rev.

The -tabbedout option reports one line per read pair giving information about how it was processed, e.g. whether an alignment was found and the expected errors value for the pair.

The -report filename option writes summary information. In one example report, there are several anomalously short pairs with merged lengths in the range 20-30, much shorter than the mean (330nt), which suggests using -fastq_minmergelen to filter them out.

Several forward FASTQ filenames may be given following the -fastq_mergepairs option. This allows you to use shell wildcards to merge several pairs of files in a single step. If you use this feature, you will typically want to use the -relabel @ feature (see below) to label the merged reads with the sample name.

The FASTQ filename for the forward reads (R1s) is specified by the -fastq_mergepairs option, and the reverse read filename (R2s) is specified by the -reverse option.
If the -reverse option is not given, the reverse read filename is constructed by replacing _R1 with _R2 in the forward filename.

The -relabel string option specifies that the read labels should be changed in the output files. Labels are made by appending an integer 1, 2, 3... to the string. Only reads that are successfully merged are counted, so there are no gaps in the numbering. The special value @ indicates that the string should be constructed from the file name by truncating the file name at the first underscore or period and appending a period. With a typical Illumina FASTQ file name, this gives the sample name. So, for example, if the R1 file name is Mock_S188_L001_R1_001.fastq, then the truncated name is Mock, the string is Mock. and the output labels will be Mock.1, Mock.2 etc.

The -sample string option specifies that sample=string; should be added to the read label (supported in v8.1.1800 and later).

Forward and reverse reads must be in 1:1 correspondence and must appear in the same order in both files. The labels for the forward and reverse read in a given pair must be identical, or identical except for a single position where a '1' appears in the forward read label and a '2' appears in the reverse read label.
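
The naming and labeling rules described above can be illustrated with a short Python sketch. This is not usearch code, only a plain restatement of the rules as stated in this page. The function names (reverse_filename, sample_prefix, labels_match) are hypothetical, and two details are assumptions: that only the first occurrence of _R1 is replaced, and that any directory part of the path is ignored when building the sample prefix.

```python
import os
import re


def reverse_filename(forward_name: str) -> str:
    """Build the R2 filename from the R1 filename when -reverse is not given.

    Assumption: only the first occurrence of '_R1' is replaced.
    """
    if "_R1" not in forward_name:
        raise ValueError("forward filename does not contain '_R1'")
    return forward_name.replace("_R1", "_R2", 1)


def sample_prefix(forward_name: str) -> str:
    """Mimic '-relabel @': truncate at the first underscore or period,
    then append a period.

    Assumption: the directory part of the path is dropped first.
    """
    base = os.path.basename(forward_name)
    stem = re.split(r"[_.]", base, maxsplit=1)[0]
    return stem + "."


def labels_match(fwd_label: str, rev_label: str) -> bool:
    """Check the pairing rule: labels are identical, or differ at exactly
    one position where the forward has '1' and the reverse has '2'."""
    if fwd_label == rev_label:
        return True
    if len(fwd_label) != len(rev_label):
        return False
    diffs = [(a, b) for a, b in zip(fwd_label, rev_label) if a != b]
    return diffs == [("1", "2")]


fwd = "Mock_S188_L001_R1_001.fastq"
print(reverse_filename(fwd))      # Mock_S188_L001_R2_001.fastq
print(sample_prefix(fwd) + "1")   # Mock.1
print(labels_match("read7/1", "read7/2"))  # True
```

A check like labels_match can be handy for spotting files that have been sorted or filtered independently before merging, since that breaks the required 1:1 correspondence between forward and reverse reads.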