New reporting options in v8.1.1859: -report and -tabbedout.

Performs merging of paired reads. (This is sometimes called 'assembly' of paired reads, but I find that term confusing because assembly usually refers to making longer contigs, so I prefer to call it merging.)

Typical usage:

    usearch -fastq_mergepairs *_R1_*.fastq -fastqout merged.fq -relabel @

The -fastq_merge_maxee option can be used to set an expected errors threshold, but for OTU clustering, quality filtering is generally performed as a post-processing step:

    usearch -fastq_filter merged.fq -fastq_maxee 1.0 -fastqout filtered.fq

This is because filtered reads are used to construct OTUs but unfiltered reads are used to construct the OTU table, so the merged reads are needed both before and after filtering. See the UPARSE tutorials for some examples.

Merged reads are written to -fastqout (for FASTQ) and/or -fastaout (for FASTA). Reads which failed to merge are written to -fastqout_notmerged_fwd, -fastqout_notmerged_rev, -fastaout_notmerged_fwd and -fastaout_notmerged_rev.

The -eetabbedout output file is a tabbed text file which reports the expected errors for each merged read pair. The -tabbedout option (v8.1.1859) gives much more information about each pair, so -eetabbedout is deprecated.

The -report filename option gives summary information. The example report shows several anomalously short pairs with merged lengths in the range 20-30, much shorter than the mean (330nt), which suggests using -fastq_minmergelen to filter them out.

Several forward FASTQ filenames may be given following the -fastq_mergepairs option (v.8.1.1800 and later). This allows you to use shell wildcards to merge several pairs of files in a single step. If you use this feature, you will typically want to use the -relabel @ feature (see below) to label the merged reads with the sample name.
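As background for the expected errors threshold mentioned above, the sketch below shows how the expected number of errors in a read is commonly computed from its FASTQ quality string: each Phred score Q implies an error probability of 10^(-Q/10), and the expected errors value is the sum of those probabilities. This is an illustration only, not the usearch implementation. The function names are hypothetical, and the Phred+33 quality encoding and the "keep if E <= threshold" comparison are assumptions.

```python
def expected_errors(quality: str, ascii_offset: int = 33) -> float:
    """Sum the per-base error probabilities implied by a FASTQ quality string.

    Assumption: qualities are Phred scores encoded as ASCII with the given
    offset (33 for most current Illumina output).
    """
    return sum(10 ** (-(ord(ch) - ascii_offset) / 10) for ch in quality)


def passes_maxee(quality: str, max_ee: float = 1.0) -> bool:
    """Return True if the read would be kept under a max expected errors threshold.

    Assumption: a read is kept when its expected errors do not exceed max_ee.
    """
    return expected_errors(quality) <= max_ee


if __name__ == "__main__":
    # 'I' encodes Q40 under Phred+33, i.e. an error probability of 0.0001 per base.
    print(expected_errors("IIII"))
    # '!' encodes Q0, i.e. an error probability of 1.0 per base.
    print(passes_maxee("!!", max_ee=1.0))
```

With a threshold of 1.0, a read passes only if the most likely number of errors in it is zero, which is why this value is a common choice for the filtering step shown earlier.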
The FASTQ filename for the forward reads (R1s) is specified by the -fastq_mergepairs option, and the reverse read filename (R2s) is specified by the -reverse option. If the -reverse option is not given, the reverse read filename is constructed by replacing _R1 with _R2 in the forward filename (supported in v.8.1.1800 and later).

The -relabel string option specifies that the read labels should be changed in the output files. Labels are made by appending an integer 1, 2, 3... to the string. Only reads that are successfully merged are counted, so there are no gaps in the numbering. The special value @ indicates that the string should be constructed from the file name by truncating the file name at the first underscore or period and appending a period (supported in v.8.1.1800 and later). With a typical Illumina FASTQ file name, this gives the sample name. So, for example, if the R1 file name is Mock_S188_L001_R1_001.fastq, then the string is Mock. and the output labels will be Mock.1, Mock.2 etc.

The -sample string option specifies that sample=string; should be added to the read label (supported in v.8.1.1800 and later).

Forward and reverse reads must be in 1:1 correspondence and must appear in the same order in both files. The labels for the forward and reverse read in a given pair must be identical, or identical except for a single position where a '1' appears in the forward read label and a '2' appears in the reverse read label.
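The naming rules above can be sketched in a few lines of Python. This is an illustration of the rules as described, not usearch's own code, and all function names are hypothetical. Two details are assumptions because the text does not specify them: only the first occurrence of _R1 is replaced when building the reverse filename, and any directory part of the path is dropped before the sample name is extracted.

```python
import os
import re


def reverse_filename(forward: str) -> str:
    """Build the R2 filename from the R1 filename.

    Assumption: only the first "_R1" is replaced.
    """
    return forward.replace("_R1", "_R2", 1)


def sample_prefix(forward: str) -> str:
    """Label prefix for -relabel @: the file name truncated at the first
    underscore or period, with a period appended.

    Assumption: directory components are stripped first.
    """
    name = os.path.basename(forward)
    return re.split(r"[_.]", name, maxsplit=1)[0] + "."


def merged_labels(prefix: str, merged_count: int) -> list:
    """Labels prefix1, prefix2, ... for the reads that merged successfully."""
    return [f"{prefix}{i}" for i in range(1, merged_count + 1)]


def labels_match(fwd: str, rev: str) -> bool:
    """True if the labels are identical, or differ at exactly one position
    where the forward label has '1' and the reverse label has '2'."""
    if fwd == rev:
        return True
    if len(fwd) != len(rev):
        return False
    diffs = [(a, b) for a, b in zip(fwd, rev) if a != b]
    return diffs == [("1", "2")]


if __name__ == "__main__":
    r1 = "Mock_S188_L001_R1_001.fastq"
    print(reverse_filename(r1))
    print(merged_labels(sample_prefix(r1), 2))
    print(labels_match("read7/1", "read7/2"))
```

A check like labels_match can be run over both files before merging to confirm that the reads are in the same order, since a mismatch usually means one file was filtered or sorted independently of the other.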