usearch manual
fastq_filter command
 
Performs quality filtering and/or conversion of a FASTQ file to FASTA format.

See also
  Paper describing expected error filtering and paired read merging (Edgar & Flyvbjerg, 2015).
  Paired read assembler and quality filtering benchmark results
 
Read quality filtering
  Expected errors
 
FASTQ format options
  Quality scores

  Choosing FASTQ filter parameters

Output options

Option   Description
-fastqout filename   FASTQ output file. You can use both -fastqout and -fastaout.

-fastaout filename   FASTA output file. You can use both -fastqout and -fastaout.

-fastqout_discarded filename   FASTQ output file for discarded reads. You can use both -fastqout_discarded and -fastaout_discarded (v8.0.1616 or later).

-fastaout_discarded filename   FASTA output file for discarded reads. You can use both -fastqout_discarded and -fastaout_discarded (v8.0.1616 or later).
 
-relabel prefix   Generate new labels for the output sequences. They will be labeled prefix1, prefix2 and so on. For example, if you use -relabel SampleA. then the labels will be SampleA.1, SampleA.2 etc.

The special value @ indicates that the prefix should be constructed from the file name by truncating it at the first underscore or period and appending a period (supported in v8.1.1800 and later). With a typical Illumina FASTQ file name, this gives the sample name. For example, if the FASTQ file name is Mock_S188_L001_R1_001.fastq, then the prefix is Mock. and the output labels will be Mock.1, Mock.2 etc.
 

-fastq_eeout   Append the expected number of errors according to the Q scores to the label in the format "ee=xx;". Expected errors are calculated after truncation, if applicable.
 
-sample string   Append sample=string; to the read label.
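
The labeling rules above can be sketched in Python. This is an illustration only, not usearch's own code. The function names are hypothetical, and several details are assumptions: that the directory part of the path is dropped before the file name is cut, that annotations are separated from the label by a semicolon, the order in which annotations are appended, and the number of digits printed for ee.

```python
import os
import re


def prefix_from_filename(path):
    """Derive a relabel prefix from a FASTQ file name, as described for '@'.

    The file name is cut at the first underscore or period and a period is
    appended. Dropping the directory part first is an assumption.
    """
    name = os.path.basename(path)
    stem = re.split(r"[_.]", name, maxsplit=1)[0]
    return stem + "."


def relabel(prefix, count):
    """Return the labels prefix1, prefix2, ... for `count` output reads."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def annotate(label, sample=None, ee=None):
    """Append sample=...; and ee=...; annotations to a read label.

    Assumptions: annotations are separated from the label by ';', sample
    comes before ee, and ee is printed with two significant digits.
    """
    if not label.endswith(";"):
        label += ";"
    if sample is not None:
        label += f"sample={sample};"
    if ee is not None:
        label += f"ee={ee:.2g};"
    return label


if __name__ == "__main__":
    prefix = prefix_from_filename("Mock_S188_L001_R1_001.fastq")
    for label in relabel(prefix, 2):
        print(annotate(label, sample="Mock", ee=0.5))
```

With the file name from the example above, the derived prefix is Mock. and the first two labels are Mock.1 and Mock.2.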
 

Filtering options

Option   Description
-fastq_truncqual N   Truncate the read at the first position having quality score <= N, so that all remaining Q scores are > N.
 
-fastq_maxee E   Discard reads with > E total expected errors for all bases in the read after any truncation options have been applied.
 
-fastq_trunclen L   Truncate sequences to length L. Discard sequences shorter than L.
 
-fastq_minlen L   Discard sequences with < L letters.
 
-fastq_stripleft N   Delete the first N bases in the read.
 
-fastq_maxee_rate E   Discard reads with > E expected errors per base, calculated after any truncation options have been applied. Requires v8.0.1570 or later. For example, with -fastq_maxee_rate 0.01, a read of length 100 is discarded if its expected errors exceed 1, and a read of length 1,000 is discarded if its expected errors exceed 10.
 
-fastq_maxns k   Discard if there are >k Ns in the read.
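
The filtering options above can be sketched in Python to show how they interact. This is an illustration, not usearch's implementation. The function and parameter names are hypothetical. Expected errors are computed with the standard Phred relation, summing 10^(-Q/10) over the bases of the read. The quality offset of 33 and the order in which stripping and the two truncation steps run are assumptions. The only ordering taken from the descriptions above is that expected errors are evaluated after truncation.

```python
def phred_scores(quals, ascii_offset=33):
    """Convert a FASTQ quality string to integer Q scores."""
    return [ord(c) - ascii_offset for c in quals]


def expected_errors(quals, ascii_offset=33):
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quals, ascii_offset))


def filter_read(seq, quals, stripleft=0, truncqual=None, trunclen=None,
                minlen=None, maxee=None, maxee_rate=None, maxns=None,
                ascii_offset=33):
    """Return (seq, quals) if the read is kept, or None if it is discarded."""
    # Delete the first `stripleft` bases (assumed to happen first).
    seq, quals = seq[stripleft:], quals[stripleft:]

    # Truncate at the first position with Q <= truncqual.
    if truncqual is not None:
        for i, q in enumerate(phred_scores(quals, ascii_offset)):
            if q <= truncqual:
                seq, quals = seq[:i], quals[:i]
                break

    # Truncate to a fixed length; reads shorter than that are discarded.
    if trunclen is not None:
        if len(seq) < trunclen:
            return None
        seq, quals = seq[:trunclen], quals[:trunclen]

    if minlen is not None and len(seq) < minlen:
        return None

    # Expected errors are computed on what remains after truncation.
    ee = expected_errors(quals, ascii_offset)
    if maxee is not None and ee > maxee:
        return None
    if maxee_rate is not None and seq and ee / len(seq) > maxee_rate:
        return None

    if maxns is not None and seq.upper().count("N") > maxns:
        return None
    return seq, quals


if __name__ == "__main__":
    # '+' encodes Q10 and 'I' encodes Q40 with an offset of 33.
    print(filter_read("ACGTAC", "IIII+I", truncqual=15))
    print(filter_read("ACGT", "++++", maxee=0.3))
```

In the second call, four Q10 bases give about 0.4 expected errors, which exceeds the limit of 0.3, so the read is discarded.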
 

Examples

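"Raw" conversion of FASTQ to FASTA with no filtering. Here -fastq_ascii 64 specifies a quality-score offset of 64, as used by older Illumina files; Sanger-encoded files use an offset of 33: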

  usearch -fastq_filter reads.fastq -fastaout reads.fasta -fastq_ascii 64

Truncate to length 150, discard if expected errors > 0.5, and convert to FASTA:

  usearch -fastq_filter reads.fastq -fastq_trunclen 150 -fastq_maxee 0.5 \
    -fastaout reads.fasta

Truncate each read at the first base with Q <= 15, then discard it if fewer than 100 bases remain; output to a new FASTQ file:

  usearch -fastq_filter reads.fastq -fastq_minlen 100 -fastq_truncqual 15 \
    -fastqout filtered_reads.fastq