Performs quality filtering and/or conversion of a FASTQ file to FASTA format.
See also
Paper describing expected error filtering and paired read merging (Edgar & Flyvbjerg, 2015)
Paired read assembler and quality filtering benchmark results
Read quality filtering
Expected errors
FASTQ format options
Quality scores
Choosing FASTQ filter parameters
Strategies for dealing with low-quality reverse reads (R2s)
The fastx_learn command is useful for checking the error rate after expected error quality filtering.
Expected error filtering assumes that the Q scores are accurate; fastx_learn does not use Q scores,
so it gives an independent check.
Output options
Option | Description
-fastqout filename | FASTQ output file. You can use both -fastqout and -fastaout.
-fastaout filename | FASTA output file. You can use both -fastqout and -fastaout.
-fastqout_discarded filename | FASTQ output file for discarded reads. You can use both -fastqout_discarded and -fastaout_discarded.
-fastaout_discarded filename | FASTA output file for discarded reads. You can use both -fastqout_discarded and -fastaout_discarded.
-relabel prefix | Generate new labels for the output sequences. They will be labeled prefix1, prefix2 and so on. For example, if you use -relabel SampleA. then the labels will be SampleA.1, SampleA.2 etc. The special value @ indicates that the string should be constructed from the file name by truncating the file name at the first underscore or period and appending a period. With a typical Illumina FASTQ file name, this gives the sample name. So, for example, if the FASTQ file name is Mock_S188_L001_R1_001.fastq, then the string is Mock. and the output labels will be Mock.1, Mock.2 etc.
-fastq_eeout | Append the expected number of errors according to the Q scores to the label in the format "ee=xx;". Expected errors are calculated after truncation, if applicable.
-sample string | Append sample=string; to the read label.
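The "@" rule for -relabel can be illustrated with a short Python sketch. This is not USEARCH code; the function names relabel_prefix and relabel are hypothetical, and stripping any leading directory from the path is an assumption. It only mirrors the rule as described in the table: cut the file name at the first underscore or period, append a period, then number the reads.

```python
import os

def relabel_prefix(fastq_path: str) -> str:
    """Prefix implied by '-relabel @' as described in the table (hypothetical helper)."""
    # Assumption: only the file name is used, not the directory part of the path.
    name = os.path.basename(fastq_path)
    cut = len(name)
    # Truncate at whichever comes first: an underscore or a period.
    for sep in ("_", "."):
        pos = name.find(sep)
        if pos != -1:
            cut = min(cut, pos)
    return name[:cut] + "."

def relabel(prefix: str, count: int) -> list:
    """Labels prefix1, prefix2, ... for `count` output reads (hypothetical helper)."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]

if __name__ == "__main__":
    prefix = relabel_prefix("Mock_S188_L001_R1_001.fastq")
    print(prefix)
    print(relabel(prefix, 2))
```

With the file name from the table, the prefix is Mock. and the first two labels are Mock.1 and Mock.2.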
Filtering options
Option | Description
-fastq_truncqual N | Truncate the read at the first position having quality score <= N, so that all remaining Q scores are > N.
-fastq_maxee E | Discard reads with > E total expected errors for all bases in the read after any truncation options have been applied.
-fastq_trunclen L | Truncate sequences at the L'th base. Sequences shorter than L are discarded.
-fastq_minlen L | Discard sequences with < L letters.
-fastq_stripleft N | Delete the first N bases in the read.
-fastq_maxee_rate E | Discard reads with > E expected errors per base. Calculated after any truncation options have been applied. For example, with -fastq_maxee_rate set to 0.01, a read of length 100 will be discarded if the expected errors are > 1, and a read of length 1,000 will be discarded if the expected errors are > 10.
-fastq_maxns k | Discard if there are > k Ns in the read.
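The filtering rules in the table can be sketched in Python to make the arithmetic concrete. This is an illustration, not the USEARCH implementation. It uses the standard Phred relation, error probability P = 10^(-Q/10), with expected errors as the sum of P over the read. The function names, the default ASCII offset of 33, and the order of checks (truncate first, then length, Ns, and expected errors) are assumptions made for this example.

```python
def expected_errors(quals: str, ascii_base: int = 33) -> float:
    """Sum of per-base error probabilities, P = 10^(-Q/10)."""
    return sum(10 ** (-(ord(c) - ascii_base) / 10) for c in quals)

def truncate_at_quality(seq: str, quals: str, n: int, ascii_base: int = 33):
    """Mimic -fastq_truncqual N: cut at the first position with Q <= N."""
    for i, c in enumerate(quals):
        if ord(c) - ascii_base <= n:
            return seq[:i], quals[:i]
    return seq, quals

def passes(seq, quals, maxee=None, maxee_rate=None, minlen=None,
           maxns=None, ascii_base=33) -> bool:
    """Return True if the (already truncated) read would be kept.

    Assumed order of checks: length, N count, total EE, EE per base.
    """
    if minlen is not None and len(seq) < minlen:
        return False
    if maxns is not None and seq.upper().count("N") > maxns:
        return False
    ee = expected_errors(quals, ascii_base)
    if maxee is not None and ee > maxee:
        return False
    if maxee_rate is not None and len(seq) > 0 and ee / len(seq) > maxee_rate:
        return False
    return True

if __name__ == "__main__":
    seq, quals = "ACGTACGT", "IIII#III"   # 'I' is Q40 and '#' is Q2 at offset 33
    print(passes(seq, quals, maxee=0.5))  # the single Q2 base dominates the total
    tseq, tquals = truncate_at_quality(seq, quals, 15)
    print(tseq, passes(tseq, tquals, maxee=0.5))
```

In this toy read, one Q2 base contributes about 0.63 expected errors, so the full read fails a maxee of 0.5. Truncating at Q <= 15 removes that base and everything after it, and the remaining four Q40 bases pass easily.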
Examples

"Raw" conversion of Sanger FASTQ to FASTA with no filtering:

usearch -fastq_filter reads.fastq -fastaout reads.fasta -fastq_ascii 64

Truncate to length 150, discard if expected errors > 0.5, and convert to FASTA:

usearch -fastq_filter reads.fastq -fastq_trunclen 150 -fastq_maxee 0.5 \
  -fastaout reads.fasta

Truncate a read at the first position with Q <= 15, then discard it if fewer than 100 bases remain; output to a new FASTQ file:

usearch -fastq_filter reads.fastq -fastq_minlen 100 -fastq_truncqual 15 \
  -fastqout filtered_reads.fastq