See also
Quality control for OTU sequences
filter_phix
Filter low-complexity sequences.
Input is a FASTA or FASTQ file. A reverse (R2) read file can be specified using -reverse, in which case a read pair is discarded if the forward (R1) and/or reverse (R2) read is found to be low-complexity. If -reverse is specified, -output2 must be specified for the reverse read output file.
Output is written in the same format (FASTA or FASTQ) as the input.
A sequence is discarded if more than 25% of it is low-complexity. This threshold can be changed with the -max_lowc_pct option.
The -hitsout file contains the low-complexity sequences that were removed.
The -tabbedout option specifies an output file in tabbed text format which reports the percentage of low-complexity sequence found in each of the query sequences.
Multithreading is supported.
Examples
usearch -filter_lowc reads.fq -output filtered_reads.fq -tabbedout lowc.txt -hitsout lowc.fq
usearch -filter_lowc reads_fwd.fq -reverse reads_rev.fq -output fil_fwd.fq -output2 fil_rev.fq
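As a further sketch, the paired-read example can be combined with a stricter threshold. This assumes the threshold is passed as -max_lowc_pct followed by a percentage, like the other options; the value 10, the file names, and the helper function lowc_cmd are illustrative only and not part of usearch. The script prints the command rather than running it, so it can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: paired-read filtering with a stricter low-complexity threshold.
# Assumptions: the threshold option is written as "-max_lowc_pct <percent>";
# the value 10 and the file names are illustrative, not recommendations.

lowc_cmd() {
  # $1 = maximum percentage of low-complexity sequence allowed per read
  echo "usearch -filter_lowc reads_fwd.fq -reverse reads_rev.fq" \
       "-output fil_fwd.fq -output2 fil_rev.fq -max_lowc_pct $1"
}

# Print the command for inspection; pipe the output to sh to run it.
lowc_cmd 10
```

With a lower threshold more reads are discarded, and a pair is still dropped as a whole when either read exceeds the limit.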