Quality filtering
FASTQ reads can be filtered to discard low-quality reads, as predicted by their Phred scores. USEARCH provides a maximum expected error filter, which uses a better measure of base call accuracy than filters based on average Q or minimum Q. The expected number of errors in a read is the sum of the error probabilities of its bases, where a base with score Q has error probability P = 10^(-Q/10). Quality filtering is implemented in the fastq_filter command, which offers a rich set of parameters.
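The expected-error idea can be sketched in a few lines of Python. This is a minimal illustration, not USEARCH's implementation: it assumes simple four-line FASTQ records with Phred+33 encoding, and the names `read_fastq`, `expected_errors`, `mean_q`, `maxee_filter` and the `max_ee` threshold are hypothetical. The two demo reads show why average Q can mislead: the read with the higher mean Q carries far more expected errors.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (label, sequence, quality) from a FASTQ stream with 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of errors: sum of per-base P = 10^(-Q/10)."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def mean_q(qual: str, offset: int = 33) -> float:
    """Average Q score, computed only for comparison with expected errors."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def maxee_filter(handle: TextIO, max_ee: float = 1.0, offset: int = 33):
    """Keep reads whose expected error count is at most max_ee (hypothetical name)."""
    for label, seq, qual in read_fastq(handle):
        if expected_errors(qual, offset) <= max_ee:
            yield label, seq, qual


# "good": ten Q20 bases ('5'), mean Q 20, about 0.1 expected errors.
# "bad": five Q40 ('I') plus five Q2 ('#'), mean Q 21, about 3.2 expected errors.
demo = "@good\nACGTACGTAC\n+\n5555555555\n@bad\nACGTACGTAC\n+\nIIIII#####\n"
kept = [label for label, _, _ in maxee_filter(io.StringIO(demo), max_ee=1.0)]
print(kept)  # prints ['good']
```

Because each base contributes at most one expected error, a handful of very poor bases dominates the sum even though they barely move the average.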
FASTQ to FASTA conversion
The fastq_filter command can generate output in FASTQ and/or FASTA format. If no quality filtering parameters are specified, it performs a "raw" conversion of FASTQ to FASTA.
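A raw conversion simply keeps each record's label and sequence and drops the quality line. The sketch below assumes unwrapped four-line FASTQ records; `fastq_to_fasta` is a hypothetical helper, not a USEARCH function.

```python
import io
from typing import TextIO


def fastq_to_fasta(fastq: TextIO, fasta: TextIO) -> int:
    """Copy the label and sequence of each 4-line FASTQ record to FASTA; return the record count."""
    n = 0
    while True:
        header = fastq.readline().rstrip("\n")
        if not header:
            break  # end of input
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        seq = fastq.readline().rstrip("\n")
        fastq.readline()  # '+' separator line
        fastq.readline()  # quality line, dropped in a raw conversion
        fasta.write(f">{header[1:]}\n{seq}\n")
        n += 1
    return n


out = io.StringIO()
count = fastq_to_fasta(io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGA\n+\n###\n"), out)
print(count, repr(out.getvalue()))
```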
Paired read overlapping
Paired reads that overlap can be "merged" or "assembled" by aligning the forward read with the reverse complement of the reverse read, giving a single FASTQ or FASTA record for each pair. This is implemented in the fastq_mergepairs command. Phred (quality, Q) scores for the merged pair are calculated using Bayesian statistics and reported in the merged FASTQ record: if the forward and reverse reads agree on a base call, the Q score is increased; if they disagree, it is reduced.
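The direction of these Q adjustments follows from a simple Bayesian model of a single overlap column. The sketch below assumes a uniform prior over the four bases, independent errors in the two reads, and that a miscalled base is equally likely to be any of the other three. It illustrates the idea and may not match the exact calculation in fastq_mergepairs; `merge_base`, `q_to_p` and `p_to_q` are hypothetical helpers.

```python
import math
from typing import Tuple


def q_to_p(q: float) -> float:
    """Phred score to error probability."""
    return 10 ** (-q / 10)


def p_to_q(p: float) -> float:
    """Error probability to Phred score."""
    return -10 * math.log10(p)


def merge_base(b1: str, q1: float, b2: str, q2: float) -> Tuple[str, float]:
    """Posterior base call and Q for one overlap column.

    Assumes a uniform prior over A/C/G/T, independent errors in the two reads,
    and that a miscalled base is equally likely to be any of the other three.
    """
    p1, p2 = q_to_p(q1), q_to_p(q2)
    if b1 == b2:
        # Agreement: the call is wrong only if both reads made the same miscall
        p = (p1 * p2 / 3) / (1 - p1 - p2 + 4 * p1 * p2 / 3)
        return b1, p_to_q(p)
    # Disagreement: report the call with the lower error probability
    # (ties keep the first read's call)
    if p2 < p1:
        b1, b2, p1, p2 = b2, b1, p2, p1
    p = p1 * (1 - p2 / 3) / (p1 + p2 - 4 * p1 * p2 / 3)
    return b1, p_to_q(p)


print(merge_base("A", 20, "A", 20))  # agreement raises Q above 20
print(merge_base("A", 30, "C", 20))  # disagreement lowers Q below 30
```

Under this model two agreeing Q20 calls give a merged score of about Q45; a practical tool would also cap the reported value to the range its output encoding supports.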
Dereplication
Dereplication removes duplicate sequences, leaving one copy of each unique sequence. With very large read depths, this can significantly reduce the data size and the cost of downstream processing, especially for amplicon reads. Dereplication is implemented in the derep_prefix and derep_fulllength commands.
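Both variants of dereplication can be sketched with a hash-based count. The functions below are hypothetical illustrations: `dereplicate_full` collapses exact duplicates, while `dereplicate_prefix` naively (in quadratic time) folds a sequence into a longer one that starts with it. How USEARCH chooses among several possible longer sequences is not described here, so this version simply takes the first it finds. The `Uniq1;size=N;` labels follow the abundance-annotation style commonly seen in USEARCH pipelines, but treat the exact label format as an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def dereplicate_full(seqs: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse exact duplicates; return (sequence, count), most abundant first."""
    counts = Counter(seqs)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def dereplicate_prefix(seqs: Iterable[str]) -> List[Tuple[str, int]]:
    """Naive prefix variant: a sequence that is a prefix of a longer one is counted with it."""
    counts = Counter(seqs)
    sizes: Dict[str, int] = {}
    # Longest first, so a potential parent is always seen before its prefixes
    for s in sorted(counts, key=len, reverse=True):
        parent = next((p for p in sizes if p.startswith(s)), None)
        if parent is None:
            sizes[s] = counts[s]
        else:
            sizes[parent] += counts[s]  # first matching parent wins (arbitrary choice)
    return sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0]))


def to_fasta(uniques: List[Tuple[str, int]]) -> str:
    """FASTA text with a ';size=N;' abundance annotation on each label."""
    return "".join(f">Uniq{i};size={n};\n{s}\n" for i, (s, n) in enumerate(uniques, start=1))


reads = ["ACGT", "ACGT", "ACG", "TTGA", "ACGT"]
print(dereplicate_full(reads))
print(dereplicate_prefix(reads))
print(to_fasta(dereplicate_full(reads)), end="")
```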
Chimeric sequence filtering
Amplicon reads often contain chimeric sequences, which are artifacts of PCR. The UCHIME algorithm is a high-throughput, high-accuracy chimera filter, implemented in the uchime_ref and uchime_denovo commands. For OTU clustering, the cluster_otus command includes a chimera filter based on the UPARSE-REF algorithm, which is more sensitive than UCHIME.
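To illustrate what a reference-based chimera check looks for, the toy sketch below asks whether a query is explained better by a left segment from one parent plus a right segment from another than by either parent alone. It is not UCHIME or UPARSE-REF: it assumes the query and two candidate parents are already aligned without gaps, scores raw match counts, and `min_gain` is a hypothetical threshold. UCHIME's real scoring and its search for candidate parents are more involved.

```python
from typing import Tuple


def identity(q: str, p: str) -> int:
    """Number of matching positions between two gap-free aligned strings."""
    return sum(a == b for a, b in zip(q, p))


def best_chimeric_model(query: str, parent_a: str, parent_b: str) -> Tuple[int, int, str]:
    """Best two-segment model: (matches, breakpoint, name of left parent)."""
    if not len(query) == len(parent_a) == len(parent_b):
        raise ValueError("toy model expects pre-aligned sequences of equal length")
    best = (-1, 0, "")
    for left, right, name in ((parent_a, parent_b, "A"), (parent_b, parent_a, "B")):
        for i in range(1, len(query)):
            # Left part scored against one parent, right part against the other
            m = identity(query[:i], left[:i]) + identity(query[i:], right[i:])
            if m > best[0]:
                best = (m, i, name)
    return best


def looks_chimeric(query: str, parent_a: str, parent_b: str, min_gain: int = 2) -> bool:
    """Flag the query if the chimeric model beats the best single parent by min_gain matches."""
    single = max(identity(query, parent_a), identity(query, parent_b))
    chimeric, _, _ = best_chimeric_model(query, parent_a, parent_b)
    return chimeric - single >= min_gain


A = "ACGTACGTAC"
B = "TGCATGCATG"
print(best_chimeric_model("ACGTAGCATG", A, B))  # left half from A, right half from B
print(looks_chimeric("ACGTAGCATG", A, B))
print(looks_chimeric("ACGTACGTAG", A, B))  # A with one point error
```

The second query differs from parent A at a single position, so a breakpoint model gains only one match; requiring a minimum gain keeps isolated sequencing errors from being called chimeric.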
FASTQ file statistics
The fastq_stats command generates statistics on read quality and length. The fastq_chars command reports statistics on the ASCII characters used to encode Q scores, which can help determine the format of a FASTQ file.
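A simplified version of these reports can be computed directly. The sketch assumes four-line FASTQ records; `fastq_summary` and `guess_quality_offset` are hypothetical helpers, and the encoding guess is only a heuristic based on the fact that Phred+64 and the older Solexa+64 schemes never use characters below ';' (ASCII 59). A file whose lowest quality character is at or above that value cannot be classified with certainty.

```python
import io
from collections import Counter
from typing import Dict, List, TextIO


def fastq_summary(handle: TextIO) -> Dict[str, object]:
    """Read count, length range and mean, and a histogram of quality characters."""
    lengths: List[int] = []
    qual_chars: Counter = Counter()
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of input (or a blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        lengths.append(len(seq))
        qual_chars.update(qual)
    return {
        "reads": len(lengths),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
        "mean_len": sum(lengths) / len(lengths) if lengths else 0.0,
        "qual_chars": qual_chars,
    }


def guess_quality_offset(qual_chars: Counter) -> str:
    """Heuristic only: Phred+64 and Solexa+64 never use characters below ';' (ASCII 59)."""
    if not qual_chars:
        return "unknown"
    lowest = min(ord(c) for c in qual_chars)
    if lowest < 59:
        return "phred33"
    # Every character fits a +64 scheme, but a Phred+33 file with uniformly
    # high scores would look the same, so this case cannot be decided here.
    return "ambiguous (possibly phred64)"


summary = fastq_summary(io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n##\n"))
print(summary["reads"], summary["min_len"], summary["max_len"], summary["mean_len"])
print(guess_quality_offset(summary["qual_chars"]))
```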