Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to maximize speed.
An identity threshold must be specified using the -id option.
Sequences are processed in the order specified by the -sort option, which may be other (the default), length or size. See UCLUST sort order for discussion. With -sort other, sequences are processed in the order they appear in the input file. With -sort length, sequences are processed in order of decreasing length, which is most appropriate when fragments are present together with full-length sequences. With -sort size, sequences are processed in order of decreasing size annotation, which can be useful for clustering amplicon reads such as 16S or ITS tags, though cluster_otus is usually recommended for this task.
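The three sort orders can be sketched in a few lines of Python. This is an illustration only, not USEARCH code. It assumes size annotations appear in labels as ";size=N;", treats a missing annotation as size 1, and breaks ties by input order; none of these details are stated on this page, and the function names are hypothetical.

```python
import re

# Assumed annotation format: "label;size=123;" (an assumption, not from this page).
SIZE_RE = re.compile(r";size=(\d+)")

def size_of(label):
    """Return the size annotation in a label, or 1 if none is present."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1

def processing_order(records, sort="other"):
    """records: list of (label, sequence) tuples in input-file order."""
    if sort == "length":
        # Longest first; Python's sort is stable, so ties keep input order.
        return sorted(records, key=lambda r: len(r[1]), reverse=True)
    if sort == "size":
        # Largest size annotation first.
        return sorted(records, key=lambda r: size_of(r[0]), reverse=True)
    if sort == "other":
        # Keep the order of the input file.
        return list(records)
    raise ValueError(f"unknown sort order: {sort}")

records = [
    ("a;size=2;", "ACGTACGT"),
    ("b;size=9;", "ACGT"),
    ("c;size=5;", "ACGTACGTACGT"),
]
```

With these toy records, length order starts with c, size order starts with b, and other leaves a first.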
Reverse-complemented matching for nucleotide sequences can be specified by using -strand both.
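As a rough illustration of what matching on both strands means, the Python sketch below compares a query to a target as given and as its reverse complement. The exact-match comparison is a stand-in chosen for brevity; USEARCH aligns sequences and applies the identity threshold. The function names are hypothetical.

```python
# Complement table for unambiguous nucleotides; other letters (e.g. N) pass through.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def matches_either_strand(query, target):
    """Return '+' or '-' for the strand that matches, or None.

    Exact equality is used only to show the strand logic.
    """
    if query == target:
        return "+"
    if reverse_complement(query) == target:
        return "-"
    return None
```

For example, the query AACG matches the target CGTT only on the minus strand.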
Size annotations may be generated and/or propagated by using the -sizein and/or -sizeout options.
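The sketch below shows one plausible reading of how size annotations flow through clustering: sizes are read from input labels, summed over the members of a cluster, and written to the centroid label. It assumes the ";size=N;" label format and a size of 1 for unannotated or ignored input, neither of which is stated on this page. The helper names are hypothetical.

```python
import re

# Assumed annotation format: ";size=N;" inside the sequence label.
SIZE_RE = re.compile(r";size=(\d+);?")

def read_size(label, sizein=True):
    """Size of one input sequence; 1 if annotations are ignored or absent."""
    if not sizein:
        return 1
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1

def write_size(label, size):
    """Replace any existing size annotation with the given size."""
    base = SIZE_RE.sub(";", label).rstrip(";")
    return f"{base};size={size};"

def cluster_size(member_labels, sizein=True):
    """Cluster size as the sum of member sizes."""
    return sum(read_size(label, sizein) for label in member_labels)

members = ["otu1;size=3;", "read7;size=2;", "read9"]
total = cluster_size(members, sizein=True)
centroid_label = write_size(members[0], total)
```

In this toy case the cluster size is 6 when input annotations are used and 3 when every sequence counts once.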
See search flowchart for an overview of searching in USEARCH commands. Searching is used to match input sequences to existing cluster centroids.
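The general idea of centroid-based greedy clustering can be sketched as follows. This is a simplified illustration, not the cluster_fast implementation: the identity function is a toy ungapped comparison rather than an alignment, every centroid is compared exhaustively with no indexing, and the best-scoring centroid is always chosen, whereas the real search is governed by the accept and termination options.

```python
def identity(a, b):
    """Toy identity: matching positions over the shorter length, no gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n

def greedy_cluster(records, threshold):
    """records: (label, sequence) tuples, already in processing order.

    Returns a dict mapping each centroid label to its member labels.
    """
    centroids = []  # (label, sequence) of each cluster representative
    members = {}    # centroid label -> labels assigned to that cluster
    for label, seq in records:
        best_label, best_id = None, 0.0
        for c_label, c_seq in centroids:
            pid = identity(seq, c_seq)
            if pid >= threshold and pid > best_id:
                best_label, best_id = c_label, pid
        if best_label is None:
            # No centroid is similar enough: this sequence starts a new cluster.
            centroids.append((label, seq))
            members[label] = [label]
        else:
            members[best_label].append(label)
    return members

records = [
    ("s1", "ACGTACGTAC"),
    ("s2", "ACGTACGTAA"),
    ("s3", "TTTTTTTTTT"),
    ("s4", "ACGTACGT"),
]
clusters = greedy_cluster(records, threshold=0.9)
```

Because a sequence can only join a centroid that was processed before it, the processing order directly affects which sequences become centroids.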
Output files
Standard output files are supported. Cluster centroids (representative sequences) are written to a FASTA file specified by the -centroids option. Consensus sequences are written to a FASTA file specified by -consout, and multiple alignments are written to filenames derived from the -msaout option. Note that using -consout and -msaout may add significantly to the compute time and memory required for clustering. You can specify a directory to contain one FASTA file per cluster using the -clusters option.
Supported options
Accept options
Termination options
Indexing options
Masking options
Multithreading
Alignment parameters
Alignment heuristics
Example
usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
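The file written by -uc can be post-processed to recover cluster membership. The Python sketch below assumes the tab-separated UC layout in which field 1 is the record type (S for a centroid, H for a member hit, C for a cluster summary), field 2 is the cluster number, and field 9 is the query label. That layout is not described on this page, so check it against the UC format documentation before relying on it. The function name and the sample records are made up for illustration.

```python
from collections import defaultdict

def clusters_from_uc(lines):
    """Group query labels by cluster number from UC-style records."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        rec_type, cluster_no, query_label = fields[0], int(fields[1]), fields[8]
        # S and H records each describe one input sequence; C records are
        # per-cluster summaries and would double-count the centroid.
        if rec_type in ("S", "H"):
            clusters[cluster_no].append(query_label)
    return dict(clusters)

# Made-up records in the assumed layout, not real USEARCH output.
example = [
    "\t".join(["S", "0", "10", "*", "*", "*", "*", "*", "s1", "*"]),
    "\t".join(["H", "0", "10", "90.0", "+", "0", "0", "10M", "s2", "s1"]),
    "\t".join(["S", "1", "10", "*", "*", "*", "*", "*", "s3", "*"]),
    "\t".join(["C", "0", "2", "*", "*", "*", "*", "*", "s1", "*"]),
    "\t".join(["C", "1", "1", "*", "*", "*", "*", "*", "s3", "*"]),
]
membership = clusters_from_uc(example)
```

To use it on a real file, pass an open file handle, for example clusters_from_uc(open("clusters.uc")).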