cluster_fast command

Clusters sequences using a variant of the UCLUST algorithm designed to maximize speed.

Sequences are automatically sorted by decreasing length prior to clustering. If this ordering is not appropriate, use the cluster_smallmem command instead. See UCLUST sort order.

An identity threshold must be specified using the -id option.

The -idprefix option can give significant speed improvements on multi-core CPUs (see accept options). At high identities, similar sequences will probably share their first few letters, especially in next-generation sequencing applications where the first few bases are primer sequence. Using, say, -idprefix 2 or -idprefix 4 should therefore change the results little while giving a large speed improvement.
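
Whether the reads really do share their first few letters can be checked before choosing a prefix length. The following is a minimal sketch and not part of USEARCH: prefix_tally is a hypothetical helper name, and it assumes a plain FASTA file and a POSIX shell with awk and sort available. It counts how often each leading n-letter prefix occurs and lists the most frequent first.

```shell
# prefix_tally FASTA N
# Hypothetical helper (not a USEARCH command): tally the first N letters
# of every sequence in a FASTA file, most frequent prefix first.
prefix_tally() {
    awk -v n="$2" '
        # A header line closes the previous record, if there was one.
        /^>/ { if (seq != "") count[toupper(substr(seq, 1, n))]++; seq = ""; next }
        # Sequence lines may be wrapped, so strip whitespace and join them.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END {
            if (seq != "") count[toupper(substr(seq, 1, n))]++
            for (p in count) print count[p], p
        }' "$1" | sort -k1,1nr -k2,2
}
```

For example, prefix_tally query.fasta 4 prints one line per distinct 4-letter prefix with its count. If a single prefix accounts for nearly all sequences, -idprefix 4 would be expected to reject few true matches.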

Reverse-complemented matching (-strand both) is not supported. For this, you can use cluster_smallmem (v6.0.289 and later).
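
As a sketch of this workaround, the example command at the end of this page could be adapted to cluster_smallmem as shown here. The wrapper function cluster_both_strands is hypothetical, the output file names are placeholders, and it is assumed that cluster_smallmem accepts the same -id, -centroids and -uc options as cluster_fast. Because cluster_smallmem does not sort its input, the file is assumed to be in the desired order already.

```shell
# cluster_both_strands INPUT ID
# Hypothetical wrapper (not part of USEARCH): cluster on both strands with
# cluster_smallmem. INPUT is assumed to be sorted as desired beforehand.
cluster_both_strands() {
    if ! command -v usearch >/dev/null 2>&1; then
        echo "usearch not found on PATH" >&2
        return 127
    fi
    usearch -cluster_smallmem "$1" -id "$2" -strand both \
        -centroids nr.fasta -uc clusters.uc
}
```

A call such as cluster_both_strands sorted.fasta 0.9 would then run the clustering, where sorted.fasta is a placeholder name for the pre-sorted input.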

Example

usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc