cluster_fast command
Clusters sequences using a variant of the UCLUST algorithm designed to maximize speed. Sequences are automatically sorted by decreasing length prior to clustering. If this ordering is not appropriate, use the cluster_smallmem command instead. See UCLUST sort order.

An identity threshold must be specified using the -id option.

The -idprefix option can give significant speed improvements on multi-core CPUs (see accept options). At high identities, sequences will probably share their first few letters, especially in next-gen sequencing applications where the first few bases are primer sequence. Using, say, -idprefix 2 or -idprefix 4 should therefore not change the results much but can give big speed improvements.

Reverse-complemented matching (-strand both) is not supported. For this, you can use cluster_smallmem (v6.0.289 and later).

Example

usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
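
To illustrate why sorting by decreasing length and a prefix filter such as -idprefix can reduce work without changing the result much, here is a toy Python sketch of greedy centroid clustering. It is not the USEARCH implementation: the names toy_identity and greedy_cluster are hypothetical, and the identity measure is a gap-free positional comparison used as a stand-in for a real alignment. The sketch assumes the prefix filter rejects a candidate centroid unless its first idprefix letters are identical to those of the query.

```python
def toy_identity(a: str, b: str) -> float:
    """Toy stand-in for alignment identity (assumption, not USEARCH's method):
    fraction of matching positions over the shorter sequence, no gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs, min_id, idprefix=0):
    """Greedy centroid clustering over sequences sorted by decreasing length.

    Returns (centroids, clusters, comparisons), where comparisons counts
    how many identity computations were actually performed.
    """
    # sorted() is stable, so sequences of equal length keep their input order.
    ordered = sorted(seqs, key=len, reverse=True)
    centroids = []
    clusters = {}
    comparisons = 0
    for seq in ordered:
        assigned = False
        for centroid in centroids:
            # Prefix filter: skip the identity computation unless the
            # first idprefix letters of query and centroid are identical.
            if idprefix and seq[:idprefix] != centroid[:idprefix]:
                continue
            comparisons += 1
            if toy_identity(seq, centroid) >= min_id:
                clusters[centroid].append(seq)
                assigned = True
                break
        if not assigned:
            # No centroid matched, so this sequence starts a new cluster.
            centroids.append(seq)
            clusters[seq] = [seq]
    return centroids, clusters, comparisons


# Made-up example reads, for illustration only.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGTACGTAC", "ACGTACGT", "TTGTACGA"]
cents_all, _, n_all = greedy_cluster(reads, min_id=0.9)
cents_pre, _, n_pre = greedy_cluster(reads, min_id=0.9, idprefix=2)
print(cents_all == cents_pre, n_all, n_pre)
```

On this made-up input the two runs produce the same centroids, while the run with the prefix filter performs fewer identity computations. With real data the filter may occasionally reject a match that would otherwise be accepted, which is why the results are only expected not to change much.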