cluster_smallmem command

See also
cluster_fast
cluster_otus
cluster_agg
cluster_aggd

Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to minimize memory use.

It's is the user's responsibility to sort the input sequences in an appropriate order before running cluster_smallmem; see UCLUST sort order for discussion. By default, input sequences are expected to be sorted by decreasing length. If some other sort order is used, the -sortedby option should be specified. Valid values are length (default), size and other. If -sortedby other is specified, then USEARCH does not assume or check for any particular order. See also sortbysize and sortbylength.

An identity threshold must be specified using the -id option.

Multithreading is not supported as this would require significant memory overhead.

By default, nucleotide matching is done on the forward strand only. For matching on both strands, use -strand both.

Example

usearch -cluster_smallmem query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc