Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to minimize memory use.
It is the user's responsibility to sort the input sequences in an appropriate order before running cluster_smallmem; see UCLUST sort order for discussion. By default, input sequences are expected to be sorted by decreasing length. If a different sort order is used, specify it with the -sortedby option. Valid values are length (default), size and other. If -sortedby other is specified, USEARCH neither assumes nor checks for any particular order. See also sortbysize and sortbylength.
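The default expectation (decreasing length) can be checked before clustering. The following is a minimal Python sketch, not part of USEARCH: the function names fasta_lengths and is_sorted_by_decreasing_length are hypothetical, and it assumes plain FASTA input in which every record starts with a ">" header line.

```python
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts; emit the previous record's length.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            # Sequence data may span several lines; sum them up.
            length += len(line)
    if length is not None:
        yield length


def is_sorted_by_decreasing_length(lines: Iterable[str]) -> bool:
    """Return True if no sequence is longer than the one before it."""
    lengths = list(fasta_lengths(lines))
    return all(a >= b for a, b in zip(lengths, lengths[1:]))


# Illustrative records with lengths 8, 6 and 3.
example = """>a
ACGTACGT
>b
ACGTAC
>c
ACG
""".splitlines()

print(is_sorted_by_decreasing_length(example))  # True
```

If the check fails for a real file, the input can be re-sorted (see sortbylength) or clustered with a matching -sortedby value.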
An identity threshold must be specified using the -id option.
Multithreading is not supported as this would require significant memory overhead.
By default, nucleotide matching is done on the forward strand only. For matching on both strands, use -strand both.
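Conceptually, matching on both strands means a sequence is considered in its reverse-complemented orientation as well as in the orientation given. The Python sketch below illustrates only the reverse-complement operation itself; it makes no claim about how USEARCH implements -strand both, and the function name reverse_complement is hypothetical. It assumes unambiguous A, C, G, T letters; any other character is left unchanged.

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACGT"))  # ACGTT
```

With forward-strand matching only, a read stored as ACGTT would not be compared against AACGT in the opposite orientation, even though both describe the same double-stranded molecule.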
See also
Standard output file options
Accept options
Indexing options
Termination options
Masking options
Alignment parameters
Alignment heuristics
Cluster sizes
Memory requirements
Example
usearch -cluster_smallmem query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc