USEARCH manual

USEARCH manual > UCLUST algorithm > UCLUST sort order

UCLUST sort order

See also
UCLUST algorithm
Recentering
Abundance sort

Sort order
UCLUST assumes that input sequences are sorted in an order such that an appropriate centroid sequence is found before other members of its cluster. The cluster_fast command automatically sorts by decreasing length. This cannot be changed. The cluster_smallmem command does not perform a sort, so it is the user's responsibility to sort the input before clustering, e.g. by using the sortbylength or sortbysize commands. These implement the two most common sort orders, summarized in the table below.

Order	Command	Description
Decreasing length	sortbylength	This order is most appropriate when input sequences have large variations in length, e.g. because full-length sequences and fragments are both present, as shown in the figure below. However, with a length sort, the longest sequence may be an outlier. This can be addressed by recentering.
Decreasing abundance	sortbysize	See abundance sorting.

Multiple alignment of a cluster.
The centroid (representative) sequence is shown in red.
Fragments are poor centroids because member sequences may be
dissimilar in the regions that do not align to the fragment (orange).