Applications
Non-redundant databases | A "non-redundant" (NR) database contains only one representative of a given type of sequence. Dereplication removes identical sequences. |
Reduced redundancy databases | Clustering at an identity threshold below 100%, e.g. 90%, may reduce the database size, enabling faster searches with only a small loss in sensitivity. |
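
To illustrate how clustering at a threshold shrinks a database, here is a minimal Python sketch of greedy centroid clustering. It is not UCLUST and does not call USEARCH. The function names `identity` and `greedy_cluster` are hypothetical, and the identity measure is a deliberately crude assumption (ungapped, position-by-position matches divided by the longer length) rather than an alignment-based identity.

```python
def identity(a, b):
    """Crude ungapped identity (an assumption for illustration):
    matching positions divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


def greedy_cluster(seqs, threshold=0.9):
    """Assign each sequence to the first centroid it matches at or above
    the threshold; otherwise make it a new centroid.

    Returns a dict mapping each centroid to its list of members."""
    centroids = []
    members = {}
    # Longest sequences are considered first so they become centroids.
    for s in sorted(seqs, key=len, reverse=True):
        for c in centroids:
            if identity(s, c) >= threshold:
                members[c].append(s)
                break
        else:
            centroids.append(s)
            members[s] = [s]
    return members


db = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
clusters = greedy_cluster(db, threshold=0.9)
reduced_db = list(clusters)  # keep one representative per cluster
```

In this toy input the first two sequences differ at one of ten positions, so they share a cluster at 90% and the reduced database keeps two of the three sequences.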
Algorithms
UCLUST | UCLUST is a general-purpose clustering algorithm that achieves significantly higher speed and sensitivity than CD-HIT and other alternative algorithms (see benchmarks). It is implemented in the cluster_fast and cluster_smallmem commands. |
Dereplication | Dereplication reports one copy of every unique sequence in the input data. USEARCH supports both full-length and prefix dereplication, which are implemented in the derep_fulllength and derep_prefix commands, respectively. |
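
The difference between the two kinds of dereplication can be sketched in a few lines of Python. This is an illustration of the idea only, not the USEARCH implementation: the function names `dereplicate_full_length` and `dereplicate_prefix` are hypothetical, case-insensitive comparison is an assumption, and so is the tie-breaking rule when a sequence is a prefix of more than one longer sequence (here it joins the first match, longest first).

```python
from collections import Counter


def dereplicate_full_length(seqs):
    """Return (sequence, count) pairs, one per unique sequence.

    Sequences match only if identical over their full length.
    Sorted by decreasing count, then alphabetically."""
    counts = Counter(s.upper() for s in seqs)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def dereplicate_prefix(seqs):
    """Return (sequence, count) pairs where any sequence that is a
    prefix of a longer sequence is merged into that longer one."""
    counts = Counter(s.upper() for s in seqs)
    reps = {}
    # Visit longest sequences first so they become representatives.
    for s in sorted(counts, key=lambda x: (-len(x), x)):
        for r in reps:
            if r.startswith(s):
                reps[r] += counts[s]  # fold the prefix into r
                break
        else:
            reps[s] = counts[s]
    return sorted(reps.items(), key=lambda kv: (-kv[1], kv[0]))


reads = ["ACGT", "acgt", "ACG", "ACGTT", "GGA"]
full = dereplicate_full_length(reads)
prefix = dereplicate_prefix(reads)
```

With these reads, full-length dereplication keeps four unique sequences, while prefix dereplication folds `ACG` and `ACGT` into `ACGTT` and keeps two.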