

Non-redundant databases
A "non-redundant" (NR) database contains only one representative of a given type of sequence. Dereplication removes identical sequences.
Reduced redundancy databases
Clustering at an identity threshold below 100%, e.g. 90%, may reduce the database size, enabling faster searches with only a small loss in sensitivity.
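The idea of keeping one representative per cluster can be sketched as a toy greedy procedure. This is only an illustration and not the UCLUST implementation: the function names `identity` and `reduce_redundancy` are hypothetical, and the identity measure is a deliberately simplified ungapped comparison rather than an alignment-based identity.

```python
def identity(a, b):
    """Toy identity (assumption): fraction of matching positions over the
    shorter sequence, compared from the first position with no gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def reduce_redundancy(seqs, threshold=0.9):
    """Greedily keep a sequence only if it is below the identity threshold
    to every representative kept so far. Longer sequences are considered first."""
    representatives = []
    for s in sorted(seqs, key=len, reverse=True):
        if not any(identity(s, r) >= threshold for r in representatives):
            representatives.append(s)
    return representatives


seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT", "ACGTACGTAC"]
reduced = reduce_redundancy(seqs, threshold=0.9)
nonredundant = reduce_redundancy(seqs, threshold=1.0)
```

With a threshold of 1.0 only exact duplicates (under this toy measure) are collapsed, while lowering the threshold also merges near-identical sequences, which is why the reduced database is smaller.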


UCLUST is a general-purpose clustering algorithm which achieves significantly higher speed and sensitivity compared with CD-HIT and other alternative algorithms (see benchmarks). The UCLUST algorithm is implemented in the cluster_fast and cluster_smallmem commands.
Dereplication
Dereplication reports one copy of every unique sequence in the input data. USEARCH supports both full-length and prefix dereplication, which are implemented in the derep_fulllength and derep_prefix commands, respectively.
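The difference between the two kinds of dereplication can be illustrated with a minimal sketch. The functions `dereplicate_full_length` and `dereplicate_prefix` are hypothetical names used for illustration and do not reproduce the USEARCH commands. As an assumption, a sequence that is a prefix of several longer sequences is assigned to the first match found.

```python
from collections import Counter


def dereplicate_full_length(seqs):
    """Return (sequence, count) pairs, one per unique sequence, in
    first-seen order. Two sequences match only if identical over their full length."""
    return list(Counter(seqs).items())


def dereplicate_prefix(seqs):
    """Return (sequence, count) pairs where a sequence is merged into a kept
    sequence if it is identical to it or is a prefix of it."""
    kept = []  # each entry is [sequence, count]
    # Process longest first so that prefixes always meet their longer match.
    for s in sorted(seqs, key=len, reverse=True):
        for entry in kept:
            if entry[0].startswith(s):
                entry[1] += 1
                break
        else:
            kept.append([s, 1])
    return [(s, n) for s, n in kept]


seqs = ["ACGT", "ACG", "ACGT", "TTGA", "AC"]
full = dereplicate_full_length(seqs)
prefix = dereplicate_prefix(seqs)
```

In this example full-length dereplication keeps four unique sequences, while prefix dereplication keeps two, because "ACG" and "AC" are both prefixes of "ACGT".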