Analysis of CD-HIT and comparisons with USEARCH

I am sometimes asked how CD-HIT clustering compares with the UCLUST algorithm in USEARCH. The pages linked below discuss issues with CD-HIT and the assessment of clustering methods.

Comparing CD-HIT and USEARCH

Different methods report different pair-wise identities

Example where CD-HIT id is 97% and USEARCH id is 86%

Example where CD-HIT id is 97% and USEARCH id is 95%

CD-HIT alignment errors due to banding

CD-HIT reports systematically higher %ids compared to USEARCH

CD-HIT has low gap penalties and mismatch scores

CD-HIT v4 and USEARCH v5 results on 16S rRNA reads from Costello et al.

Definitions of %id

CD-HIT versions