CD-HIT and USEARCH report different %ids

The chart below shows the distribution of %ids measured by USEARCH for 804 pairs of 16S reads that are assigned 97.0% id by CD-HIT. These pairs were obtained by clustering the Costello et al. set at 80% id using CD-HIT-EST v4.5.7 and extracting all pairs with 97.0% identity according to the CD-HIT .clstr output file. Results are binned in USEARCH id intervals of 0.5%. See here for clustering methods. For this test, reads were dereplicated to eliminate identical reads before clustering.
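The pair-extraction step can be sketched as follows. This is a minimal illustration, not the script used for this test. It assumes member lines in the .clstr file look like `1	248nt, >readB... at +/97.00%` and that the representative line ends in `*`. The function name, the sample text, and the tolerance are hypothetical.

```python
import re

# Assumed .clstr member-line layout (an assumption, check your own output):
#   "1<TAB>248nt, >readB... at +/97.00%"   ordinary member
#   "0<TAB>250nt, >readA... *"             cluster representative
MEMBER_RE = re.compile(r">(?P<name>\S+?)\.\.\.\s+(?P<rest>.*)$")
ID_RE = re.compile(r"(\d+(?:\.\d+)?)%\s*$")

# Hypothetical sample text, for illustration only.
SAMPLE = """>Cluster 0
0\t250nt, >readA... *
1\t248nt, >readB... at +/97.00%
2\t251nt, >readC... at +/98.40%
>Cluster 1
0\t240nt, >readD... *
1\t239nt, >readE... at 1:239:2:240/+/97.00%
"""

def pairs_at_identity(lines, target=97.0, tol=0.005):
    """Return (representative, member) pairs whose reported id equals target."""
    clusters = []  # one dict per cluster: representative name plus members
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append({"rep": None, "members": []})
            continue
        m = MEMBER_RE.search(line)
        if m is None or not clusters:
            continue
        name, rest = m.group("name"), m.group("rest").strip()
        if rest == "*":
            clusters[-1]["rep"] = name
        else:
            idm = ID_RE.search(rest)
            if idm:
                clusters[-1]["members"].append((name, float(idm.group(1))))
    # The representative need not be the first line of a cluster,
    # so pairs are built only after the whole file has been read.
    return [
        (c["rep"], name)
        for c in clusters
        if c["rep"] is not None
        for name, pct in c["members"]
        if abs(pct - target) <= tol
    ]
```

Calling `pairs_at_identity(SAMPLE.splitlines())` yields the representative/member pairs reported at 97.00%. Each pair can then be re-aligned with another program to get a second %id.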

At 97%, CD-HIT clusters many pairs with %ids that are much lower than 97% according to USEARCH and other programs. In this test, 547 / 804 (68%) of pairs had < 97% id according to USEARCH, but only 1 / 804 (0.1%) had > 97% id. This means that clustering results from the two programs are not comparable at the same nominal %id threshold. See here for further discussion.
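The binning and tally described above can be sketched as follows. This is a rough illustration only: the function name and the toy id values are hypothetical, and the per-pair USEARCH ids are assumed to be available already as a list of percentages.

```python
import math
from collections import Counter

def summarize_ids(ids, threshold=97.0, bin_width=0.5):
    """Bin %ids into fixed-width intervals and count pairs below/above threshold."""
    # Each id is assigned to the lower edge of its interval, e.g. 96.9 -> 96.5.
    bins = Counter(math.floor(x / bin_width) * bin_width for x in ids)
    below = sum(1 for x in ids if x < threshold)
    above = sum(1 for x in ids if x > threshold)
    return bins, below, above

# Toy values for illustration only; these are not the data from this test.
toy_ids = [97.0, 96.4, 95.1, 96.9, 97.2, 93.0]
bins, below, above = summarize_ids(toy_ids)

# The percentages quoted in the text follow from the reported counts.
pct_below = 100 * 547 / 804   # about 68%
pct_above = 100 * 1 / 804     # about 0.1%
```

With the counts reported in this test, `pct_below` rounds to 68 and `pct_above` rounds to 0.1, matching the figures in the text.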