Results
The chart below shows the distribution of %ids measured by USEARCH
for 804 pairs of 16S reads that are assigned 97.0% id by CD-HIT. These
pairs were obtained by clustering the Costello et al. set at 80%
id using CD-HIT-EST v4.5.7 and extracting all pairs with 97.0% identity
according to the CD-HIT .clstr output file. Results are binned in
USEARCH id intervals of 0.5%. See
here for clustering methods. For this test, reads were dereplicated
to eliminate identical reads before clustering.
Conclusions
At 97%, CD-HIT clusters many pairs with %ids
that are much lower than 97% according to USEARCH and other
programs. In this test, 547 / 804 (68%) of pairs had < 97% id
according to USEARCH, but only 1 / 804 (0.1%) had > 97% id. This
means that clusters produced by the two programs are not directly
comparable at the same nominal %id threshold. See here for further
discussion.
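
The tally and binning described above can be sketched in a few lines of Python. This is an illustration only, not the script used for the test: the function names bin_start and summarize are hypothetical, and the six identities in the example are made-up values, not data from the Costello et al. set. It assumes the USEARCH %id of each pair is already available as a number.

```python
import math
from collections import Counter

CDHIT_ID = 97.0   # %id assigned by CD-HIT to every pair in the test
BIN_WIDTH = 0.5   # width of each USEARCH %id interval, as in the chart


def bin_start(pct_id, width=BIN_WIDTH):
    """Lower edge of the interval [start, start + width) that contains pct_id."""
    return math.floor(pct_id / width) * width


def summarize(usearch_ids, threshold=CDHIT_ID):
    """Count pairs below/above the threshold and bin them by USEARCH %id."""
    n = len(usearch_ids)
    below = sum(1 for x in usearch_ids if x < threshold)
    above = sum(1 for x in usearch_ids if x > threshold)
    histogram = Counter(bin_start(x) for x in usearch_ids)
    return {
        "n": n,
        "below": below,
        "above": above,
        "pct_below": 100.0 * below / n,
        "pct_above": 100.0 * above / n,
        "histogram": dict(sorted(histogram.items())),
    }


# Hypothetical USEARCH identities for six pairs that CD-HIT reports as 97.0%.
example = summarize([97.0, 96.6, 95.2, 97.4, 96.9, 93.1])
for start, count in example["histogram"].items():
    print(f"{start:5.1f} - {start + BIN_WIDTH:5.1f}: {count}")
print(f"below {CDHIT_ID}%: {example['below']} / {example['n']}")
print(f"above {CDHIT_ID}%: {example['above']} / {example['n']}")
```

Applied to the counts reported here, the same arithmetic gives 100 * 547 / 804 = 68.0% of pairs below 97% and 100 * 1 / 804 = 0.1% above it.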