cluster quality and sequence identity
Version 6 of USEARCH uses the BLAST definition of sequence identity, while earlier versions used the CD-HIT definition by default. For a given alignment, BLAST identity is less than or equal to CD-HIT identity, because BLAST counts gaps as differences and CD-HIT does not. Insertions and deletions are generally less probable than substitutions, so as a measure of evolutionary distance a gap should count at least as much as a substitution. The BLAST definition is therefore more biologically realistic.

Increased number of clusters
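Because the two definitions can give different values for the same alignment, a pair of sequences can pass an identity threshold under one definition and fail it under the other. The sketch below compares them on a single alignment. It is a minimal illustration, not USEARCH's implementation, and it rests on assumptions: the input is a global pairwise alignment given as two equal-length rows with `-` for gaps; BLAST-style identity is taken as identical columns divided by all alignment columns; CD-HIT-style identity is taken as identical columns divided by the ungapped length of the shorter sequence, which is how the CD-HIT user guide describes its default global identity (the older USEARCH default may differ in detail). The function name `identity_pair` is hypothetical.

```python
# Hypothetical helper: a sketch, not USEARCH's implementation.
# Assumes a global pairwise alignment: two equal-length rows, '-' = gap.

def identity_pair(row_a: str, row_b: str) -> tuple[float, float]:
    """Return (blast_identity, cdhit_identity) for one aligned pair."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    # Ungapped length of the shorter of the two original sequences.
    shorter = min(len(row_a) - row_a.count("-"), len(row_b) - row_b.count("-"))
    if shorter == 0:
        raise ValueError("sequence has no residues")

    # Columns holding the same residue in both rows.
    identities = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")

    # BLAST-style: every alignment column is in the denominator,
    # so each gapped column counts as a difference.
    blast_id = identities / len(row_a)

    # CD-HIT-style (assumed): divide by the shorter sequence's ungapped
    # length, so gaps inserted into that sequence are not penalized.
    cdhit_id = identities / shorter

    return blast_id, cdhit_id


blast, cdhit = identity_pair("ACGT-ACGT", "ACGTTACGT")
print(f"BLAST {blast:.3f}  CD-HIT {cdhit:.3f}")  # BLAST 0.889  CD-HIT 1.000
```

With one gap in a 9-column alignment, the CD-HIT-style value is 100% while the BLAST-style value is about 88.9%, so under these assumptions this pair passes a 0.9 identity threshold with the CD-HIT definition but fails it with the BLAST definition.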