USEARCH manual

Benchmarking USEARCH against BLAST

Benchmarking the USEARCH algorithm against other search methods is challenging because there are fundamental differences in design.

Note that the USEARCH binary also implements several other algorithms, such as UCLUST and UBLAST. This creates an unfortunate ambiguity in terminology, because "USEARCH" can mean either the package or the search algorithm. In hindsight, it was not a good decision to call the package "USEARCH".

Top hit(s) vs. all hits
The USEARCH algorithm (the usearch_global and usearch_local commands) is designed to find the top hit, or a few top hits, whereas most other search algorithms are designed to find all hits that satisfy a threshold such as minimum identity or maximum E-value. I would argue that the USEARCH approach reflects what biologists usually want: in practice, only the few best hits from traditional search algorithms like BLAST are typically retained for downstream analysis.
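
To make the "few best hits" point concrete, here is a minimal sketch of the filtering step that is commonly applied to all-hits output. It assumes the standard 12-column BLAST tabular format (-outfmt 6), with percent identity in column 3 and bit score in column 12. The function name top_hits, the max_hits parameter and the example rows are hypothetical and invented for illustration; they are not real search results.

```python
import csv
import io
from collections import defaultdict


def top_hits(blast6_text, max_hits=1):
    """Keep the max_hits highest-bit-score hits per query.

    Assumes 12-column BLAST tabular text: query, target, %identity, ...,
    E-value, bit score. Returns {query: [(target, pct_id, bits), ...]}.
    """
    hits = defaultdict(list)
    for row in csv.reader(io.StringIO(blast6_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, target = row[0], row[1]
        pct_id = float(row[2])
        bits = float(row[11])
        hits[query].append((bits, pct_id, target))

    best = {}
    for query, rows in hits.items():
        # Highest bit score first; ties keep their input order.
        rows.sort(key=lambda r: r[0], reverse=True)
        best[query] = [(t, p, b) for b, p, t in rows[:max_hits]]
    return best


# Made-up rows, for illustration only.
example = "\n".join([
    "q1\tt1\t99.0\t250\t2\t0\t1\t250\t1\t250\t1e-120\t450",
    "q1\tt2\t91.0\t250\t22\t0\t1\t250\t1\t250\t1e-90\t340",
    "q1\tt3\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t210",
    "q2\tt4\t97.5\t240\t6\t0\t1\t240\t1\t240\t1e-110\t420",
])

for query, kept in top_hits(example, max_hits=1).items():
    print(query, kept)
```

With max_hits=1, only one of the three hits for q1 survives. A benchmark that counts all hits above a threshold would credit a method for the other two, even though they would be discarded in this kind of workflow.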

Global vs. local hits
The most popular search command, usearch_global, uses global alignments, while most other search algorithms, such as BLAST, use local alignments. USEARCH can also produce local alignments (see the usearch_local command), but this is rarely done in practice. Whether global or local alignments are more appropriate depends on the context. For example, with databases of a single gene, such as SSU rRNA, or of orthologs, global alignment is often better, and in these cases the global hits generated by USEARCH often give better estimates of sequence identity and more accurate assignments of taxonomy.
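
The effect on identity estimates can be illustrated with a small sketch. It assumes one common definition of identity (matching columns divided by all alignment columns, with gap columns counted as mismatches); other definitions exist, and this is not necessarily the one used by USEARCH or BLAST. The two aligned rows are invented for illustration, and the "local" segment is picked by hand as the well-conserved first ten columns rather than computed by a local alignment algorithm.

```python
def identity(row_a, row_b):
    """Percent identity of two aligned rows of equal length.

    Assumed definition: matching columns / all alignment columns,
    so a gap column ('-') always counts as a mismatch.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * matches / len(row_a)


# Hypothetical end-to-end (global) alignment of a query and a database sequence.
query = "ACGTACGTAC" + "GGTTCA--GT"
target = "ACGTACGTAC" + "GCTACATTGA"

# Identity over the full alignment.
global_id = identity(query, target)

# Identity over the conserved segment only, as a local hit might report it.
local_id = identity(query[:10], target[:10])

print(f"global: {global_id:.1f}%  local segment: {local_id:.1f}%")
```

In this toy case the full-length alignment gives 75% identity while the conserved segment alone gives 100%, which shows how a local hit can overstate the similarity of two full-length sequences.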

These essential differences mean that a rigorous comparison of the USEARCH algorithm and BLAST is not really possible, and benchmarks should therefore not be taken too seriously. In these pages, I have done my best to design tests that give a realistic indication of relative performance on typical search tasks.