USEARCH performance home page
Results
See here for RFAM results.
Methods
The RFAM benchmark is based on the RFAM database (Gardner et al., 2011)
and was published in (Edgar, 2010).
One thousand sequences were extracted from RFAM at random to use as a
query set; the remaining sequences were used as the search database. The
resulting database is a 41Mb FASTA file containing 191,445 sequences. A
hit is considered a true positive if it belongs to the same RFAM family
as the query, and a false positive otherwise. Some families may be
distantly related, so error rates may be exaggerated. This approach was
chosen because no large nucleotide database designed for homology
detection is available. I believe it is reasonable for ranking
algorithms, although sensitivity and error rates may not be predictive
of performance on other types of nucleotide sequence. Sensitivity is
measured by considering the top hit or the top few hits.
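The protocol above can be illustrated with a minimal Python sketch. It is not the code used for the published benchmark. The `Record` type, the function names, the fixed random seed, and the exact definition of sensitivity (the fraction of queries with at least one same-family hit among the top `top_n` hits) are all assumptions made for illustration. Running the search program itself is left out; `hits` is assumed to map each query identifier to its target identifiers, best hit first.

```python
import random
from collections import namedtuple

# Hypothetical record type: one sequence with its RFAM family label.
Record = namedtuple("Record", ["seq_id", "family", "sequence"])


def split_queries(records, n_queries=1000, seed=0):
    """Pick n_queries records at random as queries; the rest form the database."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    query_idx = set(rng.sample(range(len(records)), n_queries))
    queries = [r for i, r in enumerate(records) if i in query_idx]
    database = [r for i, r in enumerate(records) if i not in query_idx]
    return queries, database


def score_top_hits(query_ids, hits, family_of, top_n=1):
    """Score the top_n hits of each query by family membership.

    A hit is a true positive if it is in the same family as the query,
    and a false positive otherwise.
    Returns (sensitivity, true_positives, false_positives).
    """
    queries_found = 0
    true_pos = 0
    false_pos = 0
    for q in query_ids:
        found = False
        for target in hits.get(q, [])[:top_n]:
            if family_of[target] == family_of[q]:
                true_pos += 1
                found = True
            else:
                false_pos += 1
        queries_found += found
    sensitivity = queries_found / len(query_ids) if query_ids else 0.0
    return sensitivity, true_pos, false_pos
```

Setting `top_n=1` corresponds to scoring the top hit only, and a larger `top_n` to scoring the top few hits. Because related families are still counted as false positives here, the error count inherits the possible exaggeration noted above.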
References
Edgar, R.C. (2010) Search and clustering orders of magnitude faster than
BLAST, Bioinformatics 26(19), 2460-2461, doi: 10.1093/bioinformatics/btq461.
Gardner, P.P., J. Daub, J. Tate, B.L. Moore, I.H. Osuch, S. Griffiths-Jones,
R.D. Finn, E.P. Nawrocki, D.L. Kolbe, S.R. Eddy, A. Bateman (2011) Rfam:
Wikipedia, clans and the "decimal" release, NAR, doi: 10.1093/nar/gkq1129.