Protein clustering benchmark (ORTHO)
The dataset combines protein sequences from the following genome sets:

Bos_taurus.Btau_4.0.58
Equus_caballus.EquCab2.58
Felis_catus.CAT.58
Gorilla_gorilla.gorGor3.58
Homo_sapiens.GRCh37.58
Mus_musculus.NCBIM37.58
Pan_troglodytes.CHIMP2.1.58
Pongo_pygmaeus.PPYG2.58
Rattus_norvegicus.RGSC3.4.58
Vicugna_pacos.vicPac1.58

Labels were truncated at the first white space in order to reduce the file size,
leaving numeric identifiers; the result is a 173 Mb FASTA file. Reducing the
file size matters because the memory required by CD-HIT scales with the size of
the input data plus the size of the output data. By contrast, the memory
required by USEARCH scales with the size of the output data only, so it can be
substantially less for large datasets with high redundancy.

Sequences were sorted by decreasing length before clustering with USEARCH. The
sort took ~10 s and ~200 Mb of RAM.
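The two preprocessing steps described above (truncating each label at its first white space, then sorting by decreasing length) can be sketched in Python as follows. This is a minimal illustration assuming a plain FASTA input; it is not the script used for this benchmark, the function names are hypothetical, and unlike a streaming tool it holds every record in memory at once.

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (label, sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (label, sequence) pairs from a FASTA stream."""
    label = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label = line[1:]
            chunks = []
        elif line:
            chunks.append(line.strip())
    if label is not None:
        yield label, "".join(chunks)


def truncate_label(label: str) -> str:
    """Keep only the text before the first white space character."""
    parts = label.split(maxsplit=1)
    return parts[0] if parts else ""


def preprocess(records: Iterable[Record]) -> List[Record]:
    """Truncate labels, then sort by decreasing sequence length."""
    truncated = [(truncate_label(lab), seq) for lab, seq in records]
    # sorted() is stable even with reverse=True, so sequences of equal
    # length keep their original input order.
    return sorted(truncated, key=lambda r: len(r[1]), reverse=True)


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 80) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for label, seq in records:
        handle.write(f">{label}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    demo = io.StringIO(">1 some description\nMKT\n>2 other text\nMKTAYIAK\nQR\n>3\nMA\n")
    out = io.StringIO()
    write_fasta(preprocess(read_fasta(demo)), out)
    print(out.getvalue(), end="")
```

For the small demo input this prints the 10-residue record `2` first, then `1`, then `3`, each with only its numeric label left.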