Reseek resource use

Case study: building and searching PDB-95 database
The machine was a Linux server with an i9-14900K CPU (32 hardware threads).

The full raw PDB download in cif.gz format is 71 Gb (Oct 2024).

Reseek converts this full PDB download to .bca, .cal and FASTA in 7 minutes using 13 Gb RAM.

File sizes for full PDB (957.7k chains): .bca 1.7 Gb (~40x compression), .cal 4.2 Gb (~17x compression), FASTA 245 Mb.
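
The compression figures above follow from dividing the raw download size by each output size. A minimal sketch of that arithmetic, using only the sizes quoted here (the variable and function names are illustrative and not part of Reseek):

```python
# Sizes quoted in the text, in Gb.
RAW_CIF_GZ_GB = 71.0
OUTPUT_SIZES_GB = {".bca": 1.7, ".cal": 4.2}


def compression_ratio(raw_gb: float, output_gb: float) -> float:
    """Return how many times smaller the output is than the raw download."""
    return raw_gb / output_gb


ratios = {
    fmt: compression_ratio(RAW_CIF_GZ_GB, size)
    for fmt, size in OUTPUT_SIZES_GB.items()
}

for fmt, ratio in ratios.items():
    print(f"{fmt}: {ratio:.1f}x")
```

This gives roughly 41.8x for .bca and 16.9x for .cal, in line with the rounded ~40x and ~17x figures.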

Identifying identical amino acid sequences with the usearch fastx_uniques command takes 3 secs. and 1.1 Gb RAM, leaving 387,554 unique chains (121 Mb FASTA).

Clustering at 95% identity with the usearch cluster_fast command takes 25 secs. and 1.3 Gb RAM, leaving 151,994 chains (45 Mb FASTA).
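
To see how much each redundancy-removal step shrinks the chain set, the counts quoted above can be compared directly. The sketch below assumes the rounded 957.7k figure means about 957,700 chains, so the percentages are approximate; the names are illustrative.

```python
# Chain counts quoted in the text. The full-PDB count is given as
# "957.7k", so 957_700 is an approximation.
FULL_PDB_CHAINS = 957_700
UNIQUE_CHAINS = 387_554  # after usearch fastx_uniques
PDB95_CHAINS = 151_994   # after usearch cluster_fast at 95% identity

# Fraction of chains kept by each step, and by both steps combined.
kept_by_dedup = UNIQUE_CHAINS / FULL_PDB_CHAINS
kept_by_clustering = PDB95_CHAINS / UNIQUE_CHAINS
kept_overall = PDB95_CHAINS / FULL_PDB_CHAINS

print(f"kept after exact dedup:    {kept_by_dedup:.1%}")
print(f"kept after 95% clustering: {kept_by_clustering:.1%}")
print(f"kept overall:              {kept_overall:.1%}")
```

Under that assumption, each step keeps roughly 40% of its input, so PDB-95 holds about 16% of the chains in the full PDB.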

Extracting the PDB-95 subset at 95% identity (151,994 chains) using getchains takes 21 secs. and 1.5 Gb RAM. Final output file sizes are 782 Mb for .cal and 312 Mb for .bca.

For all-vs-all search, loading PDB-95 in .bca format takes 47 secs.

With 32 threads, peak memory used by the all-vs-all search is 5.0 Gb, with typical use around 3 Gb. Elapsed time including database load is 25 minutes.
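
As a rough back-of-envelope estimate, the timings above can be turned into an effective pair rate. This sketch assumes each unordered pair of distinct chains counts once, and treats the rounded 25 minutes as exact. It says nothing about how many pairs Reseek actually aligns, since a search tool may discard most pairs with fast filters. All names are illustrative.

```python
# Figures quoted in the text.
N_CHAINS = 151_994       # chains in PDB-95
TOTAL_SECONDS = 25 * 60  # elapsed time, including database load
LOAD_SECONDS = 47        # time to load the .bca database
THREADS = 32

# Assumption: count each unordered pair of distinct chains once.
unordered_pairs = N_CHAINS * (N_CHAINS - 1) // 2

# Time spent searching, after subtracting the load time.
search_seconds = TOTAL_SECONDS - LOAD_SECONDS

# Effective rates: pairs covered per second, overall and per thread.
pairs_per_second = unordered_pairs / search_seconds
pairs_per_second_per_thread = pairs_per_second / THREADS

print(f"pairs:              {unordered_pairs:,}")
print(f"pairs/sec:          {pairs_per_second:,.0f}")
print(f"pairs/sec/thread:   {pairs_per_second_per_thread:,.0f}")
```

Under these assumptions the search covers about 11.6 billion pairs at an effective rate of just under 8 million pairs per second.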