Search commands require a -db option specifying a database
filename. For u-commands (usearch_global, usearch_local and ublast)
the database may be in FASTA format or UDB format. Other search
commands (search_local and search_global) support FASTA only. The
file format is automatically detected, so the -db option is used for
both file types.
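
The rules above can be sketched as a small Python helper that assembles a search command line. This is only an illustration: the executable name `usearch`, the helper names, the `db_format` parameter and the file names are assumptions, and a real search also needs thresholds and output options that are omitted here.

```python
# Commands that accept FASTA or UDB databases, per the text above.
U_COMMANDS = {"usearch_global", "usearch_local", "ublast"}
# Commands that accept FASTA databases only.
FASTA_ONLY_COMMANDS = {"search_local", "search_global"}


def supported_db_formats(command):
    """Return the database formats a search command accepts."""
    if command in U_COMMANDS:
        return {"fasta", "udb"}
    if command in FASTA_ONLY_COMMANDS:
        return {"fasta"}
    raise ValueError(f"not a search command covered here: {command}")


def build_search_args(command, query, db, db_format):
    """Build an argument list; -db is used for both FASTA and UDB files.

    db_format is declared by the caller (a hypothetical parameter used only
    for this check); the program itself detects the format automatically.
    """
    if db_format not in supported_db_formats(command):
        raise ValueError(f"{command} does not support {db_format} databases")
    # Assumed executable name "usearch"; other options are left out.
    return ["usearch", f"-{command}", query, "-db", db]


print(build_search_args("usearch_global", "query.fasta", "ref.udb", "udb"))
```

The same `-db` argument carries either file type; only the check on the command name differs.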

Indexed databases
The u-commands are designed to optimize search speed for large
datasets. A key technique is using an index on the database that
supports rapid retrieval of word counts or seeds. The index can be
built in memory on the fly from a FASTA file, or can be pre-built
and stored in a UDB file. Using FASTA can be convenient, but with a
large database, load times are longer and more memory is required
than with a UDB file. The memory needed to hold a UDB file is
approximately the same as the UDB file size. When an index is created
on the fly from a FASTA file, indexing options can be specified on
the search command line. This also applies when a centroid database is
constructed on the fly during clustering.
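
As a sketch of the pre-built route, the following Python helper assembles a command that writes a UDB file from a FASTA file. The command names `makeudb_usearch` and `makeudb_ublast` and the `-output` option are recalled from the USEARCH command set rather than stated in this section, so check them against your version. The executable name, helper name and file names are assumptions.

```python
# Assumed mapping from a search command to the command that builds its UDB
# index; verify against the documentation for your USEARCH version.
MAKEUDB_FOR = {
    "usearch_global": "makeudb_usearch",
    "usearch_local": "makeudb_usearch",
    "ublast": "makeudb_ublast",
}


def build_makeudb_args(search_command, fasta, udb):
    """Build an argument list that pre-builds a UDB file from FASTA."""
    if search_command not in MAKEUDB_FOR:
        # search_local and search_global do not use an index at all.
        raise ValueError(f"{search_command} does not use a UDB index")
    make_command = MAKEUDB_FOR[search_command]
    # Assumed executable name "usearch" and assumed -output option.
    return ["usearch", f"-{make_command}", fasta, "-output", udb]


print(build_makeudb_args("usearch_global", "ref.fasta", "ref.udb"))
```

The resulting list could be passed to `subprocess.run`, after which the UDB file is given to the search command through -db in place of the FASTA file.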

Non-indexed databases
The search_local and search_global commands do not use an index. They
use a FASTA database file which is loaded into memory without
creating an index. The memory required is approximately the same as
the FASTA file size. Using these commands saves memory and can be
convenient for small datasets, but searches are usually
slower.
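
The memory rule of thumb given here can be checked before starting a search. The Python sketch below simply reports the file size as the estimate. It applies to a FASTA file used with search_local or search_global and to a UDB file used with a u-command; it does not cover a FASTA file indexed on the fly, for which the text gives no figure. The function names are hypothetical.

```python
import os


def estimate_db_memory_bytes(db_path):
    """Approximate memory needed to hold a database: roughly its file size.

    Valid for FASTA with search_local/search_global and for UDB with
    u-commands; not for FASTA that is indexed on the fly.
    """
    return os.path.getsize(db_path)


def to_gib(num_bytes):
    """Convert a byte count to gibibytes for easier reading."""
    return num_bytes / 1024 ** 3
```

For example, `to_gib(estimate_db_memory_bytes("ref.fasta"))` gives a rough figure to compare against available RAM, where `ref.fasta` is a placeholder file name.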