Where possible, sequences in database files should be trimmed to minimize terminal gaps. This reduces the memory required to store the database, improves the correlation between word count and sequence similarity for the USEARCH algorithm, and increases search speed by reducing the number of spurious word matches that must be counted or extended. For example, in next-generation 16S
sequencing, it is common to sequence a region of the 16S gene
between a pair of primers. In this case, it is recommended to trim
the database to the sequencing primers.
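
The primer trimming described above can be sketched in a few lines of Python. This is a minimal illustration and not part of USEARCH: the function names (`trim_to_primers`, `primer_regex`, `reverse_complement`) are hypothetical, primers are matched exactly (IUPAC degenerate codes are expanded, but mismatches and indels are not tolerated), and the sketch assumes the primers themselves should be excluded from the trimmed sequence.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence, including IUPAC codes."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Compile a primer (possibly degenerate) into an exact-match regex."""
    primer = primer.upper().replace("U", "T")
    return re.compile("".join(IUPAC[base] for base in primer))


def trim_to_primers(seq, fwd_primer, rev_primer):
    """Return the region between the primers, or None if either is missing.

    Assumption: rev_primer is given 5'->3' on the reverse strand, so its
    reverse complement is what appears in the database sequence.
    """
    seq = seq.upper().replace("U", "T")
    fwd = primer_regex(fwd_primer).search(seq)
    if fwd is None:
        return None
    rev = primer_regex(reverse_complement(rev_primer)).search(seq, fwd.end())
    if rev is None:
        return None
    return seq[fwd.end():rev.start()]


# Toy example with made-up primers (not real 16S primers).
example = "TTTT" + "ACGTAC" + "CATCAT" + "GGCCTT" + "AAAA"
trimmed = trim_to_primers(example, "ACGTRY", "AAGGCC")
```

Sequences for which `trim_to_primers` returns `None` lack one or both primer sites; whether to discard them or keep them untrimmed depends on the database and is left to the user.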