USEARCH commands support both local and global alignments. Most
commands have one local and one global variant, e.g. usearch_local
and usearch_global. As an exception, ublast supports only local
alignments. This is because the UBLAST
algorithm is designed primarily to detect distant protein
relationships, which are typically local in character. A global
ublast would be simple to implement; if you see a good reason to do
this, let me know.
See also:
Alignment parameters
Alignment heuristics
Global alignments
A global
alignment contains all letters from both the query and target
sequences. However, it is common in USEARCH applications for the
target sequence to be significantly longer than the query (e.g. query is a short
read, target is a full-length gene), in which
case the alignment will usually have terminal gaps, as in this
example:
Query   ---------QVERYSEQ-------
                 ||||||||
Target  MVCHLQNGEQVERYSEQANDMQRE
Where possible, database sequences should be trimmed to minimize
terminal gaps.
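As a minimal sketch of what trimming means here, the snippet below counts the terminal gaps in the query row of the example above and cuts the target down to the aligned region. The function name `terminal_gaps` and the variable names are hypothetical, chosen for illustration; this is not part of USEARCH.

```python
def terminal_gaps(row, gap="-"):
    """Return (leading, trailing) gap counts for one aligned row."""
    if not row.strip(gap):
        # An all-gap row: count every column as a leading gap.
        return len(row), 0
    leading = len(row) - len(row.lstrip(gap))
    trailing = len(row) - len(row.rstrip(gap))
    return leading, trailing


# Rows copied from the global alignment example above.
query_row = "---------QVERYSEQ-------"
target_row = "MVCHLQNGEQVERYSEQANDMQRE"

lead, trail = terminal_gaps(query_row)

# Keep only the target columns that face query letters.
trimmed_target = target_row[lead:len(target_row) - trail]
print(lead, trail, trimmed_target)
```

In practice the trimming would be applied to the database sequences before searching, for example by cutting full-length genes down to the region the reads are expected to cover.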
Local alignments
A local
alignment aligns a substring of the query sequence to a substring
of the target sequence. The substrings may be all of one or both
sequences; if all of both are included then the local alignment is
also global. A local alignment is defined by maximizing the
alignment score, so that deleting a
column from either end would reduce the score, and adding further
columns at either end would also reduce the score. For example,
consider this global protein alignment:
Query   WSEQVDNCEA
         ||||+|||
Target  KSEQVENCEN
Here, the local alignment would be
obtained by deleting the first and last columns, because the pairs
W/K and A/N have negative substitution scores in the BLOSUM62 matrix.
Local alignments never have terminal gaps,
because a higher score could be obtained by deleting the gaps
(which always have negative scores, i.e. penalties).
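The score-maximizing definition can be illustrated on the gapless example above. The sketch below sums per-column substitution scores and finds the highest-scoring contiguous run of columns. It is an illustration only, not the algorithm USEARCH uses: it assumes an ungapped alignment, and `BLOSUM62_SUBSET` and `best_local_span` are hypothetical names. The table contains only the BLOSUM62 entries needed for this example.

```python
# Subset of BLOSUM62 covering only the residue pairs in the example.
BLOSUM62_SUBSET = {
    ("W", "K"): -3, ("S", "S"): 4, ("E", "E"): 5, ("Q", "Q"): 5,
    ("V", "V"): 4, ("D", "E"): 2, ("N", "N"): 6, ("C", "C"): 9,
    ("A", "N"): -2,
}


def best_local_span(query, target, scores):
    """Return (start, end, score) of the highest-scoring run of columns.

    Assumes an ungapped alignment of equal-length rows; `end` is exclusive.
    """
    best = (0, 0, 0)
    run_start, run_score = 0, 0
    for i, pair in enumerate(zip(query, target)):
        if run_score <= 0:
            # A non-positive prefix can only lower the total, so drop it.
            run_start, run_score = i, 0
        run_score += scores[pair]
        if run_score > best[2]:
            best = (run_start, i + 1, run_score)
    return best


query = "WSEQVDNCEA"
target = "KSEQVENCEN"

global_score = sum(BLOSUM62_SUBSET[p] for p in zip(query, target))
start, end, local_score = best_local_span(query, target, BLOSUM62_SUBSET)

print(global_score)                       # score over all ten columns
print(query[start:end], target[start:end], local_score)
```

With these scores the full ten-column alignment totals 35, while dropping the first and last columns gives 40, which is why the local alignment excludes them.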
Finding local alignments that are approximately global
The -query_cov and -target_cov accept
options can be used with local alignments to require that the
alignment covers most of one or both sequences. For example,
-query_cov 0.9 would require most of the query to be aligned
(semi-global), and -query_cov 0.9 -target_cov 0.9 would require
most of both sequences to be aligned.
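The effect of these thresholds can be sketched as a simple filter. The code below assumes that coverage means the fraction of a sequence's letters that appear in the alignment; the exact computation inside USEARCH may differ, and the functions `coverage` and `passes_coverage` are hypothetical helpers, not part of any USEARCH interface.

```python
def coverage(aligned_row, seq_length, gap="-"):
    """Fraction of a sequence's letters that appear in the alignment."""
    letters = sum(1 for c in aligned_row if c != gap)
    return letters / seq_length


def passes_coverage(query_row, target_row, query_len, target_len,
                    query_cov=0.0, target_cov=0.0):
    """True if both coverage fractions meet their minimum thresholds."""
    return (coverage(query_row, query_len) >= query_cov
            and coverage(target_row, target_len) >= target_cov)


# Local alignment of an 8-letter read to a 24-letter gene
# (the aligned region from the global alignment example).
query_row = "QVERYSEQ"
target_row = "QVERYSEQ"

# Semi-global requirement: most of the query must be aligned.
semi_global = passes_coverage(query_row, target_row, 8, 24, query_cov=0.9)

# Approximately global: most of both sequences must be aligned.
near_global = passes_coverage(query_row, target_row, 8, 24,
                              query_cov=0.9, target_cov=0.9)
print(semi_global, near_global)
```

Here the whole read is aligned, so the query threshold is met, but only 8 of the 24 target letters are covered, so the hit would be rejected once a target threshold of 0.9 is also required.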