See also: nastout files
NAST (Nearest Alignment Space Termination) is a multiple alignment format originally designed for 16S rRNA, though the approach can readily be adapted to other genes and regions.
NAST was introduced in a paper by DeSantis et al., "NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes".
The main idea of NAST is to create a reference multiple alignment with a fixed number of columns that does not change as new sequences are introduced. Columns in a NAST alignment serve as fixed reference points for a set of homologous sequences, e.g. 16S genes. Similar ideas have been applied to other genes, e.g. IMGT unique numbering for immunoglobulins (PMID 12477501). Lacking a better name, I generically refer to this approach as "NAST".
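The fixed-column idea can be illustrated with a minimal Python sketch. It assumes a NAST-style row is a fixed-width string in which both '-' and '.' mark gaps; the helper names residue_columns and column_of_position are hypothetical. The sketch shows that letters at different positions in two sequences can occupy the same column, which is what lets columns act as shared reference points.

```python
GAP_CHARS = set("-.")  # assumption: '-' and '.' both mark gaps


def residue_columns(row: str) -> list[int]:
    """0-based column index of each letter in an aligned row, in sequence order."""
    return [col for col, ch in enumerate(row) if ch not in GAP_CHARS]


def column_of_position(row: str, pos: int) -> int:
    """Column that holds the letter at 0-based ungapped position pos."""
    return residue_columns(row)[pos]


# Two toy rows from the same fixed-width (8-column) alignment.
ref_a = "AC--GT-A"
ref_b = "A-CCGTTA"

print(residue_columns(ref_a))          # [0, 1, 4, 5, 7]
print(column_of_position(ref_a, 2))    # 4: the G is letter 2 of ref_a
print(column_of_position(ref_b, 3))    # 4: the G is letter 3 of ref_b
```

The G is at a different sequence position in each row but in the same column, so anything defined by column number applies to both.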
A new sequence can be aligned to a reference alignment relatively easily by first identifying the closest reference sequence, or the closest few sequences. A pair-wise alignment or small multiple alignment is then made, which can readily be mapped back to the full reference alignment because every reference letter already has a fixed column. This approach allows new sequences to be annotated with features, e.g. hypervariable regions, using a pre-defined map of features to column numbers. Given the very large datasets now available for 16S, immunoglobulins and other genes and regions, some traditional methods are computationally intractable, while a NAST alignment enables efficient calculation of pair-wise distances, identification of chimeric sequences, etc., because all sequences already share the same columns and need not be re-aligned to each other.
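The mapping, annotation and distance steps can be sketched in Python as follows. This is a minimal illustration, not USEARCH's implementation: it assumes the closest reference has already been found and the pair-wise alignment already made (both are passed in as gapped strings), treats '-' and '.' as gaps, and uses hypothetical function names. Query letters aligned to gaps in the reference are simply dropped, and the sketch does not try to use gap columns that already exist in the reference row.

```python
GAP_CHARS = set("-.")  # assumption: '-' and '.' both mark gaps


def is_gap(ch: str) -> bool:
    return ch in GAP_CHARS


def ungapped(row: str) -> str:
    return "".join(ch for ch in row if not is_gap(ch))


def map_to_reference(ref_row: str, ref_pw: str, query_pw: str) -> str:
    """Project a query onto the fixed columns of a reference alignment.

    ref_row: the closest reference sequence as a row of the reference alignment.
    ref_pw, query_pw: a pair-wise alignment of that reference with the query.
    Query letters aligned to a gap in ref_pw have no column and are dropped.
    """
    if len(ref_pw) != len(query_pw):
        raise ValueError("pair-wise alignment rows must have equal length")
    if ungapped(ref_pw) != ungapped(ref_row):
        raise ValueError("ref_pw and ref_row must contain the same reference sequence")
    ref_cols = [c for c, ch in enumerate(ref_row) if not is_gap(ch)]
    out = ["-"] * len(ref_row)  # same width as every other row
    ref_pos = 0  # index of the next reference letter
    for r, q in zip(ref_pw, query_pw):
        if is_gap(r):
            continue  # insertion relative to the reference: dropped here
        if not is_gap(q):
            out[ref_cols[ref_pos]] = q  # inherit the reference letter's column
        ref_pos += 1
    return "".join(out)


def p_distance(row_a: str, row_b: str) -> float:
    """Fraction of mismatches over columns where both rows have a letter."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if not is_gap(a) and not is_gap(b)]
    if not pairs:
        raise ValueError("rows share no aligned letters")
    return sum(a != b for a, b in pairs) / len(pairs)


def extract_feature(row: str, start_col: int, end_col: int) -> str:
    """Letters of a feature defined by a column range [start_col, end_col)."""
    return ungapped(row[start_col:end_col])


ref_row = "AC--GT-A"  # toy 8-column reference row
query_row = map_to_reference(ref_row, "AC-GTA", "ACTG-G")
print(query_row)                          # AC--G--G
print(p_distance(ref_row, query_row))     # 0.25
print(extract_feature(query_row, 0, 5))   # ACG
```

Because every mapped row has the same width, distances and feature extraction reduce to column-by-column comparisons with no further alignment. The p-distance used here is only one simple choice of distance.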
There are two main disadvantages of NAST-like approaches. First, novel insertions cannot be accommodated correctly because, by definition, the format has a fixed number of columns. Therefore, novel insertions must either be deleted (this is the solution adopted in USEARCH) or accommodated by introducing misalignments. Neither solution is entirely satisfactory: deletion discards real sequence, while misalignment puts letters into columns where they are not homologous to the other letters. Second, some (if not most) genes and regions are simply too variable for a reasonable multiple alignment to be built; the fungal Internal Transcribed Spacer (ITS) region is an example.
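The insertion problem can be made concrete with a small check. This Python sketch is an illustration under simplifying assumptions, not how USEARCH or NAST handle insertions: given a pair-wise alignment of a query with its closest reference, it reports runs of query letters that face reference gaps and are longer than the number of columns available between the flanking reference letters in the fixed-width row. The function name is hypothetical, '-' and '.' are assumed to be gaps, and the check only counts free columns; it does not ask whether the letters are homologous to whatever those columns hold in other sequences.

```python
GAP_CHARS = set("-.")  # assumption: '-' and '.' both mark gaps


def is_gap(ch: str) -> bool:
    return ch in GAP_CHARS


def novel_insertions(ref_row: str, ref_pw: str, query_pw: str) -> list[tuple[int, int, int]]:
    """Find query insertions that cannot fit in a fixed-column reference row.

    Returns (query_start, length, free_columns) for each run of query letters
    aligned to reference gaps that is longer than the number of columns between
    the flanking reference letters in ref_row.
    """
    ref_cols = [c for c, ch in enumerate(ref_row) if not is_gap(ch)]
    found: list[tuple[int, int, int]] = []
    ref_pos = 0  # reference letters consumed so far
    q_pos = 0    # query letters consumed so far
    run_start, run_len = 0, 0

    def close_run() -> None:
        # Free columns lie strictly between the flanking reference letters;
        # the row edges act as flanks for terminal insertions.
        left = ref_cols[ref_pos - 1] if ref_pos > 0 else -1
        right = ref_cols[ref_pos] if ref_pos < len(ref_cols) else len(ref_row)
        free = right - left - 1
        if run_len > free:
            found.append((run_start, run_len, free))

    for r, q in zip(ref_pw, query_pw):
        if is_gap(r):
            if not is_gap(q):  # query letter with no reference partner
                if run_len == 0:
                    run_start = q_pos
                run_len += 1
                q_pos += 1
            continue
        if run_len:  # a reference letter ends the insertion
            close_run()
            run_len = 0
        ref_pos += 1
        if not is_gap(q):
            q_pos += 1
    if run_len:  # trailing insertion after the last reference letter
        close_run()
    return found


ref_row = "AC--GT-A"  # toy reference row: two gap columns between C and G
print(novel_insertions(ref_row, "AC-GTA", "ACTG-G"))      # []
print(novel_insertions(ref_row, "ACG--TA", "ACGAATA"))    # [(3, 2, 0)]
print(novel_insertions(ref_row, "AC---GTA", "ACTTTGTA"))  # [(2, 3, 2)]
```

In the first call the single inserted T fits into one of the two gap columns between C and G, so nothing is reported. In the other two calls the insertion is longer than the space available, so in a fixed-width row some of those letters would have to be deleted or misaligned.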
I believe that better results can usually be obtained by constructing pair-wise or multiple alignments of subsets de novo, as done for example by the uchime_ref command (in contrast to ChimeraSlayer and the mothur Chimera.slayer command, which use a NAST-based method but are slower and less accurate). However, NAST methods can be convenient in some situations, especially where there are existing data and annotations that rely on 16S NAST or on a NAST-like fixed column scheme such as IMGT numbering.