MUSCLE manual |
Diagonal optimization |
Creating a pair-wise alignment by dynamic programming requires computing an N x M matrix, where N and M are the sequence lengths. A trick used in algorithms such as BLAST is to reduce the size of this matrix by using fast methods to find "diagonals", also called "gapless high-scoring segment pairs", i.e. short regions of high similarity between the two sequences. This speeds up the algorithm at the expense of some reduction in accuracy. MUSCLE uses a technique called k-mer extension to find diagonals, which is described in this paper: Edgar, R.C. (2004) Local homology recognition and distance measures in linear time using compressed amino acid alphabets, Nucleic Acids Res., 32, 1. [Pubmed]. It is disabled by default because of the slight reduction in average accuracy, and can be turned on by specifying the –diags option. To enable diagonal optimization only in the first iteration, use –diags1, to enable diagonal optimization in the second iteration only use –diags2. These are provided separately because it would be a reasonable strategy to enable diagonals in the first iteration but not the second (because the main goal of the first iteration is to construct a multiple alignment quickly in order to improve the distance matrix, which is not very sensitive to alignment quality; whereas the goal of the second iteration is to make the best possible progressive alignment). |