Muscle5

Guide tree permutations

The goal of permuting the guide tree is to introduce substantive variation in any systematic errors caused by progressive alignment, without compromising accuracy. Optimising accuracy requires that closely related sequences are aligned before more diverged sequences are added, which in turn requires that the joining order of the guide tree is preserved close to its leaves. Substantive variation requires that larger groups are joined in different orders.

These constraints imply that changes should be made to the joining order of larger groups close to the root. This can be tricky to achieve in practice because guide trees are often highly unbalanced, i.e. many nodes join a small group to a large group. In such trees, naive re-arrangements may fail to induce substantive variation.

Muscle5 manipulates the guide tree T as follows. An edge is identified which divides the leaves of T into subsets a and bc such that the ratio |a|/|bc|≈1/2, i.e. a has approximately one third of the leaves in T. The subtree bc is then divided into subsets b and c of approximately equal size, so that |b|/|c|≈1. Regardless of the original guide tree topology, when there are many leaves this procedure divides T into three subtrees a, b and c of approximately equal size in which the joining order close to the leaves is mostly preserved. Progressive alignment is performed using the original guide tree, abbreviated to none, and the permutations ((a,b),c), ((a,c),b) and ((b,c),a), abbreviated to abc, acb and bca respectively.
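The procedure above can be illustrated with a minimal Python sketch. This is not the Muscle5 source code: it assumes a rooted binary tree stored as nested 2-tuples with string leaves, it only considers cutting off a subtree below an edge (not its complement), and all function names (leaf_count, split_off, three_way_split, guide_tree_permutations) are hypothetical. It requires a tree with at least three leaves.

```python
def leaf_count(t):
    """Number of leaves in a tree given as nested 2-tuples of strings."""
    return 1 if isinstance(t, str) else leaf_count(t[0]) + leaf_count(t[1])


def subtrees(t, path=()):
    """Yield (path, subtree) for every node below the root of t."""
    if isinstance(t, str):
        return
    for i in (0, 1):
        yield path + (i,), t[i]
        yield from subtrees(t[i], path + (i,))


def prune(t, path):
    """Return t without the subtree at path; its sibling replaces the parent."""
    i = path[0]
    if len(path) == 1:
        return t[1 - i]
    kept = prune(t[i], path[1:])
    return (kept, t[1]) if i == 0 else (t[0], kept)


def split_off(t, target):
    """Cut the edge whose subtree has a leaf count closest to target.

    Returns (subtree, rest). Both parts keep their internal joining order.
    """
    path, sub = min(subtrees(t), key=lambda ps: abs(leaf_count(ps[1]) - target))
    return sub, prune(t, path)


def three_way_split(t):
    """Split t into a, b, c with |a| close to n/3 and |b| close to |c|."""
    a, bc = split_off(t, leaf_count(t) / 3)
    b, c = split_off(bc, leaf_count(bc) / 2)
    return a, b, c


def guide_tree_permutations(t):
    """Original tree plus the three re-joined orders, keyed by abbreviation."""
    a, b, c = three_way_split(t)
    return {
        "none": t,
        "abc": ((a, b), c),
        "acb": ((a, c), b),
        "bca": ((b, c), a),
    }


# Example: a maximally unbalanced ("caterpillar") tree with nine leaves.
tree = "s0"
for k in range(1, 9):
    tree = (tree, f"s{k}")

for name, permuted in guide_tree_permutations(tree).items():
    print(name, permuted)
```

On this nine-leaf caterpillar tree the sketch yields three subtrees of three leaves each, even though every internal node of the input joins a single leaf to a larger group. Trees with fewer leaves or less convenient shapes give only approximately equal sizes.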

For more details, see the supplementary material of the preprint: https://drive5.com/muscle5/Muscle5_SuppMat.pdf