Muscle5

Cluster ensemble

1. Input sequences are clustered, and one representative sequence is selected at random from each cluster.

2. This process is repeated, creating an ensemble of unaligned input sets.

3. One MSA is created for each input set from step 2.

4. One tree is generated for each MSA from step 3.
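Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the clusters are assumed to be already available as a dictionary mapping a cluster identifier to its member sequences, and the names `make_input_sets` and `to_fasta` are hypothetical helpers, not part of Muscle5. Each representative is written under its cluster identifier so that the labels match across replicates. Steps 3 and 4 (alignment and tree building) would be run with external tools on each FASTA output and are not shown.

```python
import random


def make_input_sets(clusters, n_replicates, seed=0):
    """Build an ensemble of unaligned input sets.

    clusters: dict mapping cluster_id -> list of (sequence_label, sequence).
    Returns a list of dicts, one per replicate, mapping cluster_id -> sequence.
    """
    rng = random.Random(seed)  # fixed seed so the ensemble is reproducible
    input_sets = []
    for _ in range(n_replicates):
        replicate = {}
        for cluster_id in sorted(clusters):
            # Step 1: pick one representative at random from each cluster.
            _label, sequence = rng.choice(clusters[cluster_id])
            replicate[cluster_id] = sequence
        # Step 2: repeating the draw gives one input set per replicate.
        input_sets.append(replicate)
    return input_sets


def to_fasta(replicate):
    """Format one input set as FASTA, labeled by cluster identifier."""
    return "".join(f">{cid}\n{seq}\n" for cid, seq in replicate.items())
```

Each string returned by `to_fasta` could be saved to its own file and used as the input for one MSA in step 3.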

The result is an ensemble of trees. If sequences and tree leaves are labeled with cluster identifiers, so that two sequences from the same cluster have the same label, then correct trees should be identical when every cluster is monophyletic in the true tree, and very similar otherwise.
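One way to quantify how similar two trees in the ensemble are is to count the bipartitions (splits) of the leaf labels that appear in one tree but not the other, which is the Robinson-Foulds distance. The page does not prescribe a comparison method, so this choice is an assumption, and `leaf_splits` and `rf_distance` are hypothetical helper names. The sketch assumes each tree is a Newick string whose leaves are labeled with unique cluster identifiers, and it does not handle quoted labels or comments.

```python
import re


def leaf_splits(newick):
    """Return (leaves, splits) for a Newick string.

    splits is the set of non-trivial bipartitions, each stored as the
    side that does not contain a fixed reference leaf.
    """
    text = re.sub(r"\s+", "", newick)
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    stack, clades, leaves = [], [], set()
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            done = stack.pop()
            clades.append(frozenset(done))
            if stack:
                stack[-1] |= done
        elif tok not in ",;":
            # A token right after ")" is an internal label or branch
            # length, not a leaf, so it is skipped.
            label = tok.split(":")[0]
            if prev != ")" and label:
                leaves.add(label)
                if stack:
                    stack[-1].add(label)
        prev = tok
    ref = min(leaves)  # reference leaf used to normalize each split
    n = len(leaves)
    splits = set()
    for clade in clades:
        side = clade if ref not in clade else frozenset(leaves - clade)
        if 2 <= len(side) <= n - 2:
            splits.add(side)
    return leaves, splits


def rf_distance(newick_a, newick_b):
    """Count splits present in exactly one of the two trees."""
    leaves_a, splits_a = leaf_splits(newick_a)
    leaves_b, splits_b = leaf_splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same leaf labels")
    return len(splits_a ^ splits_b)
```

A distance of zero means the two trees have the same topology over the cluster labels; small non-zero values indicate trees that differ in only a few splits.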

This technique can be very useful when there are too many sequences for maximum likelihood tree estimation to be tractable.

Even if tree estimation is practical, this technique provides a complementary method for assessing the robustness of a tree estimated from the original sequences.

The split command of the newick utility can be used to create clusters from a tree.