MUSCLE supports several alignment output formats, summarized in the table below.
The output format can be specified in two different ways: with a format flag
that applies to the primary output, or with a filename option that names an
output file in that format. Use one or the other, but not both. For example,
the -clw option specifies that the primary output should be written in
CLUSTALW format, and the -clwout option gives the name of an output file that
will be written in CLUSTALW format. The primary output is specified using the
-out option, and defaults to standard output.
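As a rough illustration of the two ways described above, the sketch below builds both command lines for CLUSTALW output. The file names seqs.fa and seqs.aln are placeholders, not names MUSCLE requires. The commands are only printed, not executed, so the snippet runs even where MUSCLE is not installed.

```shell
# Placeholder file names (assumptions for illustration only).
input=seqs.fa
output=seqs.aln

# Way 1: format flag (-clw) plus -out to name the primary output.
way1="muscle -in $input -clw -out $output"

# Way 2: filename option (-clwout) on its own.
way2="muscle -in $input -clwout $output"

# Print both command lines; paste one into a shell where muscle is available.
printf '%s\n' "$way1" "$way2"
```

Either command line should leave a CLUSTALW-format alignment in seqs.aln. Leaving out -out in the first form would send the alignment to standard output instead.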
Multiple output formats in one run
You can specify several -xxxout options on the same command line, which
allows you to create multiple output formats in a single run, e.g.:
muscle -in seqs.fa -fastaout seqs.afa -clwout seqs.aln
| Flag | Filename option | Description |
|---|---|---|
| -clw | -clwout filename | CLUSTALW format. By default, MUSCLE is written as the program name in the file header. If the -clwstrict option is specified, the program name is written as "CLUSTAL W (1.81)". This is useful if the output will be parsed by scripts that check the program name. |
| -fasta | -fastaout filename | FASTA format (default). |
| -html | -htmlout filename | HTML (web page) output. The alignment is colored using a color scheme from Erik Sonnhammer's Belvu editor. |
| -phys | -physout filename | PHYLIP sequential format. |
| -phyi | -phyiout filename | PHYLIP interleaved format. |
| -msf | -msfout filename | MSF format, as used in the GCG package, is requested by using the -msf option. As with CLUSTALW format, this is easier for people to read than FASTA. As of MUSCLE 3.52, the MSF format has been tweaked to be more compatible with GCG. The following differences remain. (a) MUSCLE truncates labels at the first white space or after 63 characters, whichever comes first. The GCG package apparently truncates after 10 characters. If this is a problem for you, please let me know and I'll add an option to truncate after 10 in a future version. (b) MUSCLE allows duplicate sequence labels, while GCG forbids duplicates. If you use the -stable option of muscle, then the order of the input sequences is preserved and sequences can be unambiguously identified even if the labels are identical. Thanks to Eric Martel for help with improving GCG compatibility. |
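The label rule in MSF note (a) and the duplicate-label concern in note (b) can be checked before running an alignment. The following is a minimal sketch, not part of MUSCLE: the helper names msf_label and dup_labels are hypothetical, and the default width of 10 simply reflects the GCG truncation length mentioned above.

```shell
# Hypothetical helper: truncate a label as note (a) describes,
# at the first white space or after 63 characters, whichever comes first.
msf_label() {
  label=${1%%[[:space:]]*}      # drop everything from the first white space on
  printf '%.63s\n' "$label"     # keep at most 63 characters
}

# Hypothetical helper: read FASTA on stdin and list labels that occur more
# than once after truncation to a given width (default 10, assumed from the
# GCG behavior mentioned above).
dup_labels() {
  width=${1:-10}
  grep '^>' | cut -c2- |
    awk -v w="$width" '{ print substr($1, 1, w) }' |
    sort | uniq -d
}

msf_label "seq1 Homo sapiens"   # prints: seq1
```

Running `dup_labels < seqs.fa` (seqs.fa being a placeholder input file) prints any labels that would collide after 10-character truncation. An empty result suggests the labels would stay unique in GCG.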