Ensemble FASTA (EFA) format
The EFA file format stores one or more multiple sequence alignments (MSAs) in a single text file.
This is convenient for processing ensembles, which typically have 16 or 100 MSAs.
Each alignment begins with a header line consisting of a less-than symbol (<)
followed by a label (e.g., <abc.2). The header line is
followed by the MSA in aligned FASTA format. The alignment ends at
the next header line or at the end of the file. Blank lines are allowed, but the first
character in the file must be <.
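The rules above are simple enough to parse in a few lines. The following is a minimal Python sketch, not part of any official tool: the function name parse_efa is hypothetical, and it assumes that sequence labels are unique within each MSA and that sequence data may wrap across several lines.

```python
def parse_efa(text):
    """Parse EFA text into a list of (msa_label, {seq_label: aligned_seq}) pairs."""
    if not text.startswith("<"):
        raise ValueError("the first character of an EFA file must be '<'")
    msas = []            # list of (label, sequences) in file order
    sequences = None     # dict for the MSA currently being read
    seq_label = None     # label of the sequence currently being read
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue     # blank lines are allowed
        if line.startswith("<"):
            # A header line starts a new MSA and ends the previous one.
            sequences = {}
            seq_label = None
            msas.append((line[1:], sequences))
        elif line.startswith(">"):
            # A FASTA label line starts a new sequence in the current MSA.
            seq_label = line[1:]
            sequences[seq_label] = ""
        elif seq_label is None:
            raise ValueError("sequence data found before any '>' label line")
        else:
            # Aligned sequence data, possibly wrapped over several lines.
            sequences[seq_label] += line
    return msas
```

Calling parse_efa on the contents of an EFA file returns one entry per alignment, in file order, so the label of the first MSA is available as parse_efa(text)[0][0].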
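You can convert between multiple FASTA files and one EFA file: the fa2efa command
combines FASTA files into a single EFA file, and the efa_explode command splits an
EFA file back into separate FASTA files.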
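To illustrate what the FASTA-to-EFA direction involves, here is a minimal Python sketch that assembles EFA text from aligned FASTA texts that are already in memory. It is not the implementation of the fa2efa command; the function name build_efa and the idea of passing labels explicitly are assumptions made for this example.

```python
def build_efa(labeled_fastas):
    """Join (label, aligned_fasta_text) pairs into a single EFA string."""
    if not labeled_fastas:
        raise ValueError("an EFA file needs at least one MSA")
    parts = []
    for label, fasta_text in labeled_fastas:
        parts.append("<" + label)             # header line for this MSA
        parts.append(fasta_text.strip("\n"))  # the MSA in aligned FASTA format
    return "\n".join(parts) + "\n"
```

Because the first pair always contributes a header line first, the output starts with <, as the format requires.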
Most commands that require an ensemble filename as input accept either an EFA file or
a text file listing FASTA filenames or pathnames, one per line.
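A script that accepts both kinds of input needs a way to tell them apart. The Python sketch below uses the rule that an EFA file must start with <; this heuristic and the function name read_ensemble_spec are assumptions for illustration, and the actual commands may distinguish the two input types differently.

```python
def read_ensemble_spec(path):
    """Return ("efa", []) or ("filename_list", [paths]) for an ensemble input file."""
    with open(path, "r") as handle:
        text = handle.read()
    if text.startswith("<"):
        # EFA files must begin with '<', so treat this file as EFA.
        return "efa", []
    # Otherwise assume one FASTA filename or pathname per line; skip blank lines.
    paths = [line.strip() for line in text.splitlines() if line.strip()]
    return "filename_list", paths
```

A caller can then either parse the file as EFA or open each listed FASTA file in turn.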
Below is a simple example with two MSAs, each containing three sequences.
Example
<none.0
>SequenceA
GATTACA
>SequenceB
GAT-ACA
>SequenceC
GATTAC-
<abc.1
>SequenceA
GATTACA
>SequenceB
GA-TACA
>SequenceC
GATTAC-