The -consout option specifies a FASTA output file for consensus sequences. There is one consensus sequence for each cluster. Supported by cluster_fast and cluster_smallmem.

The consensus sequence is generated by creating a multiple alignment of the cluster and taking the majority symbol (letter or gap) from each column. If the majority symbol is a gap, the column is skipped. Terminal gaps are ignored unless the -cons_truncate option is specified. If -cons_truncate is given and the majority symbol in a column is a terminal gap, then the column is deleted. In other words, with -cons_truncate, all gaps are counted, even if they are terminal gaps.

The consensus sequence
may be a better model of a cluster than the centroid sequence. For
example, if you are clustering next-generation reads, then the
longest sequence in the cluster tends to have more errors, and
taking the consensus tends to correct the errors.
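
The majority rule described above can be illustrated with a minimal Python sketch. This is not the USEARCH implementation: the function name consensus, the "-" gap character, the assumption that the cluster is already aligned into equal-length rows, and the tie-breaking behavior (first symbol seen wins) are all choices made for illustration only.

```python
from collections import Counter

GAP = "-"  # assumed gap character for this sketch


def consensus(aligned_rows, cons_truncate=False):
    """Majority-rule consensus over equal-length rows of a multiple alignment.

    Illustrative only; cons_truncate mimics the documented effect of
    -cons_truncate (terminal gaps take part in the vote).
    """
    if not aligned_rows:
        return ""
    width = len(aligned_rows[0])
    if any(len(row) != width for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")

    # For each row, record the half-open span [start, end) between its first
    # and last letter. Gaps outside that span are terminal gaps.
    spans = []
    for row in aligned_rows:
        letters = [i for i, ch in enumerate(row) if ch != GAP]
        spans.append((letters[0], letters[-1] + 1) if letters else (0, 0))

    out = []
    for col in range(width):
        counts = Counter()
        for row, (start, end) in zip(aligned_rows, spans):
            is_terminal_gap = not (start <= col < end)
            if is_terminal_gap and not cons_truncate:
                continue  # terminal gaps are ignored: this row does not vote
            counts[row[col]] += 1
        if not counts:
            continue  # no row voted in this column
        # Ties go to the symbol counted first (an assumption of this sketch).
        symbol, _ = counts.most_common(1)[0]
        if symbol != GAP:  # a majority gap means the column is skipped
            out.append(symbol)
    return "".join(out)


rows = [
    "ACGTAC--",
    "ACGAAC--",
    "ACGTACGT",
]
print(consensus(rows))                      # ACGTACGT
print(consensus(rows, cons_truncate=True))  # ACGTAC
```

In the example, the last two columns contain terminal gaps in two of the three rows. By default those gaps do not vote, so the single remaining row supplies G and T. With cons_truncate=True the gaps are counted, become the majority, and the two columns are dropped. The mismatch in the fourth column (A versus T, T) is resolved to T in both cases, which is the error-correcting effect described above.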