UCLUST v2.1
A new version of UCLUST has been posted.
This program implements the USEARCH, UCLUST, UHIRE and UCHIME algorithms
for database search, clustering, hierarchical clustering and chimeric sequence detection,
respectively. See the change log for a
summary of new features, bug reports and fixes.
Hierarchical clustering
The new --uhire command performs hierarchical
clustering in a single step.
Clumping
"Clumping" is my name
for clustering with the goal of identifying clusters (clumps) of
pre-determined size. Members of a given clump should be more similar to
each other than to members of other clumps. If you know of an existing
term for this type of clustering, please let me know -- the idea is
simple, but I haven't seen it before. The motivation for clumping is to
divide a set of sequences into pieces that are small enough for a given
method to handle -- say, multiple alignment or phylogenetic tree
estimation.
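To make the idea concrete, here is a minimal Python sketch of size-bounded greedy clustering. It is not the UCLUST implementation: the function names `identity` and `clump`, the `min_id` threshold and the ungapped identity measure are all assumptions chosen for illustration only.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions, ungapped,
    divided by the longer length (an assumption, not a real aligner)."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def clump(seqs: List[str], max_size: int, min_id: float = 0.5) -> List[List[str]]:
    """Greedy clumping sketch.

    Each sequence joins the clump whose seed (first member) is most
    similar to it, provided that clump still has room and the identity
    is at least min_id. Otherwise the sequence seeds a new clump.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    clumps: List[List[str]] = []
    for s in seqs:
        best = None
        best_id = -1.0
        for c in clumps:
            if len(c) >= max_size:
                continue  # clump is full, skip it
            score = identity(s, c[0])
            if score > best_id:
                best, best_id = c, score
        if best is not None and best_id >= min_id:
            best.append(s)
        else:
            clumps.append([s])  # start a new clump seeded by s
    return clumps


if __name__ == "__main__":
    for members in clump(["AAAA", "AAAT", "CCCC", "CCCG", "AAAA"], max_size=2):
        print(members)
```

The size cap is what distinguishes this from ordinary threshold clustering: once a clump is full, further similar sequences are forced into a new clump, so no piece grows beyond what the downstream method can handle.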
Chimera detection
The new UCHIME algorithm searches
for chimeric sequences. It is still a work in progress, but I believe
it will be useful in some applications and I am therefore making the
prototype available. Unlike any other chimera detection method I know
of, it can search for chimeras de novo in large sequence sets without a
reference database and without constructing a multiple alignment.
Huge multiple alignments
One application of clumping is
the rapid construction of high-quality large multiple alignments with up
to a million or so sequences. This can be done in minutes, or at most a
few hours, typically in less than 1 GB of RAM. The idea is to split the sequences
into clumps that are small enough for MUSCLE (or some other program) to
align. A representative sequence is extracted from each clump; these are
aligned to create a 'master' alignment. The master alignment is then
used to guide the alignments of clumps to each other, keeping the
internal alignment of a clump intact. This is implemented using the new
--mergeclumps command, which gives much better quality alignments than
--staralign; the latter is designed for very high speed at the expense of
alignment accuracy.
Upgrading to v2.1
Commercial users should already
have received an upgrade. Non-commercial users can upgrade by using the download
page.