Applications
Non-redundant databases |
A "non-redundant" (NR) database contains only one representative of each type
of sequence. Dereplication removes identical sequences. Clustering at a lower
identity threshold, e.g. 90%, may reduce the database size further, enabling
faster searches with only a small loss in sensitivity.
See also database optimization. |
OTU construction |
In marker gene metagenomics, reads of marker regions such as
small-subunit ribosomal RNA genes (16S, 18S), the internal transcribed spacer
(ITS) and cytochrome oxidase I (COI) are often
clustered into groups called Operational Taxonomic Units (OTUs), typically at a
97% identity threshold. The UPARSE pipeline
achieves the best throughput and highest published biological accuracy at the
time of writing (Nature
Methods, Aug 2013). The UCHIME algorithm
can be used for stand-alone chimera filtering in an OTU pipeline. UCHIME is
implemented in the uchime_ref and
uchime_denovo commands. |
Amplicon diversity | Clustering of
amplicon reads, e.g. from 16S marker genes, antibody or T-cell receptor (TCR)
immune system repertoire sequencing, can be used to estimate the biological
diversity represented in the amplicons. |
Algorithms
UCLUST |
UCLUST is a general-purpose clustering
algorithm that achieves significantly higher speed and sensitivity than
CD-HIT and other alternative algorithms (see
benchmarks). The UCLUST algorithm is
implemented in the cluster_fast and
cluster_smallmem commands. |
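The idea of clustering sequences around representative "centroid" sequences can be illustrated with a minimal Python sketch. This is not the UCLUST implementation: it compares each sequence against every centroid by brute force, and it assumes a simple identity measure (1 minus edit distance divided by the longer length), which may differ from the identity definition USEARCH uses. The function names `edit_distance`, `identity` and `greedy_cluster` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a, b):
    """Assumed identity measure: 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid within the threshold.

    A sequence that matches no existing centroid becomes a new centroid,
    so the result depends on the input order.
    """
    clusters = {}  # centroid sequence -> list of member sequences
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT"]
    for centroid, members in greedy_cluster(reads, threshold=0.8).items():
        print(centroid, len(members))
```

Because the sketch is order-dependent, sorting the input (for example by length or abundance) before clustering changes which sequences become centroids.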
UPARSE |
UPARSE is an algorithm for constructing
OTUs from amplicon reads. A full implementation of UPARSE requires a
pipeline which takes FASTQ reads and
generates clusters. The cluster_otus
command performs the clustering step, which follows quality filtering and
length trimming of the reads. |
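The pre-processing that precedes clustering might look like the following Python sketch. The text above does not say which quality criterion is applied, so the sketch assumes an expected-error filter (the sum of per-base error probabilities derived from Phred scores) and Phred+33 quality encoding. The function names and the `trunc_len` and `max_ee` parameters are hypothetical and are not USEARCH options.

```python
def expected_errors(quals, offset=33):
    """Sum of per-base error probabilities for a FASTQ quality string.

    Assumes Phred scores encoded as ASCII characters with the given offset.
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quals)


def filter_and_trim(records, trunc_len, max_ee):
    """Trim reads to a common length and drop low-quality ones.

    records is an iterable of (label, sequence, quality) tuples.
    Reads shorter than trunc_len are discarded; the rest are truncated
    and kept only if their expected errors do not exceed max_ee.
    """
    kept = []
    for label, seq, qual in records:
        if len(seq) < trunc_len:
            continue  # too short to trim to the common length
        seq, qual = seq[:trunc_len], qual[:trunc_len]
        if expected_errors(qual) <= max_ee:
            kept.append((label, seq, qual))
    return kept


if __name__ == "__main__":
    reads = [
        ("r1", "ACGTAC", "IIIIII"),  # high quality (Q40 at every base)
        ("r2", "ACGTAC", "!!!!!!"),  # very low quality (Q0 at every base)
        ("r3", "ACG", "III"),        # shorter than the trim length
    ]
    for label, seq, qual in filter_and_trim(reads, trunc_len=4, max_ee=1.0):
        print(label, seq)
```

Trimming to a fixed length means reads of the same template compare as equal downstream, which is why the sketch truncates before it applies the quality check.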
Dereplication |
Dereplication reports one copy of every
unique sequence in the input data. This is a special case of clustering at 100%
identity, which can be implemented more efficiently using specialized
algorithms. USEARCH supports both full-length and prefix dereplication, which
are implemented in the derep_fulllength and
derep_prefix commands, respectively. |
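The difference between the two kinds of dereplication can be shown with a short Python sketch. It is an illustration only and makes no claim about how USEARCH implements these commands: the function names merely echo the command names, the prefix version is quadratic in the number of unique sequences, and a sequence that is a prefix of several longer sequences is assigned to the first match, which is an arbitrary choice made for this sketch.

```python
from collections import Counter


def derep_fulllength(seqs):
    """Return (unique sequence, count) pairs, most abundant first.

    Two sequences are merged only if they are identical over their
    full length.
    """
    counts = Counter(seqs)
    # Sort by decreasing abundance, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def derep_prefix(seqs):
    """Merge each sequence into a longer or equal sequence it is a prefix of."""
    uniques = []  # [sequence, count] entries, longest sequences first
    for seq in sorted(seqs, key=len, reverse=True):
        for entry in uniques:
            if entry[0].startswith(seq):
                entry[1] += 1
                break
        else:
            uniques.append([seq, 1])
    return [(s, n) for s, n in uniques]


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ACG", "TTGA", "AC"]
    print(derep_fulllength(reads))
    print(derep_prefix(reads))
```

With the example reads, full-length dereplication keeps `ACG` and `AC` as separate unique sequences, while prefix dereplication folds both into `ACGT`.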