Fast, accurate chimera detection 

 
About UCHIME
UCHIME is an algorithm for detecting chimeric sequences. It was developed in collaboration with Brian Haas, Jose Carlos Clemente, Chris Quince and Rob Knight. Chimeras are commonly created during DNA sample amplification by PCR, especially in community sequencing experiments using single regions such as the 16S rRNA gene in bacteria or the fungal ITS region.
UCHIME can detect chimeras in two ways: by using a reference database, or de novo. The de novo mode uses abundance information, on the assumption that chimeras are less abundant than their parents because they must have undergone fewer rounds of amplification.
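The abundance assumption behind the de novo mode can be illustrated with a small sketch. This is not the UCHIME implementation. It only shows the idea that a sequence is considered as a possible parent of a query when it is clearly more abundant than the query. The function name candidate_parents, the min_skew parameter, its default factor of 2, and the example read counts are all hypothetical and chosen for illustration.

```python
from typing import Dict, List


def candidate_parents(
    abundances: Dict[str, int],
    query: str,
    min_skew: float = 2.0,  # assumed illustrative factor, not taken from this page
) -> List[str]:
    """Return ids of sequences abundant enough to be possible parents of `query`.

    A sequence qualifies if its abundance is at least `min_skew` times the
    abundance of the query. Results are ordered from most to least abundant.
    """
    query_abundance = abundances[query]
    eligible = [
        seq_id
        for seq_id, count in abundances.items()
        if seq_id != query and count >= min_skew * query_abundance
    ]
    return sorted(eligible, key=lambda seq_id: -abundances[seq_id])


# Hypothetical read counts for four unique sequences.
abundances = {"seqA": 500, "seqB": 320, "seqC": 12, "seqD": 9}

print(candidate_parents(abundances, "seqC"))  # ['seqA', 'seqB']
print(candidate_parents(abundances, "seqA"))  # []
```

In this toy example the rare sequence seqC has two candidate parents, while the most abundant sequence seqA has none and so could not be called chimeric under this assumption. A real detector would then have to test whether the query is actually better explained as a combination of two candidate parents than by any single sequence.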

OTU clustering
UCHIME is most often used in the context of OTU clustering for community sequencing experiments based on rRNA genes such as 16S, 18S and ITS. I recommend the otupipe script for OTU clustering. This script uses several algorithms, including UCLUST and UCHIME, to generate OTUs from next-generation reads.

Sensitivity and error rates
In our tests, UCHIME is more sensitive than ChimeraSlayer, the best previous method using a reference database, especially with short, noisy sequences and when database sequences are diverged from a chimera's true parent sequences. The de novo mode of UCHIME has comparable sensitivity to Perseus. UCHIME has a lower average error rate than ChimeraSlayer. The error rate is harder to measure for de novo mode, but appears to be comparable to that of Perseus.
 
Speed
There are two implementations of UCHIME. One is open-source (strictly, public domain). This version is >1000x faster than ChimeraSlayer and >100x faster than Perseus. A faster version of the same algorithm is implemented in the USEARCH package v4.1 and later. The USEARCH implementation is at least an order of magnitude faster than the open-source version, and can be even faster with large datasets.
 
Paper
Edgar, R.C., Haas, B.J., Clemente, J.C., Quince, C., Knight, R. (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, doi: 10.1093/bioinformatics/btr381 [PMID 21700674].

Downloads and documentation
Test data, precompiled binaries and source code for the public domain version can be downloaded here.
 
The USEARCH package, which includes the faster implementation of UCHIME, is available at no charge for academic use. For more information about licensing, please visit the USEARCH home page.
 

 