See also
UPARSE home page
OTU benchmark results
OTU benchmark methods
uparse_ref command
Category | Description |
Perfect | Identical to a biological sequence. |
Good | >= 99% identical to a biological sequence. |
Noisy | >= 97% identical to a biological sequence. |
Chimeric | None of the above, and <97% identical to a biological sequence. Some Good and Noisy sequences may also be chimeras, but these are less likely to disrupt downstream analysis and are not classified as Chimeric. |
Contaminant | MEGABLAST search of the NCBI nt database [16] reports a hit with >=95% identity covering >=95% of the OTU sequence. |
Other | None of the above. Could be a novel biological sequence, or more likely is a sequence with >3% errors. |
OTU sequence accuracy was assessed by making pair-wise global alignments with all reference sequences and selecting the alignment with highest identity; call this identity V. Ideally, all sequences would have V=100%, indicating that they are identical to a reference sequence.
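The definition of V can be illustrated with a minimal sketch: align the OTU against every reference and keep the highest identity. This is not the aligner used for the benchmark. The function names, the Needleman-Wunsch scoring parameters (match +1, mismatch -1, gap -1), and the choice to define identity as matches divided by alignment columns are all assumptions made for illustration.

```python
from typing import Iterable


def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity of one optimal global alignment of a and b.

    Hypothetical helper: scoring values and the identity definition
    (matches / alignment columns) are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting matches and columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


def best_identity(otu: str, references: Iterable[str]) -> float:
    """V: the highest identity between the OTU and any reference."""
    return max((global_identity(otu, ref) for ref in references), default=0.0)
```

For example, `best_identity("ACGT", ["ACGA", "ACGT"])` returns 100.0, because the second reference is identical to the OTU. The quadratic dynamic programming shown here is only practical for short sequences and small reference sets.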
Each OTU sequence is assigned to exactly one of the
following six categories: Perfect (V=100%),
Good (100%>V>=99%), Noisy (99%>V>=97%), Chimeric, Contaminant or
Other. An OTU is classified as Chimeric if V<97% and the sequence is
chimeric according to uchime_ref or
uparse_ref by comparison with the Haas et al.
reference database. Some Good or Noisy sequences may also be chimeras, but
these are less likely to degrade analysis. In the case of
uparse_ref, the model must be <=1%
different from the OTU for the OTU to be classified as chimeric, because parsimony is
less reliable when there are many inferred point mutations, which may indicate a
missing reference sequence or make crossover points harder to detect. An OTU is
classified as a Contaminant if a MEGABLAST search of the NCBI nt database [16]
reports a hit with >=95% identity covering >=95% of the OTU sequence. The Other
category means that the OTU could not be assigned to any of the
previous categories; it is either an artifact with more than 3% errors or a novel
biological sequence.
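The decision rules above can be sketched as a single function. This is an illustrative reading of the text, not code from the UPARSE pipeline. The `OtuEvidence` record and its field names are hypothetical, and checking Chimeric before Contaminant is an assumption based on the order in which the categories are listed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OtuEvidence:
    """Hypothetical per-OTU inputs; the field names are illustrative."""
    v: float                                   # best identity to a reference, in percent
    uchime_chimera: bool = False               # flagged chimeric by uchime_ref
    uparse_chimera: bool = False               # flagged chimeric by uparse_ref
    uparse_model_diff: float = 0.0             # percent difference between model and OTU
    blast_identity: Optional[float] = None     # best MEGABLAST hit identity, percent
    blast_coverage: Optional[float] = None     # OTU coverage of that hit, percent


def classify(e: OtuEvidence) -> str:
    """Assign exactly one of the six categories."""
    if e.v >= 100.0:
        return "Perfect"
    if e.v >= 99.0:
        return "Good"
    if e.v >= 97.0:
        return "Noisy"

    # From here on V < 97%.
    # A uparse_ref call only counts when the model is <=1% different from the OTU.
    uparse_ok = e.uparse_chimera and e.uparse_model_diff <= 1.0
    if e.uchime_chimera or uparse_ok:
        return "Chimeric"

    # Assumed order: Contaminant is tested after Chimeric.
    has_hit = e.blast_identity is not None and e.blast_coverage is not None
    if has_hit and e.blast_identity >= 95.0 and e.blast_coverage >= 95.0:
        return "Contaminant"

    return "Other"
```

With these rules, an OTU with V=96% that uparse_ref flags using a model 2% different from the OTU is not counted as Chimeric; it becomes Contaminant if it has a qualifying MEGABLAST hit and Other if it does not.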