USEARCH v11

otutab_core command

See also
  Dominant OTUs
  Cross-talk

Identifies a possible "core microbiome", i.e. OTUs that are present in more samples than others.

Input is an OTU table in QIIME classic format.

The presence of an OTU in some or many samples can be spurious because of cross-talk or because the OTU itself is spurious. To enable manual review, the otutab_core command generates a report indicating cases where the presence of an OTU may be spurious due to cross-talk, and where an OTU may be spurious due to sequence errors.

If a sintax tabbed file is provided using the -sintaxin option, then the taxonomy of the core OTUs is included in the report.

If a distance matrix is provided using the -distmxin option, it is used to identify possible dominant OTUs, i.e. high-abundance OTUs which are similar to a low-abundance OTU in the report. If a low-abundance OTU has a dominant OTU, this may indicate that the low-abundance OTU is spurious.

The -tabbedout option specifies the output file. OTUs are sorted in order of decreasing number of samples where they are present. Fields are:

#1. OTU = name of the OTU.
#2. Samples = number of samples where the OTU has a non-zero count.
#3. Size = total number of reads assigned to this OTU.
#4. DomOTU = high-abundance "dominant" OTU which is very similar to this OTU, if any.
#5. DomSize = total number of reads assigned to the dominant OTU.
#6. DomId = sequence identity between the dominant OTU and this OTU.
#7. Min = minimum count for this OTU.
#8. LoQ = lower quartile count for this OTU.
#9. Med = median count for this OTU.
#10. HiQ = upper quartile count for this OTU.
#11. Max = maximum count for this OTU.
#12. Taxonomy = condensed taxonomy prediction.

If the minimum or LoQ count is much smaller than the maximum count, this suggests that the smaller counts may be due to cross-talk.
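
As a rough aid to manual review, the comparison of small counts against the maximum count can be sketched with awk. This is only an illustration, not part of USEARCH: the function name flag_crosstalk and the default ratio of 100 are hypothetical, and it assumes the -tabbedout file is tab-separated with the twelve fields in the order listed above (Min = field 7, LoQ = field 8, Max = field 11). Because Min is never larger than LoQ, testing Min alone covers both cases. Lines where these fields are not integers, such as a possible header line, are skipped.

```shell
# Hypothetical helper, not a USEARCH command.
# Usage: flag_crosstalk core.txt [ratio]
# Prints OTU, Min, LoQ and Max for OTUs whose Max is at least
# ratio times Min (default ratio 100 is an arbitrary choice).
flag_crosstalk() {
  awk -F'\t' -v ratio="${2:-100}" '
    # Skip header or malformed lines: Min and Max must be integers.
    $7 !~ /^[0-9]+$/ || $11 !~ /^[0-9]+$/ { next }
    # Min is much smaller than Max: small counts may be cross-talk.
    $11 >= ratio * $7 { print $1 "\t" $7 "\t" $8 "\t" $11 }
  ' "$1"
}
```

For example, `flag_crosstalk core.txt 50` would list OTUs whose maximum count is at least 50 times their minimum count. The flagged OTUs are candidates for review only; a suitable ratio depends on the data set.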

If the size of an OTU is much smaller than a neighboring "dominant" OTU, then the OTU itself may be spurious due to sequence error.
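
The size comparison with a dominant OTU can be sketched in the same way. Again this is an illustration under assumptions: flag_spurious and the default ratio of 10 are hypothetical, the file is assumed to be tab-separated with Size in field 3 and DomOTU, DomSize and DomId in fields 4 to 6, and OTUs without a dominant OTU are assumed to have a non-numeric DomSize field, so they are skipped.

```shell
# Hypothetical helper, not a USEARCH command.
# Usage: flag_spurious core.txt [ratio]
# Prints OTU, Size, DomOTU, DomSize and DomId for OTUs whose dominant
# OTU is at least ratio times larger (default ratio 10 is arbitrary).
flag_spurious() {
  awk -F'\t' -v ratio="${2:-10}" '
    # Skip header lines and OTUs with no numeric DomSize.
    $3 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ { next }
    # Dominant OTU is much larger: this OTU may be due to sequence error.
    $5 >= ratio * $3 { print $1 "\t" $3 "\t" $4 "\t" $5 "\t" $6 }
  ' "$1"
}
```

Running `flag_spurious core.txt` lists the low-abundance OTUs together with their dominant neighbors, so that the DomId column can be checked by eye before deciding whether to discard an OTU.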

Example

usearch -calc_distmx otus.fa -tabbedout distmx.txt \
  -sparsemx_minid 0.9 -termid 0.8

usearch -sintax otus.fa -strand both -db ref16s.txt \
  -tabbedout sintax.txt

usearch -otutab_core otutab.txt -distmxin distmx.txt \
  -sintaxin sintax.txt -tabbedout core.txt