USEARCH manual

Misop tutorial
MiSeq 2x250 PE reads, mouse feces and mock community samples

Part 1: UPARSE pipeline (reads to OTU table)
Part 2: Mock community analysis (stringent)
Part 3: Mock community analysis (less stringent)
Part 4: Read quality and error rate analysis

Part 3. Mock community analysis with singletons and lower-quality reads included
The misop/out/mock1 directory contains output files generated with singletons included and with a maximum expected error threshold of 2.0 instead of the recommended 1.0. These looser settings ensure that the three low-quality P.acnes reads we found in Part 2 are retained and form an OTU. The results were generated by the run_mock1.bash script.
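To see why raising the threshold from 1.0 to 2.0 lets extra reads through, recall that a read's expected number of errors is the sum of its per-base error probabilities, P = 10^(-Q/10). The sketch below computes this for a hypothetical read, assuming Phred+33 quality encoding; the function names are illustrative and this is not USEARCH's own code.

```python
def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of errors in a read: the sum of per-base error
    probabilities P = 10^(-Q/10), assuming Phred+33 quality encoding."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def passes_maxee(qual: str, maxee: float) -> bool:
    """Keep the read only if its expected error count does not exceed maxee."""
    return expected_errors(qual) <= maxee


# Hypothetical 250-base read: 238 bases at Q30 ('?') and 12 bases at Q10 ('+').
qual = "?" * 238 + "+" * 12

print(round(expected_errors(qual), 3))  # 1.438
print(passes_maxee(qual, 1.0))          # False: discarded at the recommended threshold
print(passes_maxee(qual, 2.0))          # True: retained at the relaxed threshold
```

A handful of low-quality bases is enough to push a read over 1.0, which is how the P.acnes reads were lost with the stringent setting.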

Singletons are included by changing the -minsize option of cluster_otus from 2 to 1. The expected error threshold is raised by setting the -fastq_maxee option of fastq_filter to 2.0.
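Discarding singletons amounts to a minimum-abundance cut after dereplication: a threshold of 2 drops sequences seen only once, while a threshold of 1 keeps them. The sketch below shows the idea on toy reads. The labels follow the ";size=N;" annotation style used by USEARCH, but the "Uniq" names and the function names are hypothetical, and this is not how USEARCH implements the option.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences, counting how many reads share each one."""
    return Counter(reads)


def filter_by_minsize(uniques, minsize):
    """Keep unique sequences with abundance >= minsize, most abundant first."""
    kept = [(seq, size) for seq, size in uniques.items() if size >= minsize]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def size_labels(kept):
    """Label each unique in the ';size=N;' style; 'Uniq' names are hypothetical."""
    return [f"Uniq{i};size={size};" for i, (_, size) in enumerate(kept, start=1)]


reads = ["ACGT", "ACGT", "ACGT", "ACGA", "ACGA", "TTGT"]  # toy reads
uniques = dereplicate(reads)

print(size_labels(filter_by_minsize(uniques, 2)))  # singleton TTGT dropped
print(size_labels(filter_by_minsize(uniques, 1)))  # singleton TTGT kept
```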

Compare the otus.uparseref file in out/mock1 with the one in out/mock (discussed in Part 2 of this tutorial). We still see 19 perfect OTUs, and we now also have a good OTU for P.acnes with 99.6% identity to its reference sequence, as intended.
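To get a feel for what 99.6% identity means, the sketch below computes identity as the percentage of matching positions between two equal-length, ungapped sequences. The 253-base length is a hypothetical choice, and real alignments used by uparse_ref may contain gaps, which this simplified version does not handle.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles ungapped, equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


ref = "A" * 253                 # hypothetical 253-base reference
one_diff = "A" * 252 + "C"      # one incorrect base
two_diffs = "A" * 251 + "CC"    # two incorrect bases

print(round(percent_identity(one_diff, ref), 1))   # 99.6
print(round(percent_identity(two_diffs, ref), 1))  # 99.2
```

At this length a single incorrect base already gives 99.6%, so a "good" OTU can differ from its reference by just one position.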

However, we also get 19 additional OTUs that uparse_ref classifies as "other" because they are not close enough to any of the reference sequences. For example:

Otu20 other 94.1 94.1 P.aeruginosa.1
Otu22 other 80.6 82.5 P.gingivalis.1
Otu23 other 95.7 95.7 B.cereus.1
Otu25 other 80.6 80.6 P.gingivalis.1
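With many OTUs it helps to summarize the classifications rather than read the file line by line. The sketch below parses the example lines above and tallies them by class and by nearest reference. Interpreting the columns as OTU label, class, two identity percentages and reference name is an assumption made here, as are the function and variable names.

```python
from collections import Counter

# The four example lines shown above. Treating the columns as
# OTU label, class, two identity percentages, and nearest reference is an
# assumption made for this sketch; the real file may be tab-separated.
example = """\
Otu20 other 94.1 94.1 P.aeruginosa.1
Otu22 other 80.6 82.5 P.gingivalis.1
Otu23 other 95.7 95.7 B.cereus.1
Otu25 other 80.6 80.6 P.gingivalis.1"""


def parse_line(line):
    """Split one whitespace-separated line into typed fields."""
    otu, cls, pct1, pct2, ref = line.split()
    return otu, cls, float(pct1), float(pct2), ref


records = [parse_line(line) for line in example.splitlines()]
class_counts = Counter(cls for _, cls, _, _, _ in records)
other_by_ref = Counter(ref for _, cls, _, _, ref in records if cls == "other")

print(class_counts)   # Counter({'other': 4})
print(other_by_ref)   # "other" OTUs grouped by nearest reference
```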

What are these OTUs, and where do they come from? Most of them appear to be genuine biological sequences with at most one or two incorrect bases: 16 of the 19 are >99% identical to a Greengenes sequence. As with the Pseudomonas OTU we found in Part 2, these appear to be contaminants, possibly caused by sample cross-talk. So getting 39 OTUs from a mock community with 21 species does not mean the algorithms have a problem: they cannot correct for cross-talk errors.

However, in other cases I have found that it is better to cluster with the more stringent parameters because otherwise the number of spurious OTUs due to read errors and undetected chimeras increases. I think it is always a good idea to sequence a mock community and thoroughly understand the results before making a final decision on what parameters to use on your "real" samples. Another useful exercise is to check the error rates after quality filtering. This is discussed in Part 4.
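Because the true sequences in a mock community are known, an error rate can be estimated by counting mismatched bases. The toy sketch below compares the rate before and after an expected-error filter. It assumes equal-length reads that align without gaps and uses made-up reads and hypothetical function names; it is only meant to illustrate the idea, not the procedure used in Part 4.

```python
def expected_errors(qual, offset=33):
    """Sum of per-base error probabilities, assuming Phred+33 qualities."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def error_rate(reads, maxee=None):
    """Fraction of mismatched bases across reads with known true sequences.

    Each read is (sequence, quality, true_sequence), all the same length and
    assumed to align without gaps. If maxee is given, reads whose expected
    error count exceeds it are skipped first, mimicking quality filtering.
    """
    mismatches = bases = 0
    for seq, qual, truth in reads:
        if maxee is not None and expected_errors(qual) > maxee:
            continue
        mismatches += sum(a != b for a, b in zip(seq, truth))
        bases += len(seq)
    return mismatches / bases if bases else 0.0


truth = "ACGTACGTAC"
reads = [
    ("ACGTACGTAC", "I" * 10, truth),  # perfect read, Q40 throughout
    ("ACGTTCGTAA", "#" * 10, truth),  # two errors, Q2 throughout
]

print(error_rate(reads))             # 0.1 before filtering
print(error_rate(reads, maxee=1.0))  # 0.0 after filtering
```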
