Comments
Westcott and Schloss fail to consider several popular methods, including UPARSE and denoisers. It seems clear to me that denoising (error-correction) should be the preferred pre-processing step for OTU clustering because otherwise the input data for the clustering algorithm will have noise due to PCR and sequencing in addition to biological variation. However, there is no mention of denoising in the paper.
I attempted to reproduce the authors' read pre-processing protocol, and found that it failed to filter many bad reads and chimeras on mock community data. Clustering to define OTUs is a very different problem from clustering to account for noise, and in my opinion these should be considered separately. The authors do not explain why it is informative to use noisy data to test clustering algorithms which assume or require error-free input.
With these considerations in mind, I found that OptiClust generated >5,000 OTUs on reads of a mock community with 22 strains, after making my best effort to pre-process the reads according to the procedures described in the paper.
The authors implicitly propose rules for defining OTUs (which should have been made explicit and discussed), but their rules are impossible to satisfy on real data, which calls their conceptual approach into question. By contrast, the UPARSE clustering rules, which also construct 97% OTUs, can always be satisfied on real data.
The OptiClust algorithm constructs OTUs by seeking to maximize the Matthews Correlation Coefficient (MCC). Their benchmark tests also use MCC as the accuracy metric, and are therefore strongly biased towards OptiClust.
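To make the metric concrete, the following is a minimal sketch of how a pairwise MCC can be computed for an OTU assignment. It assumes the usual pair-counting convention (a pair is a true positive if the two sequences are within the distance threshold and share an OTU, and so on). The function name `pairwise_mcc`, the toy sequence labels and the choice of which pairs count as "close" are illustrative assumptions, not taken from mothur or from the paper.

```python
from itertools import combinations
from math import sqrt


def pairwise_mcc(otu_of, close_pairs):
    """Pairwise MCC of an OTU assignment (illustrative sketch only).

    otu_of      -- dict mapping sequence label -> OTU label
    close_pairs -- set of frozenset({a, b}) for pairs within the threshold
    """
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(otu_of), 2):
        same_otu = otu_of[a] == otu_of[b]
        close = frozenset((a, b)) in close_pairs
        if same_otu and close:
            tp += 1          # close pair, clustered together
        elif same_otu:
            fp += 1          # distant pair, clustered together
        elif close:
            fn += 1          # close pair, split across OTUs
        else:
            tn += 1          # distant pair, kept apart
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the denominator vanishes (a common convention).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: A-B and B-C are within threshold, A-C is not.
toy_close = {frozenset(("A", "B")), frozenset(("B", "C"))}
lumped = {"A": "otu1", "B": "otu1", "C": "otu1", "D": "otu2"}
split = {"A": "otu1", "B": "otu1", "C": "otu2", "D": "otu3"}

print(round(pairwise_mcc(lumped, toy_close), 3))  # 0.707
print(round(pairwise_mcc(split, toy_close), 3))   # 0.632
```

In this toy case no assignment reaches an MCC of 1.0, because putting A with B and B with C forces the distant pair A-C into the same OTU. An algorithm that maximizes this quantity will also, by construction, score well when the same quantity is used as the benchmark.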
MCC is not universally accepted -- on the contrary, I have not been able to find any papers from outside the Schloss lab which use MCC to define or assess OTUs. From my own perspective, I do not agree that MCC is a good definition and I believe that the UPARSE clustering criteria are better. Also, MCC fails in some common cases. Therefore, in my opinion MCC is not justified as a gold standard.
It is not clear to me from the paper whether W&S consider unique sequence abundance in calculating MCC in OptiClust or for benchmarking, but both choices have problems.
Different programs report different identities for a given pair of sequences, so using one program's measurements of identity (here, mothur) as a gold standard for benchmarking other programs would cause bias in favor of mothur even if the programs were all designed to maximize MCC, which is not the case. Pair-wise sequence identities from mothur are especially dubious because it uses the NAST algorithm, which intentionally introduces alignment errors to preserve the number of columns in a multiple alignment.
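As a small illustration of how the reported identity depends on the definition used, the sketch below computes three common variants from a single pairwise alignment. The function name `identity_variants`, the three denominators and the example alignment are assumptions chosen for illustration; they are not claimed to match the definition used by mothur or by any other specific program.

```python
def identity_variants(row_a, row_b, gap="-"):
    """Fractional identity of one pairwise alignment under three definitions.

    row_a, row_b -- aligned sequences of equal length, using `gap` for gaps.
    Illustrative sketch; real programs also differ in the alignment itself.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    cols = list(zip(row_a, row_b))
    # Columns where both rows have the same letter (gaps never match).
    matches = sum(1 for x, y in cols if x == y and x != gap)
    # All columns, ignoring any column that is a gap in both rows.
    all_columns = sum(1 for x, y in cols if not (x == gap and y == gap))
    # Only columns where neither row has a gap.
    ungapped_columns = sum(1 for x, y in cols if x != gap and y != gap)
    # Number of letters in the shorter of the two sequences.
    shorter = min(sum(1 for x in row_a if x != gap),
                  sum(1 for y in row_b if y != gap))
    return {
        "all_columns": matches / all_columns,
        "ungapped_columns": matches / ungapped_columns,
        "shorter_sequence": matches / shorter,
    }


# Hypothetical alignment with one mismatch and two gap columns.
result = identity_variants("ACGT-ACGTA", "ACCTTACG-A")
print(result)
```

For this alignment the three definitions give 0.70, 0.875 and about 0.78, so the same pair can fall on either side of a fixed threshold depending on which definition, and which alignment, a program uses.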
W&S say: "Several metrics have emerged for assessing the quality of OTU assignment algorithms... Unfortunately, these methods fail to directly quantify the quality of the OTU assignments." I agree with this critique of the papers they cite. However, they fail to mention other approaches (e.g. Ye 2010; Edgar 2013; Callahan et al. 2016; Edgar 2016) which directly quantify OTU quality on mock communities by measuring the number of errors in a representative sequence for each OTU and identifying OTUs which contain sequences which are due to chimeras, contaminants and cross-talk.
Reference
Westcott SL and Schloss PD. 2017. OptiClust, an improved method for assigning amplicon-based sequence data to operational taxonomic units. mSphere 2:e00073-17. https://doi.org/10.1128/mSphereDirect.00073-17.