Does MCC consider unique sequence abundance?

See also
Matthews Correlation Coefficient (MCC)
Commentary on Westcott & Schloss 2017

It is not clear to me from the paper whether Westcott and Schloss consider the abundance of a unique sequence in the OptiClust algorithm or when calculating MCC for benchmarking.

This makes a big difference in practice, because MCC considers pairs of sequences, and the number of pairs increases quadratically with abundance. The numerical values with and without abundance may therefore be very different on typical data, which could cause the clustering solution(s) that maximize MCC to be quite different. Both choices have drawbacks.
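
To give a sense of the scale of the difference, here is a minimal sketch that counts pairs both ways. The abundances are made-up numbers chosen for illustration, and num_pairs is a hypothetical helper, not something taken from the paper or from any OptiClust implementation.

```python
def num_pairs(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Hypothetical abundances (read counts) of three unique sequences.
abundances = [1000, 10, 1]

# Abundance discarded: every unique sequence counts once.
pairs_without_abundance = num_pairs(len(abundances))

# Abundance retained: every read counts, so pairs are pairs of reads.
pairs_with_abundance = num_pairs(sum(abundances))

print(pairs_without_abundance)  # 3
print(pairs_with_abundance)     # 510555
```

With these numbers, three unique sequences give 3 pairs, while the same data counted as 1011 reads give 510555 pairs, almost all of them involving the most abundant sequence.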

Abundance is retained
If abundance is retained, and a sequence has >97% identity with two or more OTUs, then MCC will favor assignment to the OTU with highest abundance because this minimizes the number of false negatives (FNs). This is a bad strategy because identity is a better indication than abundance that two sequences belong to the same species.
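
The effect can be sketched with a toy case. This is an illustration under simplifying assumptions, not the authors' method: each OTU is treated as a single unique sequence, the query is assumed to be >97% identical to both, and the names query_pair_counts, "A" and "B" and all abundances are hypothetical. Only pairs involving the query are counted.

```python
def query_pair_counts(query_abundance, otu_abundances, chosen, use_abundance=True):
    """Return (TP, FN) over pairs that involve the query sequence.

    Assumes every OTU in otu_abundances is within the identity threshold
    of the query, so a pair is a TP if the query is placed in that OTU
    and an FN otherwise.
    """
    tp = 0
    fn = 0
    for name, abundance in otu_abundances.items():
        # With abundance, each read of the query pairs with each read of the OTU.
        weight = query_abundance * abundance if use_abundance else 1
        if name == chosen:
            tp += weight
        else:
            fn += weight
    return tp, fn


# Hypothetical OTUs: A is abundant, B is rare (but might be the closer match).
otus = {"A": 1000, "B": 10}

print(query_pair_counts(1, otus, "A"))                       # (1000, 10)
print(query_pair_counts(1, otus, "B"))                       # (10, 1000)
print(query_pair_counts(1, otus, "B", use_abundance=False))  # (1, 1)
```

With abundance retained, placing the query in A gives 10 FNs while placing it in B gives 1000, regardless of which OTU is the closer match by identity. With abundance discarded, the two choices tie.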

Abundance is discarded
If abundance is not considered, then all unique sequences are weighted equally by MCC. This is a bad strategy because a large majority of unique read sequences have errors, even after quality filtering, so the score is dominated by erroneous sequences rather than correct biological ones.