UNBIAS algorithm

**See also** unbias command

UNBIAS paper

Abundance bias

The UNBIAS algorithm attempts to adjust an OTU table to correct for the two sources of abundance bias I believe to be most important in practice: 16S copy number and primer mismatches. This requires predicting the copy number and the number of primer mismatches for each OTU sequence, then adjusting the read counts accordingly.

Prediction of copy number and primer mismatches is done by the SINAPS algorithm, which is based on essentially the same algorithm as SINTAX. The top hit in a reference database is identified using k-mer similarity, and confidence is estimated by bootstrapping. In each bootstrap iteration, a subset of k-mers is selected and used to find the top hit, and the trait of interest (here, copy number or primer mismatches) is taken from the annotation of that reference sequence. The trait with the highest bootstrap frequency is reported as the prediction, and the frequency with which it occurred is reported as the bootstrap confidence. UNBIAS requires a prediction for every OTU, so the bootstrap confidence is ignored. In this case, SINAPS is effectively equivalent to finding the top database hit using the USEARCH algorithm.
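The bootstrap procedure described above can be illustrated with a minimal Python sketch. This is not the SINAPS implementation: the function names (`kmers`, `top_hit`, `predict_trait`), the default values of `k`, `subset_size` and `iterations`, the tie-breaking rule and the use of shared k-mer counts as the similarity measure are all assumptions made for illustration.

```python
import random
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def top_hit(query_kmers, ref_kmer_sets):
    """Index of the reference sharing the most k-mers with the query.

    Assumption: ties are broken in favor of the earliest reference.
    """
    return max(range(len(ref_kmer_sets)),
               key=lambda i: len(query_kmers & ref_kmer_sets[i]))


def predict_trait(query, refs, k=8, subset_size=32, iterations=100, seed=0):
    """Predict a trait for `query` by bootstrapped k-mer top hits.

    refs is a list of (sequence, trait) pairs, where trait is an
    annotation such as copy number or number of primer mismatches.
    Returns (predicted trait, bootstrap confidence).
    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    ref_kmer_sets = [kmers(seq, k) for seq, _ in refs]
    query_kmers = sorted(kmers(query, k))  # list, so it can be sampled
    n = min(subset_size, len(query_kmers))
    votes = Counter()
    for _ in range(iterations):
        # Each iteration uses a random subset of the query k-mers.
        subset = set(rng.sample(query_kmers, n))
        hit = top_hit(subset, ref_kmer_sets)
        votes[refs[hit][1]] += 1  # trait taken from the top hit's annotation
    trait, count = votes.most_common(1)[0]
    return trait, count / iterations
```

A caller that needs a prediction for every OTU, as UNBIAS does, would keep the returned trait and discard the confidence, for example `copy_number, _ = predict_trait(otu_seq, refs)`, where `otu_seq` and `refs` are hypothetical inputs.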

If the predicted 16S copy number is

If the predicted number of primer mismatches is