Robert C. Edgar on twitter

11-Aug-2018 New paper describes octave plots for visualizing alpha diversity.

12-Jun-2018 New paper shows that one in five taxonomy annotations in SILVA and Greengenes are wrong.

18-Apr-2018 New paper shows that taxonomy prediction accuracy is <50% for V4 sequences.

05-Oct-2017 PeerJ paper shows low accuracy of closed- and open-ref. QIIME OTUs.

22-Sep-2017 New paper shows 97% threshold is wrong, OTUs should be 99% full-length 16S, 100% for V4.

UPARSE tutorial video posted on YouTube. Make OTUs from MiSeq reads.



Quality control for OTU sequences

See also
OTU / denoising analysis
  Defining and interpreting OTUs
  Control samples

Amplicon reads often contain artifacts that are not filtered by my recommended pipeline. These artifacts vary widely between datasets, so it would be difficult to account for all of them in a single set of commands. It is generally easier to identify them by manually analyzing the OTU sequences rather than the reads, because the set of OTUs is much smaller. Of course, if you are going to run a pipeline repeatedly on reads obtained from similar libraries, it makes sense to modify the pipeline to filter the types of artifact you find.

Here, I describe the quality control checks that I use in my own work, with links to discussion and commands. If you encounter other artifacts in your data, please let me know and I will update this page.

See Control samples for a discussion of how to use controls to better understand your data.

Issue               Description
Alignments          Do the OTU sequences align well to a reference database for your gene?
Missing OTUs        Do all OTUs appear in the OTU table?
Coverage            How much of the data is explained by the OTUs?
Short constructs    Bad sequencing constructs created by PCR
Strand duplicates   Sequences of both plus and minus strands
Offsets             Sequences start at different positions in the gene
Cross-talk          Reads assigned to the wrong sample
Sequence error      Polymerase errors and bad base calls
Low complexity      Sequencer noise
PhiX                Unfiltered PhiX spike-in reads
Chimeras            Unfiltered PCR chimeras
Mistargeting        Primers amplify a different region
Contaminants        Sequences from organisms not present in the original sample
Primers             Primer-binding sequences should be stripped at the start of the pipeline
Tight OTUs          Pairs of OTUs that are >97% identical to each other
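For the Missing OTUs check in the table, the question is simply whether every label in the OTU FASTA file also has a row in the OTU table. The Python sketch below is not one of my pipeline commands; it assumes the OTU table is tab-separated with a header row starting with '#' (e.g. '#OTU ID') and that FASTA labels may carry a ';size=N' style annotation. The function names are hypothetical.

```python
from typing import Iterable, List


def fasta_labels(lines: Iterable[str]) -> List[str]:
    """Return OTU labels from FASTA header lines.

    Assumes labels look like '>Otu1' or '>Otu1;size=123'; any ';...'
    annotation and anything after the first space is dropped.
    """
    labels = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            labels.append(header.split(";")[0].split(" ")[0])
    return labels


def table_otu_ids(lines: Iterable[str]) -> List[str]:
    """Return the OTU IDs (first column) of a tab-separated OTU table.

    Assumes the header row starts with '#', as in '#OTU ID<TAB>S1<TAB>S2'.
    """
    ids = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        ids.append(line.split("\t")[0].strip())
    return ids


def missing_otus(fasta_ids: Iterable[str], table_ids: Iterable[str]) -> List[str]:
    """OTUs that have a sequence in the FASTA file but no row in the table."""
    return sorted(set(fasta_ids) - set(table_ids))


if __name__ == "__main__":
    fasta = [">Otu1;size=50", "ACGT", ">Otu2", "GGCC", ">Otu3", "TTAA"]
    table = ["#OTU ID\tS1\tS2", "Otu1\t10\t5", "Otu3\t0\t2"]
    print(missing_otus(fasta_labels(fasta), table_otu_ids(table)))  # ['Otu2']
```

An OTU with no row in the table usually means no reads were assigned to it, which is worth investigating rather than silently ignoring.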
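Strand duplicates arise when reads from both the plus and minus strands survive into the OTU set, so one OTU is the reverse complement of another. As a rough illustration only, the sketch below flags pairs whose sequences are exact reverse complements. Real strand duplicates may differ by a few errors or by length, so in practice you would align each OTU against the others on both strands; the helper names here are hypothetical.

```python
from typing import Dict, List, Tuple

# Complement table for A/C/G/T plus N; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_duplicates(otus: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs of OTUs where one sequence is the exact reverse complement of the other."""
    by_seq: Dict[str, List[str]] = {}
    for label, seq in otus.items():
        by_seq.setdefault(seq.upper(), []).append(label)
    pairs = []
    for label, seq in otus.items():
        for other in by_seq.get(revcomp(seq), []):
            if label < other:  # report each pair once; skips self-matches of palindromes
                pairs.append((label, other))
    return sorted(pairs)
```

For example, `strand_duplicates({"Otu1": "AACCGT", "Otu2": "ACGGTT", "Otu3": "GGGG"})` returns `[("Otu1", "Otu2")]`.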
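Offsets show up when some OTUs start a few bases downstream of others in the same gene. One simple way to spot them, sketched below under stated assumptions, is to take the first k bases of each OTU and look for that exact string at a later position in another OTU. The default k=20 is an arbitrary choice, exact matching will miss offsets when the prefix contains errors, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def find_offsets(otus: Dict[str, str], k: int = 20) -> List[Tuple[str, str, int]]:
    """Find OTUs whose first k bases occur at a later position in another OTU.

    Returns (shifted_label, reference_label, offset) triples, meaning that
    shifted_label appears to start `offset` bases downstream of reference_label.
    Small k increases the chance of spurious matches.
    """
    hits = []
    labels = sorted(otus)
    for a in labels:
        prefix = otus[a][:k].upper()
        if len(prefix) < k:
            continue  # sequence shorter than k: nothing reliable to search for
        for b in labels:
            if a == b:
                continue
            pos = otus[b].upper().find(prefix)
            if pos > 0:  # 0 means same start position; -1 means not found
                hits.append((a, b, pos))
    return hits
```

With `full = "ACGTTGCAAGGCTTACCGATGACTGGATCC"` and `shifted = full[5:] + "AAAAA"`, calling `find_offsets({"Otu1": full, "Otu2": shifted}, k=10)` reports that Otu2 starts 5 bases after Otu1.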
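Low-complexity sequences such as homopolymer runs or short tandem repeats are a typical signature of sequencer noise. A minimal sketch, assuming Shannon entropy of overlapping 3-mers is an adequate complexity measure: a random sequence approaches the maximum of log2(64) = 6 bits, while a dinucleotide repeat scores about 1 bit. The 3.0-bit cutoff and the function names are illustrative assumptions, not a recommended setting.

```python
import math
from collections import Counter
from typing import Dict, List


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (bits) of the overlapping k-mer frequencies in seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    n = len(kmers)
    counts = Counter(kmers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity(otus: Dict[str, str], k: int = 3, min_bits: float = 3.0) -> List[str]:
    """Labels of OTUs whose k-mer entropy falls below min_bits."""
    return sorted(label for label, seq in otus.items() if kmer_entropy(seq, k) < min_bits)
```

For example, `low_complexity({"Otu1": "ACGTTGCAAGGCTTACCGATGACT", "Otu2": "ACACACACACACACACACAC"})` flags only Otu2.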
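Primer-binding sequences should have been stripped before OTUs were made, so an OTU that still begins with the forward primer suggests a stripping step was missed. The sketch below compares the start of each OTU to a degenerate primer using IUPAC ambiguity codes. The 515F primer GTGCCAGCMGCCGCGGTAA is used only as an example, the max_diffs=2 tolerance is an assumption, the function names are hypothetical, and the reverse primer at the other end is not checked.

```python
from typing import Dict, List

# IUPAC nucleotide codes -> bases allowed by each code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_diffs(seq: str, primer: str) -> int:
    """Mismatches between the primer and the start of seq (requires len(seq) >= len(primer))."""
    prefix = seq[: len(primer)].upper()
    return sum(1 for s, p in zip(prefix, primer.upper()) if s not in IUPAC[p])


def otus_starting_with_primer(otus: Dict[str, str], primer: str, max_diffs: int = 2) -> List[str]:
    """Labels of OTUs that still appear to begin with the primer-binding sequence."""
    return sorted(
        label
        for label, seq in otus.items()
        if len(seq) >= len(primer) and primer_diffs(seq, primer) <= max_diffs
    )
```

Allowing a couple of mismatches accounts for sequencing errors in the primer region; an OTU flagged here is likely to be a near-duplicate of an unflagged OTU that lacks the primer.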
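Tight OTUs are pairs that are more similar than the clustering threshold should allow. The sketch below checks all pairs with a plain Levenshtein edit distance and defines identity as 1 - distance / (length of the longer sequence). That is a rough stand-in for alignment identity and may differ from the identity reported by an aligner such as USEARCH; the quadratic all-pairs loop is only practical for small OTU sets, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion from a
                           cur[j - 1] + 1,            # insertion into a
                           prev[j - 1] + (ca != cb))) # match or substitution
        prev = cur
    return prev[-1]


def approx_identity(a: str, b: str) -> float:
    """Approximate identity: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def tight_pairs(otus: Dict[str, str], max_id: float = 0.97) -> List[Tuple[str, str, float]]:
    """All OTU pairs whose approximate identity exceeds max_id."""
    pairs = []
    for x, y in combinations(sorted(otus), 2):
        ident = approx_identity(otus[x], otus[y])
        if ident > max_id:
            pairs.append((x, y, ident))
    return pairs
```

For two 100-base OTUs differing by a single substitution, the approximate identity is 0.99, so the pair is reported as tight.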