Publications

R.C. Edgar (2018), Taxonomy annotation and guide tree errors in 16S rRNA databases, PeerJ 6:e5030
• Approx. one in five SILVA and Greengenes taxonomy annotations are wrong
• SILVA and Greengenes trees have pervasive conflicts with type strain taxonomies

R.C. Edgar (2018), Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences, PeerJ 6:e4652
• Cross-validation by identity, novel benchmark strategy enabling realistic accuracy estimates
• Genus accuracy of best methods is 50% on V4 sequences
• Recent algorithms do not improve on RDP Classifier or SINTAX

R.C. Edgar and H. Flyvbjerg (2018), Octave plots for visualizing diversity of microbial OTUs, https://doi.org/10.1101/389833
• Octave plots visualize alpha diversity as a histogram
• Plots show shape and completeness of distribution
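
As a rough illustration of an octave histogram, the sketch below assumes Preston-style power-of-two bins, where octave k holds OTUs whose read count a satisfies 2^k <= a < 2^(k+1). The bin boundaries and the text rendering are assumptions for illustration, not the preprint's exact plotting code; `octave_bins` and `print_octave_plot` are hypothetical names.

```python
from collections import Counter


def octave_bins(abundances):
    """Count OTUs per octave; octave k holds OTUs with 2**k <= abundance < 2**(k+1).

    Power-of-two binning is an assumption made for this sketch.
    """
    counts = Counter()
    for a in abundances:
        if a < 1:
            continue  # OTUs with zero reads contribute nothing
        k = a.bit_length() - 1  # floor(log2(a)) for a positive integer
        counts[k] += 1
    if not counts:
        return []
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


def print_octave_plot(abundances):
    """Print one text bar per octave, labelled with its abundance range."""
    for k, n in enumerate(octave_bins(abundances)):
        lo, hi = 2 ** k, 2 ** (k + 1) - 1
        print(f"{lo:>6}-{hi:<6} {'#' * n}")


print_octave_plot([1, 1, 2, 3, 5, 8, 13, 100])
```

Because the bins are logarithmic, a single plot can show both the many rare OTUs at the left and the few dominant ones at the right.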

R.C. Edgar (2018), UNCROSS2: identification of cross-talk in 16S rRNA OTU tables, https://doi.org/10.1101/400762
• Cross-talk rate is approx. 1% in many Illumina datasets
• Cross-talk can cause false positive core microbiome
• UNCROSS2 algorithm for filtering cross-talk
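
To see why a ~1% cross-talk rate matters, the sketch below flags OTU table entries small enough to be explained by leakage. It is a deliberately naive heuristic: a nonzero count is flagged if it is at most `rate` times the OTU's total count across samples. This is NOT the UNCROSS2 algorithm. The table layout, the `rate` default and the function name `flag_possible_crosstalk` are assumptions for illustration.

```python
def flag_possible_crosstalk(otu_table, rate=0.01):
    """Flag (sample, otu) counts small enough to be explained by cross-talk.

    otu_table: dict mapping OTU name -> dict mapping sample name -> read count.
    rate: assumed cross-talk rate (fraction of an OTU's reads that leak into
    other samples). Naive heuristic, not UNCROSS2: a nonzero count is flagged
    if it is at most rate * the OTU's total count.
    """
    flagged = []
    for otu, counts in otu_table.items():
        total = sum(counts.values())
        for sample, n in counts.items():
            if 0 < n <= rate * total:
                flagged.append((sample, otu))
    return flagged


table = {
    "Otu1": {"S1": 5000, "S2": 4800, "S3": 30},
    "Otu2": {"S1": 0, "S2": 120, "S3": 110},
}
print(flag_possible_crosstalk(table))  # [('S3', 'Otu1')]
```

Without such a filter, the 30 reads of Otu1 in S3 would make Otu1 look present in every sample. That is how cross-talk can produce a false positive core microbiome.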

R.C. Edgar (2017), Accuracy of microbial community diversity estimated by closed- and open-reference OTUs, PeerJ 5:e3889
• QIIME closed- and open-reference clustering generates huge numbers of spurious OTUs
• Closed-reference OTU assignment splits strains and species even when there are no sequence errors
• Closed-reference fails to assign different hyper-variable regions to the same OTU
• Closed-reference discards many well-known species that are present in Greengenes

R.C. Edgar (2017), SEARCH_16S: A new algorithm for identifying 16S ribosomal RNA genes in contigs and chromosomes, https://doi.org/10.1101/124131

R.C. Edgar (2017), SINAPS: Prediction of microbial traits from marker gene sequences, https://doi.org/10.1101/124156

R.C. Edgar (2017), UNBIAS: An attempt to correct abundance bias in 16S sequencing, with limited success, https://doi.org/10.1101/124149
• Read abundance has very low correlation with species abundance
• Bias caused by gene copy count variation and primer mismatches
• Gene copy count and primer mismatches cannot be accurately predicted
• Impossible to correct abundance bias

R.C. Edgar (2017), Updating the 97% identity threshold for 16S ribosomal RNA OTUs, Bioinformatics 34(14) 2371-2375
• Standard 97% OTU identity threshold is too low
• Optimal OTU threshold is 99% for full-length 16S, 100% for V4

R.C. Edgar (2016), UNCROSS: Filtering of high-frequency cross-talk in 16S amplicon reads, https://doi.org/10.1101/088666
• Cross-talk is common; many reads are assigned to the wrong sample
• UNCROSS algorithm for filtering cross-talk

R.C. Edgar (2016), UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing, https://doi.org/10.1101/081257
• UNOISE2 algorithm, improved denoiser
• Reduces false-positive chimeras compared to UNOISE and DADA2
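
A minimal sketch of an abundance-skew denoiser in the UNOISE style follows. It assumes that a unique sequence is treated as an error of a more abundant accepted sequence (a ZOTU) when they differ at d positions and the abundance ratio is at most beta(d) = 1 / 2^(alpha*d + 1). It also assumes alpha = 2 and a minimum abundance of 8, and it uses Hamming distance on equal-length reads instead of alignments. All of these are assumptions recalled from the UNOISE2 description, and the function names are hypothetical. Chimera removal is omitted.

```python
def beta(d, alpha=2.0):
    """Maximum abundance skew for a sequence d differences from a more abundant one."""
    return 1.0 / 2 ** (alpha * d + 1)


def hamming(a, b):
    # Simplification: assumes equal-length reads (e.g. trimmed to a fixed length);
    # a real denoiser counts differences in an alignment.
    return sum(x != y for x, y in zip(a, b))


def denoise(uniques, alpha=2.0, minsize=8):
    """uniques: dict sequence -> abundance. Returns a list of (sequence, abundance) ZOTUs."""
    zotus = []
    # Process in order of decreasing abundance so potential sources come first.
    for seq, size in sorted(uniques.items(), key=lambda kv: -kv[1]):
        if size < minsize:
            continue  # too rare to distinguish from noise
        is_error = any(
            size / csize <= beta(hamming(seq, cseq), alpha) for cseq, csize in zotus
        )
        if not is_error:
            zotus.append((seq, size))
    return zotus


uniques = {"ACGTACGT": 1000, "TTTTACGT": 500, "ACGTACGA": 100, "ACGTACCA": 5}
print(denoise(uniques))  # [('ACGTACGT', 1000), ('TTTTACGT', 500)]
```

In the example, "ACGTACGA" is one difference from a sequence ten times more abundant (skew 0.1 <= beta(1) = 0.125), so it is absorbed as an error. "TTTTACGT" is three differences away at skew 0.5, so it is kept.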

R.C. Edgar (2016), UCHIME2: improved chimera prediction for amplicon sequencing, https://doi.org/10.1101/074252
• UCHIME2 algorithm, improved chimera detection
• "Fake" chimeras, i.e. valid biological sequences that match a two-parent model, are common
• Perfect chimera filtering is impossible even with a complete and correct reference database
• Realistic chimera benchmark
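
The two-parent model can be made concrete with a check for a perfect chimera: does the query equal a prefix of parent A joined to a suffix of parent B? The sketch assumes the three sequences are already aligned to the same length. It is a simplified illustration, not UCHIME2's scoring, and `perfect_chimera_split` is a hypothetical name.

```python
def perfect_chimera_split(query, parent_a, parent_b):
    """Return a breakpoint i with query == parent_a[:i] + parent_b[i:], else None.

    Simplification: all three sequences are assumed to be aligned to the same
    length, so position i means the same thing in each.
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    n = len(query)
    # Length of the longest prefix of query that matches parent A exactly.
    left = 0
    while left < n and query[left] == parent_a[left]:
        left += 1
    # Start of the longest suffix of query that matches parent B exactly.
    right = n
    while right > 0 and query[right - 1] == parent_b[right - 1]:
        right -= 1
    # The model fits if the A-prefix and B-suffix meet or overlap and the
    # query is not simply identical to one of the parents.
    if right <= left and query != parent_a and query != parent_b:
        return right
    return None


print(perfect_chimera_split("AAAACCCC", "AAAAAAAA", "CCCCCCCC"))  # 4
```

A check like this only tests whether the model fits. A genuine biological sequence that happens to match a prefix of one reference and a suffix of another is flagged just the same, which is why "fake" chimeras make perfect filtering impossible.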

R.C. Edgar (2016), SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences, https://doi.org/10.1101/074161
• SINTAX taxonomy prediction algorithm
• Fast and simple method, accuracy comparable to RDP Classifier
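
A k-mer bootstrap classifier in the spirit of SINTAX is sketched below. Each iteration samples query k-mers with replacement, finds the reference sharing the most sampled k-mers, and records that reference's taxonomy. The confidence at each rank is the fraction of iterations whose top hit agrees with the prediction down to that rank. The values k = 8, 32 sampled k-mers and 100 iterations are assumptions. Taking the most frequent full taxonomy as the prediction is a simplification. `sintax_like` is a hypothetical name, not the USEARCH implementation.

```python
import random
from collections import Counter


def kmers(seq, k=8):
    """Set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sintax_like(query, references, k=8, iterations=100, subsample=32, seed=0):
    """Bootstrap k-mer classifier (parameters are assumptions for this sketch).

    references: non-empty list of (sequence, taxonomy), taxonomy being a tuple
    of names such as ("Bacteria", "Firmicutes", "Bacillus").
    Returns a list of (name, confidence) pairs, one per rank.
    """
    rng = random.Random(seed)
    ref_kmers = [(kmers(s, k), tax) for s, tax in references]
    query_kmers = sorted(kmers(query, k))  # sorted for reproducible sampling
    if not query_kmers:
        return []
    hits = []
    for _ in range(iterations):
        # Bootstrap: sample query k-mers with replacement.
        sample = [rng.choice(query_kmers) for _ in range(subsample)]
        # Top hit = reference sharing the most sampled k-mers (ties: first listed).
        _, tax = max(ref_kmers, key=lambda rt: sum(w in rt[0] for w in sample))
        hits.append(tax)
    predicted = Counter(hits).most_common(1)[0][0]
    # Confidence at rank r: fraction of top hits agreeing with the prediction
    # on every rank from the root down to r.
    return [
        (predicted[r], sum(t[:r + 1] == predicted[:r + 1] for t in hits) / iterations)
        for r in range(len(predicted))
    ]


refs = [
    ("ACGTACGTACGTACGTAC", ("Bacteria", "Firmicutes", "Bacillus")),
    ("TTTTTTTTGGGGGGGG", ("Bacteria", "Proteobacteria", "Escherichia")),
]
print(sintax_like("ACGTACGTACGTACGTAC", refs))
```

A caller would typically truncate the prediction at the first rank whose confidence falls below a chosen cutoff, so that low-support genus or species names are not reported.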

R.C. Edgar and H. Flyvbjerg (2015), Error filtering, pair assembly and error correction for next-generation sequencing reads, Bioinformatics 31(21) 3476-3482
• Quality filtering by expected errors
• Bayesian paired read assembler
• Most paired read assemblers calculate incorrect Q scores
• UNOISE algorithm, first denoiser for Illumina reads
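
The first two points can be sketched numerically. The expected number of errors in a read is the sum of its per-base error probabilities, with P = 10^(-Q/10). For an overlapping position in a read pair, the posterior Q below follows from a straightforward Bayesian derivation under stated assumptions. Errors are independent, a wrong call is equally likely to be any of the 3 other bases, and the prior over the true base is uniform. This is a sketch under those assumptions, not a transcription of the paper's code, and the function names are hypothetical.

```python
import math


def error_prob(q):
    """Phred quality score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def phred(p):
    """Error probability -> Phred quality score."""
    return -10 * math.log10(p)


def expected_errors(quals):
    """Expected number of errors in a read = sum of per-base error probabilities."""
    return sum(error_prob(q) for q in quals)


def merged_base(base_x, q_x, base_y, q_y):
    """Posterior base call and Q for one overlapping position of a read pair.

    Assumes independent errors, uniform substitution to the 3 other bases,
    and a uniform prior over the true base.
    """
    px, py = error_prob(q_x), error_prob(q_y)
    if base_x == base_y:
        # Agreement: an error requires both reads to be wrong in the same way.
        p = (px * py / 3) / (1 - px - py + 4 * px * py / 3)
        return base_x, phred(p)
    if px > py:  # keep the higher-quality call
        base_x, px, py = base_y, py, px
    p = px * (1 - py / 3) / (px + py - 4 * px * py / 3)
    return base_x, phred(p)


print(expected_errors([20, 20, 20]))  # about 0.03
print(merged_base("A", 20, "A", 20))  # Q rises above either input (about 44.7)
print(merged_base("A", 30, "C", 20))  # Q drops to about 10.4
```

Agreeing bases get a posterior Q higher than either input, while a mismatch lowers the Q of the retained base. For quality filtering, a read would then be discarded when `expected_errors` exceeds a chosen maximum (1.0 is a common choice).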

R.C. Edgar et al. (2011), UCHIME improves sensitivity and speed of chimera detection, Bioinformatics 27(16) 2194-2200
• Shows UCHIME is faster and more accurate than ChimeraSlayer
• This paper reports misleading benchmark tests; see the critique in the UCHIME2 paper

R.C. Edgar (2013), UPARSE: highly accurate OTU sequences from microbial amplicon reads, Nat. Meth. 10, 996-998
• Describes UPARSE algorithm for 97% OTU clustering
• Stringent error filtering and discarding singletons are necessary
• Highly accurate OTUs from paired reads without full overlap
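
Abundance-sorted greedy clustering of the kind UPARSE performs can be sketched as follows. Singletons are discarded. Each remaining unique sequence, taken in order of decreasing abundance, joins the first centroid at or above the identity threshold, or otherwise founds a new OTU. Simplifications are assumed: identity is computed position by position on equal-length reads rather than from alignments, there is no chimera check, and member abundances are summed directly instead of mapping reads back to OTUs. `greedy_otus` is a hypothetical name. The `threshold` parameter is where the 97% versus 99%/100% choice discussed in the 2017 threshold paper would enter.

```python
def identity(a, b):
    # Simplification: fraction of matching positions for equal-length
    # (e.g. trimmed) sequences; real tools compute identity from an alignment.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(uniques, threshold=0.97, minsize=2):
    """Abundance-sorted greedy clustering (no chimera filtering in this sketch).

    uniques: dict sequence -> abundance. Sequences below minsize (singletons by
    default) are discarded. Returns dict centroid -> summed member abundance.
    """
    otus = {}
    for seq, size in sorted(uniques.items(), key=lambda kv: -kv[1]):
        if size < minsize:
            continue
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += size  # join the first matching OTU
                break
        else:
            otus[seq] = size  # no centroid close enough: start a new OTU
    return otus


base = "A" * 100
var1 = "C" + "A" * 99      # 99% identical to base
var2 = "CCCCC" + "A" * 95  # 95% identical to base
uniques = {base: 50, var1: 10, var2: 20, "G" * 100: 1}
print(len(greedy_otus(uniques)))                 # 2 OTUs at 97%
print(len(greedy_otus(uniques, threshold=1.0)))  # 3 OTUs at 100%
```

Raising the threshold keeps the 99%-identical variant as its own OTU rather than merging it into the more abundant centroid.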

R.C. Edgar (2010), Search and clustering orders of magnitude faster than BLAST, Bioinformatics 26(19) 2460-2461
• USEARCH algorithm
• Default citation for USEARCH software