All topics
Rarefaction curves and subsampling
Rarefaction
Abundance rarefaction
Jagged steps in the fast rarefaction curve
Sequence comparison and alignment
Definitions of pair-wise identity
Local and global alignment
E-values and Karlin-Altschul statistics
Sequence masking
Alignment parameters (gap penalties etc.)
Terminal and internal gaps
Chimeras
Chimeras overview
Abundance skew
Chimeras
UCHIME2, improved chimera search
Sequence database search
Translated search (nt query, a.a. database)
USEARCH, fast search by global alignment
UBLAST, fast search by local alignment
Clustering
UPARSE (OTU clustering for next-gen amplicon reads)
UCLUST, general-purpose sequence clustering
UCLUST sort order
Abundance sorting
Agglomerative clustering (single, complete and average linkage)
Connected components (single-linkage clustering)
Can UCLUST cluster metagenomics contigs?
Clustering sequences using local alignment
Read quality filtering
Quality filtering overview
Read quality filtering
Expected error filtering
Discarding singletons
Calculating average Phred (Q) scores is a bad idea
Choosing filter parameters
Setting the expected error threshold for long reads
Defining unique sequence abundances
Filtering by merged
Paired read assembly (merging pairs into contigs)
Paired read merging overview
Checking merged reads for problems
Paired read assemblers
Taxonomy
Which taxonomy reference database should you use?
Can you predict species from short reads?
Adding
Errors and biases in amplicon sequencing
OTU accuracy
Spurious OTUs in mock and real samples
Tolstoy's paradox, spurious clusters are common with low error rates
Abundance bias (read count does not correlate with species abundance)
Cross-talk
File formats
Commands to convert file formats
FASTA files
FASTQ files
FASTQ file format options
Phred (Quality) scores in FASTQ files
Calculating average Phred (Q) scores is a bad idea
Sequence labels
Annotations in sequence labels
Sequence database files
IUPAC codes for sequence wildcard letters N, X etc.
Why doesn't USEARCH support gzipped FASTQ files?
SAM files
CIGAR strings in SAM files
USEARCH Database (UDB) files
Mothur file formats
QIIME file formats
OTU table file formats
NAST multiple alignment files (nastout)
Newick tree format
Metadata file
Output files
Output files for search and clustering
RDP Naive Bayesian Classifier output format (rdpout)
Operational Taxonomic Units (OTUs)
Defining and interpreting OTUs
Mapping reads to OTUs
Closed-reference OTUs
Problems with closed- and open-reference OTUs
OTU accuracy
How to compare two sets of OTUs
How to make OTUs from different studies or sequencing runs
Spurious OTUs in mock and real samples
USEARCH, fast search by global alignment
UBLAST, fast search by local alignment
UPARSE, OTUs from next-gen amplicon reads
UNOISE, denoise / error-correction for amplicon reads
UPARSE-REF, amplicon read parsing by parsimony
SEARCH_16S, find 16S rRNA genes in chromosomes and contigs
SINAPS, predict traits from 16S sequence
UCLUST, general-purpose sequence clustering
UCHIME, chimera search
UCHIME2, improved chimera search
UNBIAS, correct abundance bias in OTU table
UNCROSS2, detect and filter cross-talk in OTU table
Closed-reference OTUs
RDP Naive Bayesian Classifier
Parameters and options
Output files for search and clustering
Accept options (which hits to report)
Termination options (when to stop)
Alignment heuristics
Alignment parameters (gap penalties etc.)
Sequence database index parameters
Patterns (spaced seeds)
Compressed amino acid alphabets
Diversity
Diversity overview
Alpha diversity
Alpha diversity metrics
Beta diversity
Beta diversity metrics
Why use UniFrac?
Interpreting alpha and beta diversity
Recommended alpha and beta
Using diversity metrics to compare groups
Significance of diversity differences between groups
Jost's effective number of species
USEARCH software installation and use
Installing USEARCH
Command line
USEARCH binary file name
Memory use and 32 / 64-bit binaries
Installing multiple USEARCH versions
Known bugs
Commentaries and rebuttals
QIIME v1 grossly over-estimates mock community diversity
Analysis of Brown 2017 ITS mock data
Problems with CD-HIT sequence identities
Problems with closed- and open-reference OTUs
Westcott & Schloss MCC OTU metric
Rebuttal to Dr. P. Schloss's comments on OTU identity
Rebuttal to ARB / SILVA team's comments
Comments on Westcott & Schloss 2017 OptiClust
Benchmark tests
OTU accuracy
Chimeras
Cross-validation by identity
Read quality filtering
Paired read assemblers
Designing sequence search benchmarks
Publications and citing USEARCH
Publications
OTU analysis
Recommended procedures for OTU / denoising analysis
Tutorials, exercises and example scripts
Should I make OTUs using UPARSE (cluster_otus) or UNOISE (unoise3)?
Mapping reads to OTUs
Closed-reference OTUs
Why do I get more OTUs than ZOTUs (denoised sequences)?
Adding
Denoising
Recommended procedures for OTU / denoising analysis
Should I make OTUs using UPARSE (cluster_otus) or UNOISE (unoise3)?
Denoising non-Illumina reads (454, PacBio, Ion Torrent, Nanopore)
What is the "shifted sequences" warning?
Machine learning
Machine learning in OTU analysis
Random forests
OTU importance
Designing 16S experiments
Capture two V regions with 2x250 reads, not just V4
Downloads
Taxonomy reference databases
GG97 closed-reference database
Unbias reference databases