All topics


Rarefaction curves and subsampling
Rarefaction
Abundance rarefaction
Jagged steps in the fast rarefaction curve

Sequence comparison and alignment
Definitions of pair-wise identity
Local and global alignment
E-values and Karlin-Altschul statistics
Sequence masking
Alignment parameters (gap penalties etc.)
Terminal and internal gaps

Chimeras
Chimeras overview
Abundance skew
Chimeras
UCHIME2, improved chimera search

Sequence database search
Translated search (nt query, a.a. database)
USEARCH, fast search by global alignment
UBLAST, fast search by local alignment

Clustering
UPARSE (OTU clustering for next-gen amplicon reads)
UCLUST, general-purpose sequence clustering
UCLUST sort order
Abundance sorting
Agglomerative clustering (single, complete and average linkage)
Connected components (single-linkage clustering)
Can UCLUST cluster metagenomics contigs?
Clustering sequences using local alignment

Read quality filtering
Quality filtering overview
Read quality filtering
Expected error filtering
Discarding singletons
Calculating average Phred (Q) scores is a bad idea
Choosing filter parameters
Setting the expected error threshold for long reads
Defining unique sequence abundances
Filtering by merged

Paired read assembly (merging pairs into contigs)
Paired read merging overview
Checking merged reads for problems
Paired read assemblers

Taxonomy
Which taxonomy reference database should you use?
Can you predict species from short reads?
Adding

Errors and biases in amplicon sequencing
OTU accuracy
Spurious OTUs in mock and real samples
Tolstoy's paradox, spurious clusters are common with low error rates
Abundance bias (read count does not correlate with species abundance)
Cross-talk

File formats
Commands to convert file formats
FASTA files
FASTQ files
FASTQ file format options
Phred (Quality) scores in FASTQ files
Calculating average Phred (Q) scores is a bad idea
Sequence labels
Annotations in sequence labels
Sequence database files
IUPAC codes for sequence wildcard letters N, X etc.
Why doesn't USEARCH support gzipped FASTQ files?
SAM files
CIGAR strings in SAM files
USEARCH Database (UDB) files
Mothur file formats
QIIME file formats
OTU table file formats
NAST multiple alignment files (nastout)
Newick tree format
Metadata file

Output files
Output files for search and clustering
RDP Naive Bayesian Classifier output format (rdpout)

Operational Taxonomic Units (OTUs)
Defining and interpreting OTUs
Mapping reads to OTUs
Closed-reference OTUs
Problems with closed- and open-reference OTUs
OTU accuracy
How to compare two sets of OTUs
How to make OTUs from different studies or sequencing runs
Spurious OTUs in mock and real samples

Algorithms
USEARCH, fast search by global alignment
UBLAST, fast search by local alignment
UPARSE, OTUs from next-gen amplicon reads
UNOISE, denoise / error-correction for amplicon reads
UPARSE-REF, amplicon read parsing by parsimony
SEARCH_16S, find 16S rRNA genes in chromosomes and contigs
SINAPS, predict traits from 16S sequence
UCLUST, general-purpose sequence clustering
UCHIME, chimera search
UCHIME2, improved chimera search
UNBIAS, correct abundance bias in OTU table
UNCROSS2, detect and filter cross-talk in OTU table
Closed-reference OTUs
RDP Naive Bayesian Classifier

Parameters and options
Output files for search and clustering
Accept options (which hits to report)
Termination options (when to stop)
Alignment heuristics
Alignment parameters (gap penalties etc.)
Sequence database index parameters
Patterns (spaced seeds)
Compressed amino acid alphabets

Diversity
Diversity overview
Alpha diversity
Alpha diversity metrics
Beta diversity
Beta diversity metrics
Why use UniFrac?
Interpreting alpha and beta diversity
Recommended alpha and beta
Using diversity metrics to compare groups
Significance of diversity differences between groups
Jost's effective number of species

USEARCH software installation and use
Installing USEARCH
Command line
USEARCH binary file name
Memory use and 32 / 64-bit binaries
Installing multiple USEARCH versions
Known bugs

Commentaries and rebuttals
QIIME v1 grossly over-estimates mock community diversity
Analysis of Brown 2017 ITS mock data
Problems with CD-HIT sequence identities
Problems with closed- and open-reference OTUs
Westcott & Schloss MCC OTU metric
Rebuttal to Dr. P. Schloss's comments on OTU identity
Rebuttal to ARB / SILVA team's comments
Comments on Westcott & Schloss 2017 OptiClust

Benchmark tests
OTU accuracy
Chimeras
Cross-validation by identity
Read quality filtering
Paired read assemblers
Designing sequence search benchmarks

Publications and citing USEARCH
Publications

OTU analysis
Recommended procedures for OTU / denoising analysis
Tutorials, exercises and example scripts
Should I make OTUs using UPARSE (cluster_otus) or UNOISE (unoise3)?
Mapping reads to OTUs
Closed-reference OTUs
Why do I get more OTUs than ZOTUs (denoised sequences)?
Adding

Denoising
Recommended procedures for OTU / denoising analysis
Should I make OTUs using UPARSE (cluster_otus) or UNOISE (unoise3)?
Denoising non-Illumina reads (454, PacBio, Ion Torrent, Nanopore)
What is the "shifted sequences" warning?

Machine learning
Machine learning in OTU analysis
Random forests
OTU importance

Designing 16S experiments
Capture two V regions with 2x250 reads, not just V4

Downloads
Taxonomy reference databases
GG97 closed-reference database
Unbias reference databases
