USEARCH v11

All topics


Rarefaction curves and subsampling
  Rarefaction
  Abundance rarefaction
  Jagged steps in the fast rarefaction curve

Sequence comparison and alignment
  Definitions of pair-wise identity
  Local and global alignment
  E-values and Karlin-Altschul statistics
  Sequence masking
  Alignment parameters (gap penalties etc.)
  Terminal and internal gaps

Chimeras
  Chimeras overview
  Abundance skew
  Chimeras
  UCHIME2, improved chimera search

Sequence database search
  Translated search (nt query, a.a. database)
  USEARCH, fast search by global alignment
  UBLAST, fast search by local alignment

Clustering
  UPARSE (OTU clustering for next-gen amplicon reads)
  UCLUST, general-purpose sequence clustering
  UCLUST sort order
  Abundance sorting
  Agglomerative clustering (single, complete and average linkage)
  Connected components (single-linkage clustering)
  Can UCLUST cluster metagenomics contigs?
  Clustering sequences using local alignment

Read quality filtering
  Quality filtering overview
  Read quality filtering
  Expected error filtering
  Discarding singletons
  Calculating average Phred (Q) scores is a bad idea
  Choosing filter parameters
  Setting the expected error threshold for long reads
  Defining unique sequence abundances
  Filtering by merged

Paired read assembly (merging pairs into contigs)
  Paired read merging overview
  Checking merged reads for problems
  Paired read assemblers

Taxonomy
  Which taxonomy reference database should you use?
  Can you predict species from short reads?
  Adding

Errors and biases in amplicon sequencing
  OTU accuracy
  Spurious OTUs in mock and real samples
  Tolstoy's paradox, spurious clusters are common with low error rates
  Abundance bias (read count does not correlate with species abundance)
  Cross-talk

File formats
  Commands to convert file formats
  FASTA files
  FASTQ files
  FASTQ file format options
  Phred (Quality) scores in FASTQ files
  Calculating average Phred (Q) scores is a bad idea
  Sequence labels
  Annotations in sequence labels
  Sequence database files
  IUPAC codes for sequence wildcard letters N, X etc.
  Why doesn't USEARCH support gzipped FASTQ files?
  SAM files
  CIGAR strings in SAM files
  USEARCH Database (UDB) files
  Mothur file formats
  QIIME file formats
  OTU table file formats
  NAST multiple alignment files (nastout)
  Newick tree format
  Metadata file

Output files
  Output files for search and clustering
  RDP Naive Bayesian Classifier output format (rdpout)

Operational Taxonomic Units (OTUs)
  Defining and interpreting OTUs
  Mapping reads to OTUs
  Closed-reference OTUs
  Problems with closed- and open-reference OTUs
  OTU accuracy
  How to compare two sets of OTUs
  How to make OTUs from different studies or sequencing runs
  Spurious OTUs in mock and real samples
  USEARCH, fast search by global alignment
  UBLAST, fast search by local alignment
  UPARSE, OTUs from next-gen amplicon reads
  UNOISE, denoise / error-correction for amplicon reads
  UPARSE-REF, amplicon read parsing by parsimony
  SEARCH_16S, find 16S rRNA genes in chromosomes and contigs
  SINAPS, predict traits from 16S sequence
  UCLUST, general-purpose sequence clustering
  UCHIME, chimera search
  UCHIME2, improved chimera search
  UNBIAS, correct abundance bias in OTU table
  UNCROSS2, detect and filter cross-talk in OTU table
  Closed-reference OTUs
  RDP Naive Bayesian Classifier

Parameters and options
  Output files for search and clustering
  Accept options (which hits to report)
  Termination options (when to stop
  Alignment heuristics
  Alignment parameters (gap penalties etc.)
  Sequence database index parameters
  Patterns (spaced seeds)
  Compressed amino acid alphabets

Diversity
  Diversity overview
  Alpha diversity
  Alpha diversity metrics
  Beta diversity
  Beta diversity metrics
  Why use UniFrac?
  Interpreting alpha and beta diversity
  Recommended alpha and beta
  Using diversity metrics to compare groups
  Significance of diversity differences between groups
  Jost's effective number of species

USEARCH software installation and use
  Installing USEARCH
  Command line
  USEARCH binary file name
  Memory use and 32 / 64-bit binaries
  Installing multiple USEARCH versions
  Known bugs

Commentaries and rebuttals
  QIIME v1 grossly over-estimates mock community diversity
  Analysis of Brown 2017 ITS mock data
  Problems with CD-HIT sequence identities
  Problems with closed- and open-reference OTUs
  Westcott & Schloss MCC OTU metric
  Rebuttal to Dr. P. Schloss's comments on OTU identity
  Rebuttal to ARB / SILVA team's comments
  Comments on Westcott & Schloss 2017 OptiClust

Benchmark tests
  OTU accuracy
  Chimeras
  Cross-validation by identity
  Read quality filtering
  Paired read assemblers
  Designing sequence search benchmarks

Publications and citing USEARCH
  Publications

OTU analysis
  Recommended procedures for OTU / denoising analysis
  Tutorials, exercises and example scripts
  Should I make OTUs using UPARSE (cluster_otus) or UNOISE (unoise3)?
  Mapping reads to OTUs
  Closed-reference OTUs
  Why do I get more OTUs than ZOTUs (denoised sequences)?
  Adding

Denoising
  Recommended procedures for OTU / denoising analysis
  Should I make OTUs using UPARSE (cluster_otus) or UNOISE (unoise3)?
  Denoising non-Illumina reads (454, PacBio, Ion Torrent, Nanopore)
  What is the "shifted sequences" warning?

Machine learning
  Machine learning in OTU analysis
  Random forests
  OTU importance

Designing 16S experiments
  Capture two V regions with 2x250 reads, not just V4

Downloads
  Taxonomy reference databases
  GG97 closed-reference database
  Unbias reference databases