Propagating cluster sizes

In some applications, sequences are clustered in two or more passes by different USEARCH commands and/or by other programs. Sometimes, the size of a cluster is required in terms of the number of sequences that were provided to the first stage of a pipeline. For example, 16S reads might be dereplicated and then clustered into OTUs by cluster_otus.

To handle multi-step clustering, USEARCH provides a mechanism to propagate cluster size annotations. If the -sizein option is specified, input sequences are required to have a size annotation. If the -sizeout option is specified, size annotations are added to the output labels. If both -sizein and -sizeout are given, then the output size for a cluster takes into account the input sizes.
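The effect of combining the two options can be illustrated with a small Python sketch. It is not USEARCH code: it assumes size annotations appear in labels as a `;size=N;` field, and the function names (`split_label`, `cluster_size`, `annotate`) are hypothetical helpers introduced only for this illustration.

```python
def split_label(label):
    """Split a label into (name without size field, size or None).

    Assumes the size annotation is a semicolon-delimited field 'size=N'.
    """
    size = None
    kept = []
    for field in label.split(";"):
        if not field:
            continue  # skip empty fields produced by trailing semicolons
        if field.startswith("size="):
            size = int(field[len("size="):])
        else:
            kept.append(field)
    return ";".join(kept), size


def cluster_size(member_labels, sizein):
    """Size of a cluster given the labels of its member sequences.

    Without sizein, each member counts as 1. With sizein, the sizes of the
    members are summed, and a missing annotation is treated as an error.
    """
    if not sizein:
        return len(member_labels)
    total = 0
    for label in member_labels:
        _, size = split_label(label)
        if size is None:
            raise ValueError(f"missing size annotation: {label}")
        total += size
    return total


def annotate(label, size):
    """Return the label with its size field replaced (the sizeout step)."""
    name, _ = split_label(label)
    return f"{name};size={size};"


# Three unique sequences produced by an earlier dereplication step.
members = ["Uniq1;size=10;", "Uniq2;size=3;", "Uniq3;size=1;"]
centroid = members[0]
print(annotate(centroid, cluster_size(members, sizein=True)))   # Uniq1;size=14;
print(annotate(centroid, cluster_size(members, sizein=False)))  # Uniq1;size=3;
```

With input sizes taken into account, the cluster size is 14, the number of original reads. Without them it is 3, the number of unique sequences passed to this step.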

Typical use is:

1. First clustering or dereplication step in the pipeline uses -sizeout.

2. Subsequent clustering steps use both -sizein and -sizeout.

If another program is used before the first USEARCH step, then it is up to you to write scripts to produce size annotations for USEARCH.
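Such a script could look like the following minimal sketch. It assumes the earlier program yields plain (label, sequence) records and that abundance is simply the number of identical sequences; the `;size=N;` label format and the helper names `add_size_annotations` and `to_fasta` are assumptions for illustration, so adapt them to whatever the upstream program actually emits.

```python
def add_size_annotations(records):
    """Collapse identical sequences and attach ';size=N;' to each label.

    records: iterable of (label, sequence) pairs.
    Returns a list of (annotated_label, sequence), largest size first.
    The label of the first occurrence of each sequence is kept.
    """
    counts = {}
    first_label = {}
    for label, seq in records:
        key = seq.upper()  # treat upper and lower case as the same sequence
        counts[key] = counts.get(key, 0) + 1
        first_label.setdefault(key, label)
    # sorted() is stable, so ties keep their order of first appearance.
    ordered = sorted(counts, key=lambda s: -counts[s])
    return [(f"{first_label[s]};size={counts[s]};", s) for s in ordered]


def to_fasta(records):
    """Format (label, sequence) pairs as FASTA text."""
    return "".join(f">{label}\n{seq}\n" for label, seq in records)


reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "GGCC"), ("r4", "ACGT")]
print(to_fasta(add_size_annotations(reads)), end="")
```

The resulting FASTA can then be passed to a USEARCH step that uses -sizein.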