cluster_smallmem command

Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to minimize memory use.

It is the user's responsibility to sort the input sequences in an appropriate order before running cluster_smallmem; see UCLUST sort order for discussion. By default, input sequences are expected to be sorted by decreasing length. If some other sort order is used, the -sortedby option should be specified. Valid values are length (default), size and other. If -sortedby other is specified, then USEARCH does not assume or check for any particular order. See also sortbysize and sortbylength.
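
As a rough illustration of the default ordering requirement, the Python sketch below checks whether a set of FASTA records is in decreasing-length order and re-sorts it if not. It is not part of USEARCH, and the sortbylength command remains the supported way to prepare input; the names read_fasta, is_sorted_by_length and sort_by_length are hypothetical helpers invented for this example.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (label, sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA text lines into (label, sequence) pairs."""
    records: List[Record] = []
    label = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if label is not None:
                records.append((label, "".join(chunks)))
            label = line[1:]
            chunks = []
        elif label is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def is_sorted_by_length(records: List[Record]) -> bool:
    """True if each sequence is no longer than the one before it."""
    lengths = [len(seq) for _, seq in records]
    return all(a >= b for a, b in zip(lengths, lengths[1:]))


def sort_by_length(records: List[Record]) -> List[Record]:
    """Return records ordered by decreasing sequence length.

    sorted() is stable, so equal-length sequences keep their input order.
    """
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)
```

A check like this can be run before clustering; if it fails and re-sorting is not wanted, -sortedby other is the option described above for input in an arbitrary order.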

An identity threshold must be specified using the -id option.

Multithreading is not supported as this would require significant memory overhead.

By default, nucleotide matching is done on the forward strand only. For matching on both strands, use -strand both.
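
To make the strand distinction concrete: matching on both strands means a sequence is also considered in its reverse-complemented form. The following is a minimal Python sketch of that transformation, not USEARCH's own code; the function name reverse_complement is a hypothetical helper, and characters other than A, C, G, T, U and N are passed through unchanged as a simplifying assumption.

```python
# Translation table for nucleotide complements (U is treated like T).
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    # Complement each base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]
```

With forward-strand matching only, a read such as AACG and its reverse complement CGTT are treated as unrelated sequences.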

usearch -cluster_smallmem query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
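
The clusters.uc file written by the command above can be post-processed with a short script. The sketch below groups sequence labels by cluster, assuming the tab-separated .uc layout in which field 1 is the record type (S for a centroid, H for a hit, C for a cluster summary), field 2 is the cluster number and field 9 is the query label. The function name clusters_from_uc is a hypothetical helper, and the layout should be checked against the .uc format documentation for the USEARCH version in use.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def clusters_from_uc(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Map cluster number -> labels, with the centroid listed first."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # not a complete .uc record (assumed 10 fields)
        rec_type, cluster_id, label = fields[0], int(fields[1]), fields[8]
        if rec_type == "S":
            # Centroid record: keep it at the front of the member list.
            clusters[cluster_id].insert(0, label)
        elif rec_type == "H":
            # Hit record: a sequence assigned to an existing centroid.
            clusters[cluster_id].append(label)
        # C records only summarize a cluster, so they are ignored here.
    return dict(clusters)
```

Typical use would be to pass an open file handle, for example clusters_from_uc(open("clusters.uc")), and then report the size of each cluster.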