Cluster sizes
See also: smaller cluster sizes in v6.

In some applications, sequences are clustered in two or more passes by different USEARCH commands and/or by other programs. Sometimes the size of a cluster is required in terms of the number of sequences that were provided to the first stage of the pipeline. For example, 16S reads might be dereplicated and then clustered at 97% by cluster_smallmem.

To handle multi-step clustering, USEARCH provides a mechanism to propagate cluster size annotations. If the -sizein option is specified, input sequences are required to have a size annotation. If the -sizeout option is specified, size annotations are added to the output labels. If both -sizein and -sizeout are given, the output size for a cluster takes the input sizes into account. If the -sizein option is not specified, input sizes default to 1.

Typical use is:

1. First clustering or dereplication step in the pipeline uses -sizeout.
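The size propagation rules described above can be illustrated with a small Python sketch. This is not USEARCH code. It assumes that size annotations appear in sequence labels as `;size=N;`, and that a cluster's output size is the sum of its members' input sizes. The function names `input_size`, `cluster_size` and `with_sizeout` are hypothetical and exist only for this illustration.

```python
import re

# Assumed annotation format: "size=N" as a semicolon-delimited field in the label.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def input_size(label, sizein=True):
    """Return the size one input sequence contributes to its cluster."""
    if not sizein:
        # Without -sizein, every input sequence counts as 1.
        return 1
    match = SIZE_RE.search(label)
    if match is None:
        # With -sizein, a size annotation is required on every input label.
        raise ValueError(f"missing size annotation in label: {label!r}")
    return int(match.group(1))


def cluster_size(member_labels, sizein=True):
    """Sum the input sizes of all members of one cluster (assumed rule)."""
    return sum(input_size(label, sizein) for label in member_labels)


def with_sizeout(label, size):
    """Return the label with any old size annotation replaced by the new one."""
    base = SIZE_RE.sub(";", label).strip(";")
    return f"{base};size={size};"


# Hypothetical cluster: three dereplicated reads carrying sizes from an earlier step.
members = ["ReadA;size=3;", "ReadB;size=2;", "ReadC;size=1;"]

propagated = cluster_size(members, sizein=True)    # counts original reads
unweighted = cluster_size(members, sizein=False)   # counts input sequences only
centroid_label = with_sizeout(members[0], propagated)
```

With the sample labels, `propagated` counts the six reads seen by the first stage, while `unweighted` counts only the three sequences given to this step. The difference between the two is what the -sizein and -sizeout pair is meant to preserve across a pipeline.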