propagating cluster sizes
In some applications, sequences are clustered in two or more
passes by different USEARCH commands and/or by other programs.
Sometimes, the size of a cluster is required in terms of the number
of sequences that were provided to the first stage of a pipeline.
For example, 16S reads might be dereplicated and then clustered into OTUs by
multi-step clustering. USEARCH provides a mechanism to propagate cluster
size annotations. If the -sizein option is
specified, input sequences are required to have a size annotation.
If the -sizeout option is specified, size annotations are added to
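the output labels. If both -sizein and -sizeout are given, then the
output size for a cluster takes into account the input sizes, so that it
counts the sequences provided to the first stage rather than the number
of input records.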
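The effect of combining -sizein and -sizeout can be illustrated with a small model. This is a sketch, not USEARCH code: it assumes size annotations of the form `;size=N;` in the label (e.g. `Uniq1;size=10;`) and assumes that with -sizein a cluster's output size is the sum of its members' sizes, while without -sizein each member counts once. The function names are hypothetical.

```python
import re

# Assumed annotation format: ";size=N" inside the label, e.g. "Uniq1;size=10;".
SIZE_RE = re.compile(r";size=(\d+)")


def get_size(label: str) -> int:
    """Return the size annotation in a label (required when -sizein is used)."""
    m = SIZE_RE.search(label)
    if m is None:
        raise ValueError(f"no size annotation in label: {label!r}")
    return int(m.group(1))


def cluster_size(member_labels: list[str], sizein: bool) -> int:
    """Model of the size written by -sizeout for one cluster (hypothetical)."""
    if sizein:
        # Input sizes are summed, so the result counts first-stage sequences.
        return sum(get_size(label) for label in member_labels)
    # Without -sizein, each input sequence counts as one.
    return len(member_labels)


def with_size(label: str, size: int) -> str:
    """Append a size annotation to a label that does not already have one."""
    return f"{label};size={size};"


members = ["Uniq1;size=10;", "Uniq2;size=3;", "Uniq3;size=1;"]
print(with_size("Otu1", cluster_size(members, sizein=True)))   # Otu1;size=14;
print(with_size("Otu1", cluster_size(members, sizein=False)))  # Otu1;size=3;
```

Without the input sizes, the cluster above would report 3 (its member count) even though it represents 14 original sequences.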
Typical use is:
1. The first clustering or dereplication step in the
pipeline uses -sizeout.
2. Subsequent clustering steps use both -sizein and -sizeout.
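The two-step pattern above can be simulated to show why sizes survive the pipeline. This is a hedged sketch, not USEARCH's implementation: stage 1 models dereplication with -sizeout by counting identical reads, and stage 2 models clustering with -sizein and -sizeout by summing member sizes per cluster. The cluster assignment is supplied by hand (a hypothetical result), because the clustering algorithm itself is not modeled; the `Uniq`/`Otu` label names are also assumptions.

```python
import re
from collections import Counter

# Assumed annotation format: ";size=N" inside the label.
SIZE_RE = re.compile(r";size=(\d+)")


def get_size(label: str) -> int:
    """Parse the size annotation from a label."""
    m = SIZE_RE.search(label)
    if m is None:
        raise ValueError(f"no size annotation in label: {label!r}")
    return int(m.group(1))


def dereplicate(reads: list[str]) -> list[tuple[str, str]]:
    """Stage 1 (like -sizeout): one (label, sequence) per unique sequence."""
    counts = Counter(reads)
    # Order by decreasing abundance, ties broken by sequence.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"Uniq{i};size={n};", seq) for i, (seq, n) in enumerate(ordered, 1)]


def propagate(assignments: dict[str, str]) -> dict[str, int]:
    """Stage 2 (like -sizein -sizeout): sum member sizes for each cluster.

    `assignments` maps each input label to its cluster name (hypothetical).
    """
    totals: dict[str, int] = {}
    for label, cluster in assignments.items():
        totals[cluster] = totals.get(cluster, 0) + get_size(label)
    return totals


reads = ["ACGT"] * 5 + ["ACGA"] * 2 + ["TTTT"] * 3
uniques = dereplicate(reads)
# Hypothetical clustering result: ACGA joins the ACGT cluster.
assignments = {"Uniq1;size=5;": "Otu1", "Uniq3;size=2;": "Otu1", "Uniq2;size=3;": "Otu2"}
otus = propagate(assignments)
print(uniques)  # [('Uniq1;size=5;', 'ACGT'), ('Uniq2;size=3;', 'TTTT'), ('Uniq3;size=2;', 'ACGA')]
print(otus)     # {'Otu1': 7, 'Otu2': 3}
```

The OTU sizes add up to the 10 original reads. If stage 2 ignored the input sizes, the two OTUs would instead report 2 and 1.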
If another program is used before the first USEARCH step, then it is up to you
to write scripts to produce size annotations for