USEARCH v11

cluster_fast command

 
See also
  cluster_smallmem
  cluster_otus
  cluster_agg
  cluster_aggd

Clusters sequences in a FASTA or FASTQ file using a variant of the UCLUST algorithm designed to maximize speed.

An identity threshold must be specified using the -id option.
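To illustrate what the identity threshold controls, here is a minimal Python sketch of greedy centroid-based clustering in the general style of UCLUST. It is a toy under stated assumptions, not the USEARCH implementation: toy_identity is a hypothetical stand-in that compares ungapped positions only, and the rule of joining the first centroid that passes the threshold is a simplification.

```python
def toy_identity(a, b):
    """Hypothetical stand-in for alignment identity: fraction of matching
    positions over the shorter sequence, ignoring gaps entirely."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs, min_id):
    """Assign each sequence to the first centroid with identity >= min_id;
    a sequence that matches no centroid becomes a new centroid."""
    centroids = []  # one representative sequence per cluster
    members = []    # members[i] holds the sequences in cluster i
    for s in seqs:
        for i, c in enumerate(centroids):
            if toy_identity(s, c) >= min_id:
                members[i].append(s)
                break
        else:
            # No centroid was close enough, so start a new cluster.
            centroids.append(s)
            members.append([s])
    return centroids, members
```

Because each sequence is compared only against the centroids seen so far, the result depends on the order in which sequences are processed.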

Sequences are processed in the order specified by the -sort option, which may be other (the default), length or size. See UCLUST sort order for discussion. With -sort other, sequences are processed in the order they appear in the input file. With -sort length, sequences are processed in order of decreasing length; this is most appropriate when fragments are present together with full-length sequences. With -sort size, sequences are processed in order of decreasing size annotation; this can be useful for clustering amplicon reads such as 16S or ITS tags, though cluster_otus is usually recommended for this task.
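The three sort orders can be pictured with a short Python sketch. It assumes size annotations take the form size=N delimited by semicolons in the sequence label (for example "read1;size=10;"), treats a missing annotation as size 1, and keeps input order for ties; the function names are hypothetical and the tie handling is an assumption, not documented behavior.

```python
import re

# Assumed annotation format: "size=N" delimited by semicolons in the label.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)(?:;|$)")


def size_of(label):
    """Return the size=N annotation in a label, or 1 if none is present."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if m else 1


def processing_order(records, sort="other"):
    """records is a list of (label, sequence) tuples in input-file order."""
    if sort == "other":
        return list(records)
    if sort == "length":
        return sorted(records, key=lambda r: len(r[1]), reverse=True)
    if sort == "size":
        return sorted(records, key=lambda r: size_of(r[0]), reverse=True)
    raise ValueError(f"unknown sort order: {sort}")
```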

Reverse-complemented matching for nucleotide sequences can be specified by using -strand both.
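As a rough picture of matching on both strands, the Python sketch below scores a query in its given orientation and as its reverse complement, then keeps the better of the two. The identity argument is a hypothetical caller-supplied scoring function, and choosing the higher-scoring strand is an assumption made for illustration rather than a description of USEARCH internals.

```python
# Translation table for the four unambiguous nucleotide letters.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def best_strand(query, target, identity):
    """Score the query on both strands and return (identity, strand)."""
    plus = identity(query, target)
    minus = identity(reverse_complement(query), target)
    return (plus, "+") if plus >= minus else (minus, "-")
```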

Size annotations may be generated and/or propagated by using the -sizein and/or -sizeout options.
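The effect of these two options on a centroid label can be sketched in Python as follows. This is an illustration under assumptions: the annotation is taken to sit at the end of the label in the form ";size=N;", a sequence counts as 1 when input sizes are not used, and the function names are hypothetical.

```python
import re

# Assumes the annotation sits at the end of the label, as in "read1;size=3;".
SIZE_RE = re.compile(r";size=(\d+);?$")


def member_size(label, sizein):
    """Abundance contributed by one sequence: its size=N value when sizein
    is true and an annotation is present, otherwise 1."""
    m = SIZE_RE.search(label)
    return int(m.group(1)) if (sizein and m) else 1


def annotate_centroid(centroid_label, member_labels, sizein):
    """Label a centroid with the total size of its cluster.
    member_labels should include the centroid itself."""
    total = sum(member_size(lab, sizein) for lab in member_labels)
    base = SIZE_RE.sub("", centroid_label)  # drop any old annotation
    return f"{base};size={total};"
```

With input sizes honored, a cluster of "r1;size=3;", "r2;size=2;" and "r3" is labeled with size 6; ignoring input sizes, the same cluster is labeled with size 3, the number of member sequences.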

Output files
Standard output files are supported. Cluster centroids (representative sequences) are written to a FASTA file specified by the -centroids option. Consensus sequences are written to a FASTA file specified by -consout and multiple alignments are written to filenames derived from the -msaout option. Note that using -consout and -msaout may add significantly to the compute time and memory required for clustering. You can specify a directory to contain one FASTA file per cluster using the -clusters option.

Supported options
  Accept options
  Termination options
  Indexing options
  Masking options
  Multithreading
  Alignment parameters
  Alignment heuristics


Example

usearch -cluster_fast query.fasta -id 0.9 -centroids nr.fasta -uc clusters.uc
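The clusters.uc file written by this command can be post-processed with a short script. The Python sketch below groups sequence labels by cluster. It assumes the tab-separated .uc layout with ten fields, where field 1 is the record type (S for a centroid, H for a hit, C for a cluster summary), field 2 is the cluster number and field 9 is the query label. That layout is not described on this page, so verify it against the UC format documentation before relying on it; the function name is hypothetical.

```python
from collections import defaultdict


def read_uc_clusters(path):
    """Map cluster number -> list of labels, with the centroid first."""
    clusters = defaultdict(list)
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 10:
                continue  # skip blank or malformed lines
            rec_type, cluster, label = fields[0], int(fields[1]), fields[8]
            if rec_type == "S":
                # Centroid record: keep it at the front of the list.
                clusters[cluster].insert(0, label)
            elif rec_type == "H":
                # Hit record: a member assigned to an existing centroid.
                clusters[cluster].append(label)
            # C records repeat the centroid as a summary and are ignored.
    return dict(clusters)
```

For example, read_uc_clusters("clusters.uc") returns a dictionary whose values list the members of each cluster, which can then be counted or written out per cluster.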