USEARCH v11

termination options

See also
  weak hits
  maxhits option.

The maxaccepts and maxrejects options
The termination options -maxaccepts and -maxrejects are supported by most search and clustering commands. They stop the search for a given query sequence once the specified number of accepts (target sequences that meet the accept criteria) or rejects (target sequences that were processed but failed to meet those criteria) has been reached. Early search termination can give dramatic improvements in speed, often with minimal or no cost in sensitivity. See USEARCH algorithm for discussion of why "U-sorting" with termination is an effective speed optimization.
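The accept/reject counting described above can be sketched in a few lines of Python. This is an illustration only, not the USEARCH implementation: the function name search_with_termination and the is_accept callback are hypothetical, and the targets are assumed to arrive already ordered by the search.

```python
from typing import Callable, Iterable, List


def search_with_termination(
    targets: Iterable[str],
    is_accept: Callable[[str], bool],
    maxaccepts: int = 1,
    maxrejects: int = 32,
) -> List[str]:
    """Return accepted targets, stopping early on either limit.

    A limit of 0 disables that termination condition.
    """
    hits: List[str] = []
    rejects = 0
    for target in targets:
        if is_accept(target):
            hits.append(target)
            # Stop once enough accepts have been collected.
            if maxaccepts and len(hits) >= maxaccepts:
                break
        else:
            rejects += 1
            # Stop once too many candidates have failed.
            if maxrejects and rejects >= maxrejects:
                break
    return hits
```

With both limits set to 0 the loop visits every candidate; with the defaults it stops at the first accept or the 32nd reject, whichever comes first.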

Other termination options
-termid terminate search when a target identity drops below the given value, specified as a fractional identity in range 0.0 to 1.0.

-termidd terminate when the difference (maxid - minid) exceeds the given value, where maxid (minid) is the maximum (minimum) identity found so far.
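As a rough illustration of these two conditions, the sketch below scans a list of identities in the order the targets are tested and stops when either condition fires. The function name scan_identities is hypothetical, and counting the target that triggers termination as examined is an assumption, not documented behavior.

```python
from typing import Iterable, List, Optional


def scan_identities(
    identities: Iterable[float],
    termid: Optional[float] = None,
    termidd: Optional[float] = None,
) -> List[float]:
    """Return the identities examined before a termination condition fires."""
    examined: List[float] = []
    maxid: Optional[float] = None
    minid: Optional[float] = None
    for ident in identities:
        examined.append(ident)
        maxid = ident if maxid is None else max(maxid, ident)
        minid = ident if minid is None else min(minid, ident)
        # termid: stop when an identity drops below the threshold.
        if termid is not None and ident < termid:
            break
        # termidd: stop when the spread of identities seen so far is too wide.
        if termidd is not None and (maxid - minid) > termidd:
            break
    return examined
```

For example, with identities 0.99, 0.97, 0.80, 0.95, either termid=0.9 or termidd=0.05 stops the scan at the third target, so the fourth is never examined.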

Comprehensive search
Roughly speaking, a search of the complete database is specified by disabling the maxaccepts and maxrejects termination options, which is done by setting -maxaccepts 0 -maxrejects 0. This is the default for the ublast command, but not for clustering and search commands based on the USEARCH algorithm; see the table below for the default values for each command. Disabling termination does not make the search strictly comprehensive, however: with commands based on the USEARCH and UBLAST algorithms, a database sequence is not aligned if it has no words (or seeds) in common with the query sequence. For a truly comprehensive search, use search_global or search_local.

Discussion
Termination conditions are combined with OR, so the first one to be satisfied causes the search to stop (unlike accept criteria, which are combined with AND).

By default, termination options are enabled only for clustering and search commands based on the USEARCH algorithm. This is because USEARCH tests database sequences (targets) in order of decreasing number of words in common between the query and target sequence. This order correlates well with sequence similarity, so the best hit(s) are likely to be found quickly.
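The ordering described above can be sketched as follows. This is a simplified illustration, not the USEARCH implementation: the helper names words and u_sort are hypothetical, the word length k is an arbitrary parameter, and counting unique shared words (rather than all occurrences) is an assumption. Targets with no words in common with the query are dropped, matching the note under Comprehensive search.

```python
from typing import Dict, List, Set, Tuple


def words(seq: str, k: int) -> Set[str]:
    """Return the set of unique length-k words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def u_sort(query: str, targets: Dict[str, str], k: int = 8) -> List[Tuple[str, int]]:
    """Order targets by decreasing number of words shared with the query."""
    query_words = words(query, k)
    counts = [(name, len(query_words & words(seq, k))) for name, seq in targets.items()]
    # A target sharing no words with the query is never a candidate.
    candidates = [(name, n) for name, n in counts if n > 0]
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

Because the most similar targets tend to appear first in this list, a search that stops after a handful of rejects is still likely to have seen the best hit.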

With ublast, search_local and search_global, targets are compared to the query in an order that does not correlate with sequence similarity or E-value. With these commands, the first accepted hit is not expected to be close to the best possible hit. However, termination options can still be useful; see weak hits for discussion and examples.

If maxaccepts is set to a value > 1, then more than one hit may be reported per query. In this case, it is usually recommended to increase maxrejects also, because it will often be necessary to search further into the list of candidate target sequences to find more than one hit.

The maxaccepts and maxrejects options can be used to tune speed against sensitivity. Smaller values of both parameters tend to improve speed by reducing the number of alignments that must be computed per query. For example, with cluster_fast, the default value of maxrejects is reduced from 32 to 8 in order to achieve higher speed. Increasing either value tends to result in slower execution because more alignments must be computed. Increasing maxrejects tends to improve sensitivity by reducing the number of false negatives, i.e. target sequences that would be accepted but are not tested because they are too far down the list in word-count order.
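The trade-off can be made concrete with a toy model in which each candidate, taken in word-count order, is simply marked as acceptable or not. The function name count_alignments and the input list are hypothetical; the point is only to show how the limits change the number of alignments computed and whether a hit far down the list is reached.

```python
from typing import List, Tuple


def count_alignments(
    accept_flags: List[bool], maxaccepts: int, maxrejects: int
) -> Tuple[int, int]:
    """Return (hits found, alignments computed) for one query.

    accept_flags[i] is True if the i-th candidate would be accepted.
    A limit of 0 disables that termination condition.
    """
    hits = rejects = aligned = 0
    for ok in accept_flags:
        aligned += 1  # every tested candidate costs one alignment
        if ok:
            hits += 1
            if maxaccepts and hits >= maxaccepts:
                break
        else:
            rejects += 1
            if maxrejects and rejects >= maxrejects:
                break
    return hits, aligned


# The only acceptable target sits behind ten rejects.
flags = [False] * 10 + [True] + [False] * 50
print(count_alignments(flags, maxaccepts=1, maxrejects=8))
print(count_alignments(flags, maxaccepts=1, maxrejects=32))
print(count_alignments(flags, maxaccepts=0, maxrejects=0))
```

In this example maxrejects 8 gives up after 8 alignments and misses the hit (a false negative), maxrejects 32 finds it after 11 alignments, and disabling both limits finds it but aligns all 61 candidates.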

With translated searches, termination conditions apply to each ORF separately. This is because the nucleotide query sequence might span more than one gene.

Command            Uses USEARCH?  maxaccepts default (0 = disabled)  maxrejects default (0 = disabled)
usearch_global     Yes            1                                  32
usearch_local      Yes            1                                  32
cluster_smallmem   Yes            1                                  32
cluster_fast       Yes            1                                  8
ublast             No             0                                  0
search_local       No             0                                  0
search_global      No             0                                  0
otutab             Yes            4 (v10), 8 (v11)                   64 (v10), 256 (v11)