The SINTAX algorithm predicts taxonomy by using k-mer similarity to identify the top hit in a reference database and provides bootstrap confidence for all ranks in the prediction. SINTAX achieves accuracy similar to or better than that of the RDP Naive Bayesian Classifier with a simpler algorithm that does not require training. In particular, SINTAX has significantly lower false positive rates on full-length 16S and ITS sequences due to a lower over-classification rate.
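
The idea described above (k-mer top hit plus bootstrap confidence per rank) can be illustrated with a minimal Python sketch. This is not the USEARCH implementation. The function names, the word length of 8, the subsample size of 32, the 100 iterations, and the brute-force search over all references are assumptions chosen for illustration.

```python
import random
from collections import Counter


def kmers(seq, k=8):
    """Return the set of distinct k-mers (words) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sintax_sketch(query, refs, k=8, subset=32, iters=100, seed=0):
    """Illustrative k-mer bootstrap classifier (hypothetical helper).

    refs is a list of (taxonomy, sequence) pairs, where taxonomy is a
    list of names ordered from highest to lowest rank.
    Returns a list of (name, confidence) pairs for the predicted lineage.
    """
    query_words = sorted(kmers(query, k))
    if not query_words:
        raise ValueError("query is shorter than k")
    ref_words = [(tax, kmers(seq, k)) for tax, seq in refs]
    rng = random.Random(seed)

    hits = []
    for _ in range(iters):
        # Subsample query words with replacement.
        sample = [rng.choice(query_words) for _ in range(subset)]
        # Top hit: the reference sharing the most sampled words.
        best = max(
            range(len(ref_words)),
            key=lambda i: sum(w in ref_words[i][1] for w in sample),
        )
        hits.append(best)

    # Predicted lineage: taxonomy of the most frequent top hit.
    top_index, _ = Counter(hits).most_common(1)[0]
    top_tax = ref_words[top_index][0]

    # Confidence at each rank: fraction of iterations whose top hit
    # carries the same name at that rank.
    result = []
    for rank, name in enumerate(top_tax):
        support = sum(
            1 for h in hits
            if len(ref_words[h][0]) > rank and ref_words[h][0][rank] == name
        )
        result.append((name, support / iters))
    return result
```

Because every iteration may pick a different top hit, names at high ranks, which many references share, tend to receive higher confidence than names at low ranks.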

SINTAX is supported by the sintax command in USEARCH v9.

Reference databases
FASTA files reformatted with SINTAX-compatible taxonomy annotations.

   rdp_16s_v16.fa.gz RDP training set v16 (13k seqs.). RDP license terms
   rdp_16s_v16_sp.fa.gz RDP training set with species names (can species be predicted?).
   gg_16s_13.5.fa.gz Greengenes v13.5 (1.2M seqs.). Greengenes license terms
   silva_16s_v123.fa.gz SILVA v123 (1.6M seqs.). SILVA license terms
   ltp_16s_v123.fa.gz SILVA v123 LTP named isolate subset (12k seqs.). SILVA license terms

    UNITE (current version at unite.ut.ee) (53k sequences in v7.1). UNITE license terms
    rdp_its_v2.fa.tz RDP Warcup training set v2 (18k sequences). RDP license terms
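
As a rough illustration of what a SINTAX-compatible taxonomy annotation might look like to a script, the sketch below parses a FASTA label. The label layout (a `tax=` field holding comma-separated `rank:name` pairs, with fields separated by semicolons) is an assumption not stated on this page. The accession in the usage note and the function name `parse_sintax_label` are hypothetical, and quoted names containing commas are not handled.

```python
def parse_sintax_label(label):
    """Split a FASTA label into (sequence_id, [(rank, name), ...]).

    Assumes a layout like 'ID;tax=d:Bacteria,p:Firmicutes;' (an
    assumption for illustration, not confirmed by this page).
    """
    label = label.lstrip(">").strip()
    seq_id = label.split(";", 1)[0]
    ranks = []
    for field in label.split(";"):
        if field.startswith("tax="):
            for item in field[len("tax="):].split(","):
                # Each item is expected to be 'rank_letter:name'.
                rank, _, name = item.partition(":")
                if rank and name:
                    ranks.append((rank, name))
    return seq_id, ranks
```

Under these assumptions, `parse_sintax_label(">AB008314;tax=d:Bacteria,p:Firmicutes;")` yields the identifier together with an ordered list of rank and name pairs.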