An open reading frame (ORF) is a segment of a nucleotide sequence that begins with a start codon, ends with a stop codon, and is long enough to code for a protein. In USEARCH, the minimum number of codons in an ORF is set by the -mincodons option (default 20). When the query is a nucleotide sequence and the database contains amino acid sequences, USEARCH performs a translated search: ORFs are identified in the nucleotide sequence, and each ORF is treated as a separate query with its own termination conditions, because a single nucleotide sequence may span more than one gene. The most common application of translated
search is to find protein-coding genes in shotgun reads. A shotgun
read may span only part of an ORF, in which case the start and/or
stop codon may be missing. USEARCH therefore supports more flexible
definitions of an ORF, controlled by the -orfstyle option.
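
The difference between a strict ORF and a more flexible one can be illustrated with a minimal Python sketch. This is not USEARCH's implementation: the function name find_orfs, the allow_partial flag, and the use of ATG as the only start codon are assumptions made for illustration, and allow_partial does not correspond to any particular -orfstyle value. The sketch scans the three forward reading frames only and counts the start codon, but not the stop codon, toward the minimum length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: only the canonical start codon is recognized


def find_orfs(seq, min_codons=20, allow_partial=False):
    """Return (frame, start, end) for ORFs in the three forward frames.

    start and end are 0-based nucleotide offsets, end exclusive.
    A complete ORF includes its stop codon in the reported span.
    With allow_partial=True (a hypothetical flag, not a USEARCH option),
    an ORF may also begin at the start of the frame without a start codon
    and/or run to the end of the sequence without a stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        # In partial mode the stretch at the beginning of the frame is
        # treated as an ORF whose start codon lies outside the read.
        start = frame if allow_partial else None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if start is not None and (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # a stop codon always closes the current ORF
            elif codon == START_CODON and start is None:
                start = i
        if allow_partial and start is not None:
            # The ORF is still open at the end of the read: keep whole codons only.
            end = frame + 3 * ((len(seq) - frame) // 3)
            if (end - start) // 3 >= min_codons:
                orfs.append((frame, start, end))
    return orfs


complete = "ATG" + "GCT" * 19 + "TAA"  # 20 coding codons followed by a stop
fragment = "GCT" * 25                  # read with neither start nor stop codon
print(find_orfs(complete))
print(find_orfs(fragment))
print(find_orfs(fragment, allow_partial=True))
```

Under the strict definition the fragment yields no ORF at all, which is why a read covering only the middle of a gene would never be searched. In the relaxed mode the same fragment is reported as an open stretch in each frame, so it can be passed on as a query.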