FAQ: What is the "shifted sequences detected" warning?

Denoising (unoise3 command) reports the "Shifted sequences detected" warning when two different unique sequences are found to have 100% identity. The most common cause of this is short terminal gaps as in these two example alignments:



If all amplicons are created by the same primer pair, or by degenerate primers that bind to the same locus, then terminal gaps should not occur between very similar sequences.

Denoising assumes that the unique sequences are globally trimmed; the warning is issued because terminal gaps can indicate that the global trimming was not done correctly.

However, terminal gaps are not always a problem. Short terminal gaps can occur due to indels caused by sequencing error or PCR error, and the unoise3 command is designed to deal with these: it discards a low-abundance unique sequence if it is 100% identical to a high-abundance unique sequence under the standard definition of sequence identity. If this is the cause of the warning, then you can safely ignore it.

With Illumina sequencing, such indels are rare, but you only need one case out of millions of reads to see the warning.

Longer terminal gaps occur due to offset sequences. In this case, global trimming was not done correctly and it is not safe to ignore the problem because you will get two or more denoised sequences (ZOTUs) for the same biological sequence. Here, you should modify your pipeline to perform correct global trimming. See quality control for shifted sequences for discussion.
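To make the idea of an offset concrete, here is a minimal sketch. It is not part of USEARCH and is not the comparison that unoise3 performs. The hypothetical shell function shift_offset takes two sequences and a maximum shift, and prints the smallest shift at which the two sequences agree exactly over their overlap. It assumes there are no mismatches or internal indels, and the maximum shift should be kept small relative to the sequence length, because very short overlaps can match by chance.

```shell
# shift_offset SEQ_A SEQ_B MAX_SHIFT   (hypothetical helper, illustration only)
# Prints k  if SEQ_A with its first k letters removed agrees with SEQ_B
#           over their whole overlap (SEQ_A starts k letters upstream),
# prints -k if the same holds with the roles of SEQ_A and SEQ_B swapped,
# and prints nothing if no shift in 0..MAX_SHIFT gives exact agreement.
shift_offset() {
    awk -v a="$1" -v b="$2" -v max="$3" '
        # True if x minus its first k letters matches y over the overlap.
        function agrees(x, y, k,    tail, n) {
            tail = substr(x, k + 1)
            n = (length(tail) < length(y)) ? length(tail) : length(y)
            return n > 0 && substr(tail, 1, n) == substr(y, 1, n)
        }
        BEGIN {
            for (k = 0; k <= max; k++) {
                if (agrees(a, b, k)) { print k; exit }
                if (k > 0 && agrees(b, a, k)) { print -k; exit }
            }
        }
    '
}
```

For example, shift_offset TTACGGATCCAGT ACGGATCCAGTCA 5 prints 2, because the first sequence starts two letters upstream of the second. Two such sequences are identical everywhere they overlap, yet they are different unique sequences.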

Another cause of the warning is wildcard letters (e.g., N) in the unique sequences. If a wildcard letter matches a non-wildcard letter, then the two sequences have 100% identity by the standard definition, but are different by the definition used by fastx_uniques. You can check for wildcard letters using grep, e.g.:

grep -v "^>" uniques.fa | grep -i "[^ACGT]"

You should delete sequences with wildcards before denoising. You can use the fastq_filter command with -fastq_maxns 0 to remove them from the FASTQ files.
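If your FASTA file wraps sequences over several lines, or you want the labels of the affected sequences rather than the raw sequence lines, a per-record check is more convenient. The following is a minimal sketch, assuming a POSIX shell and awk; list_wildcard_labels is a hypothetical helper name, not a USEARCH command.

```shell
# list_wildcard_labels FASTA_FILE   (hypothetical helper)
# Prints the label of every record whose sequence contains a letter other
# than A, C, G or T (upper or lower case). Each label is printed once,
# even if the sequence is wrapped over several lines.
list_wildcard_labels() {
    awk '
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        /^>/ { label = substr($0, 2); reported = 0; next }
        !reported && toupper($0) ~ /[^ACGT]/ { print label; reported = 1 }
    ' "$1"
}
```

Running list_wildcard_labels uniques.fa prints one label per line. If it prints nothing, there are no wildcard letters in the file.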

To determine the cause of the warning, look at alignments for the shifted sequences. Use the -tabbedout file generated by unoise3. It has a field "shifted" for each sequence that was found to be shifted relative to a high-abundance sequence. First, run unoise3 with the -tabbedout option:

usearch -unoise3 uniques.fa -zotus zotus.fa -tabbedout unoise3.txt

Find the shifted sequences and align them to the ZOTUs:

grep shifted unoise3.txt | cut -f1 > labels.txt

usearch -fastx_getseqs uniques.fa -labels labels.txt -fastaout shifted.fa

usearch -usearch_global shifted.fa -db zotus.fa -strand plus -id 1.0 -alnout aln.txt
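Before reading the alignments, it can help to see how abundant the shifted sequences are, because a low-abundance shifted sequence is more likely to be a harmless indel error. The following is a minimal sketch, not a USEARCH feature. It assumes that the labels carry a size=N annotation, which is an assumption about how your unique sequences were labeled, and it takes the label from field 1 of each line containing "shifted", as in the grep command above. The name shifted_by_size is hypothetical.

```shell
# shifted_by_size TABBEDOUT_FILE   (hypothetical helper)
# Prints "size<TAB>label" for each line mentioning "shifted", sorted by
# decreasing size. Labels without a size=N annotation get size 0.
shifted_by_size() {
    grep shifted "$1" | cut -f1 |
        awk '{
            size = 0
            if (match($0, /size=[0-9]+/))
                size = substr($0, RSTART + 5, RLENGTH - 5)   # digits after "size="
            print size "\t" $0
        }' |
        sort -t "$(printf '\t')" -k1,1nr
}
```

Running shifted_by_size unoise3.txt lists the most abundant shifted sequences first. High-abundance entries are the ones to inspect most closely in aln.txt.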