Global trimming for ITS amplicon reads

ITS amplicons vary widely in length because of the biology of the region: some of the sequence evolves neutrally, and long indels are common.

Paired Illumina reads
If you have paired Illumina reads that overlap, you probably don't need to do any global trimming. Merging the pairs produces sequences that span the amplicon from primer to primer, so they are already effectively trimmed. Make sure the read length is long enough for all pairs to overlap, even for the longest amplicons. If the pairs from longer amplicons don't overlap, take the forward reads only and treat them as unpaired, as described below.

Unpaired reads
This is the strategy I currently recommend for global trimming of unpaired ITS reads.

1. Pick a fixed length that is as long as possible without losing a large fraction of the reads because their expected errors exceed 1 (or your chosen e.e. threshold). The fastq_eestats2 command is useful for finding a good compromise. Call this length L_trim.

2. If the read contains a match to the reverse primer, delete the matching letters and everything after them.

3. If the read is now shorter than a reasonable minimum length for your primer pair, discard it.

4. If the read is longer than L_trim, truncate to L_trim.

5. If the read is shorter than L_trim, pad with Ns so that it is L_trim letters.

Step 5 is needed because cluster_otus considers terminal gaps to be real differences. After this step, all your reads have length L_trim.
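Step 1 depends on expected errors: a read's expected number of errors is the sum of its per-base error probabilities, E = Σ 10^(-Q/10), where Q is the Phred score. As a rough stand-in for the kind of table fastq_eestats2 prints (its exact accounting may differ), the Python sketch below reports, for a candidate L_trim, the fraction of reads whose first L_trim bases stay within a max e.e. threshold. Reads shorter than L_trim are scored on their full length, since this strategy pads them rather than dropping them. It assumes Phred+33 quality strings; the function names are hypothetical.

```python
from typing import Sequence


def expected_errors(qual: str, offset: int = 33) -> float:
    """Expected number of errors: sum of 10^(-Q/10) over all bases (Phred+33 assumed)."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def fraction_passing(quals: Sequence[str], l_trim: int, max_ee: float = 1.0) -> float:
    """Fraction of reads whose first l_trim bases have expected errors <= max_ee.

    Reads shorter than l_trim are scored on their full length, because the
    global-trimming strategy pads them with Ns instead of discarding them.
    """
    if not quals:
        return 0.0
    kept = sum(1 for q in quals if expected_errors(q[:l_trim]) <= max_ee)
    return kept / len(quals)


if __name__ == "__main__":
    # Toy data: 'I' is Q40 (error probability 1e-4), '#' is Q2 (about 0.63).
    quals = ["I" * 300, "I" * 200 + "#" * 100, "I" * 150]
    for l_trim in (150, 200, 250, 300):
        print(l_trim, round(fraction_passing(quals, l_trim), 3))
```

On this toy data the fraction stays at 1.0 up to L_trim = 200 and drops to about 0.667 at 250, once the second read's low-quality tail starts to count.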

Steps 2-5 should be done before quality filtering by max e.e. You will need to write your own script for this, because usearch currently has no command with the necessary features. You can use the search_oligodb command to find the reverse primer matches.
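A minimal Python sketch of steps 2-5 follows. Instead of search_oligodb, it swaps in a simple ungapped matcher that tolerates a few mismatches and understands IUPAC degenerate codes. Because a forward read runs toward the reverse primer site, it searches for the reverse complement of the reverse primer. The mismatch limit (max_diffs), the minimum length (min_len), and the quality character used for the padding ('I', which is Q40 in Phred+33) are assumptions to adjust; giving the padded Ns a high quality keeps them from inflating expected errors in the filtering step that follows. All function names are hypothetical.

```python
# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, including degenerate IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(read: str, site: str, max_diffs: int = 2) -> int:
    """Start of the leftmost ungapped match of site with <= max_diffs mismatches, else -1."""
    n = len(site)
    for start in range(len(read) - n + 1):
        diffs = 0
        for base, code in zip(read[start:start + n], site):
            if base not in IUPAC.get(code, ""):
                diffs += 1
                if diffs > max_diffs:
                    break
        if diffs <= max_diffs:
            return start
    return -1


def global_trim(seq, qual, rev_primer, l_trim, min_len, max_diffs=2, pad_qual="I"):
    """Apply steps 2-5; return (seq, qual) of length l_trim, or None to discard."""
    site = revcomp(rev_primer.upper())
    pos = find_primer(seq.upper(), site, max_diffs)
    if pos >= 0:                               # step 2: cut primer and everything after it
        seq, qual = seq[:pos], qual[:pos]
    if len(seq) < min_len:                     # step 3: discard reads that are too short
        return None
    seq, qual = seq[:l_trim], qual[:l_trim]    # step 4: truncate to l_trim
    pad = l_trim - len(seq)                    # step 5: pad with Ns up to l_trim
    return seq + "N" * pad, qual + pad_qual * pad


def trim_fastq(in_path, out_path, rev_primer, l_trim, min_len):
    """Stream a 4-line-per-record FASTQ file through global_trim."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline().rstrip()
            if not header:
                break
            seq = fin.readline().rstrip()
            fin.readline()                     # '+' separator line
            qual = fin.readline().rstrip()
            result = global_trim(seq, qual, rev_primer, l_trim, min_len)
            if result is not None:
                fout.write(f"{header}\n{result[0]}\n+\n{result[1]}\n")


if __name__ == "__main__":
    primer = "AAGGCT"  # hypothetical 6-nt reverse primer, for illustration only
    read = "ACGTACGTAC" + revcomp(primer) + "TTTT"
    print(global_trim(read, "I" * len(read), primer, l_trim=12, min_len=8))
```

The demo prints ('ACGTACGTACNN', 'IIIIIIIIIIII'): the primer site and the bases after it are removed, and the remaining 10 bases are padded to 12. A matcher that ignores indels can miss primer sites with sequencing indels; search_oligodb is the more thorough option.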

Once you've pre-processed the reads to get them to the fixed length, proceed as usual to make UPARSE OTUs: quality filter, dereplicate, discard singletons, and run cluster_otus.

Finally, you'll need to strip the trailing Ns (added in step 5) from the OTU sequences.
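Stripping the padding is plain string handling. A minimal Python sketch, assuming the OTU file is FASTA whose sequences may be wrapped over several lines; it removes only trailing Ns, so any internal Ns are left alone. Function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def strip_padding(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Remove trailing N padding (added in step 5) from each sequence."""
    for header, seq in records:
        yield header, seq.rstrip("Nn")


def strip_fasta(in_path: str, out_path: str) -> None:
    """Copy a FASTA file of OTU sequences with the trailing Ns removed."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for header, seq in strip_padding(read_fasta(fin)):
            fout.write(f"{header}\n{seq}\n")
```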