ITS amplicons have large variations in length due to the biology of the region -- some of the sequence evolves neutrally, and long indels are common.
Paired Illumina reads
If you have paired Illumina reads that overlap, then you probably don't need
to do any global trimming. This is because paired read merging generates
sequences that extend from one primer to the other, so they are already
effectively trimmed. You should make sure that the reads are long enough that
all pairs overlap, even when the amplicon is long. If the read pairs don't
overlap for longer amplicons, then you should take the forward reads only and
treat them as unpaired as described below.
Unpaired reads
This is the strategy I currently recommend for global trimming of unpaired
ITS reads.
1. Pick a fixed length which is as long as possible without losing a large
fraction of the reads because they have expected errors > 1 (or your chosen
e.e. threshold). The fastq_eestats2 command is useful for figuring out a good
compromise. Call this length L_trim.
2. If a match to the reverse primer is present, then delete the matching letters
and any letters after that.
3. If the read is shorter than a reasonable length given your primer pair,
then discard the read.
4. If the read is longer than L_trim, truncate to L_trim.
5. If the read is shorter than L_trim, pad with Ns so that it is L_trim letters.
Step 5 is needed because cluster_otus considers terminal gaps to be real
differences. After this step, all your reads should have length L_trim.
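To make the trade-off in step 1 concrete, here is a minimal Python sketch of the expected-errors quantity, computed as the sum of per-base error probabilities 10^(-Q/10). It is not a reimplementation of fastq_eestats2. The function names, the Phred+33 quality encoding and the choice to count short reads at their full length are assumptions made for illustration.

```python
def expected_errors(qual: str, phred_offset: int = 33) -> float:
    """Sum of per-base error probabilities for one FASTQ quality string.

    Assumes Phred+33 encoding unless phred_offset says otherwise.
    """
    return sum(10 ** (-(ord(c) - phred_offset) / 10) for c in qual)


def fraction_passing(quals, length: int, max_ee: float = 1.0) -> float:
    """Fraction of reads with e.e. <= max_ee after truncating to `length`.

    Reads shorter than `length` are scored over their full length
    (an assumption; they would be padded later rather than discarded).
    """
    if not quals:
        return 0.0
    passing = sum(1 for q in quals if expected_errors(q[:length]) <= max_ee)
    return passing / len(quals)
```

Evaluating fraction_passing over a range of candidate lengths gives a rough picture of how many reads each choice of L_trim would cost.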
Steps 2 to 5 should be done before quality filtering by max e.e. You will need
to write your own script to do this, as usearch currently doesn't have commands
with the necessary features. You can use the search_oligodb command to find the
reverse primer matches.
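As a rough illustration of what such a script might look like, here is a minimal Python sketch of steps 2 to 5 for a single read. Everything in it is an assumption rather than part of usearch: the names find_primer and global_trim, the parameters min_len, max_mismatches and pad_qual, and the simple mismatch-counting search, which does not handle indels or degenerate primer letters. The primer must be given in the orientation in which it appears in the read. Padded positions get the quality character 'I' (Q40 in Phred+33) so that they add very little to the expected errors; that choice is also an assumption.

```python
from typing import Optional, Tuple


def find_primer(seq: str, primer: str, max_mismatches: int = 2) -> int:
    """Return the start of the first window matching `primer` with at most
    `max_mismatches` substitutions, or -1 if there is no such window."""
    n = len(primer)
    for start in range(len(seq) - n + 1):
        window = seq[start:start + n]
        mismatches = sum(1 for a, b in zip(window, primer) if a != b)
        if mismatches <= max_mismatches:
            return start
    return -1


def global_trim(seq: str, qual: str, primer: str, l_trim: int, min_len: int,
                max_mismatches: int = 2,
                pad_qual: str = "I") -> Optional[Tuple[str, str]]:
    """Apply steps 2 to 5 to one read; return None if the read is discarded."""
    # Step 2: delete the reverse primer match and everything after it.
    pos = find_primer(seq, primer, max_mismatches)
    if pos >= 0:
        seq, qual = seq[:pos], qual[:pos]
    # Step 3: discard reads that are too short for the primer pair.
    if len(seq) < min_len:
        return None
    # Step 4: truncate long reads to L_trim.
    seq, qual = seq[:l_trim], qual[:l_trim]
    # Step 5: pad short reads with Ns (and placeholder qualities) to L_trim.
    pad = l_trim - len(seq)
    return seq + "N" * pad, qual + pad_qual * pad
```

Primer positions reported by search_oligodb could replace find_primer if you prefer to let usearch do the matching.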
Once you've pre-processed the reads to get them to the fixed length, proceed as
usual to make UPARSE OTUs: quality filter, dereplicate, discard singletons, and
run cluster_otus.
Finally, you'll need to strip the trailing Ns (added in step 5) from the OTU
sequences.
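Stripping the padding can be done with a few lines of Python. The sketch below assumes the OTUs are in FASTA format and fit in memory; the function name strip_trailing_ns is hypothetical. It joins wrapped sequence lines before stripping, so padding that spans a line break is removed too, and Ns inside a sequence are left alone.

```python
def strip_trailing_ns(fasta_text: str) -> str:
    """Remove trailing N padding from every sequence in a FASTA string."""
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    # Write one sequence line per record with the trailing Ns removed.
    return "".join(f"{h}\n{s.rstrip('Nn')}\n" for h, s in records)
```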