See also
OTU / denoising pipeline
Read preparation
Trimming fungal ITS reads
To get good OTU sequences, reads must be trimmed such that all sequences derived
from the same biological template start at the same
position in the gene and have the same length. I call
this "global trimming". This is required to
get a good measurement of unique
sequence abundances. Good abundances are needed by the
UPARSE algorithm (cluster_otus
command) and the UNOISE algorithm (unoise3
command) because they assume that high-abundance sequences are much more
likely to be correct biological sequences.
It is OK for reads of different biological sequences to have different lengths, because the gene or region naturally varies in length. Fungal ITS is one example; see Trimming fungal ITS reads.
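The effect of global trimming on abundances can be illustrated with a minimal Python sketch. It is not part of usearch; the template sequence, the reads and the helper names (unique_abundances, truncate_reads) are hypothetical and chosen only for illustration. Reads from one template that differ only in length are counted as separate unique sequences until they are truncated to a common length.

```python
from collections import Counter

def unique_abundances(reads):
    """Count how many times each exact sequence occurs."""
    return Counter(reads)

def truncate_reads(reads, length):
    """Truncate every read to `length` bases and drop reads that are shorter.
    (Hypothetical helper for illustration; not the usearch implementation.)"""
    return [r[:length] for r in reads if len(r) >= length]

# Hypothetical template and reads: same start position, different lengths.
template = "ACGTTGCAAGCTTAGGCTAACGT"
raw_reads = [template[:20], template[:22], template[:23],
             template[:20], template[:21]]

before = unique_abundances(raw_reads)
after = unique_abundances(truncate_reads(raw_reads, 20))

print(len(before), max(before.values()))  # unique sequences and top abundance before trimming
print(len(after), max(after.values()))    # the same after trimming to 20 bases
```

Before trimming, the five reads are split over several unique sequences with low abundances. After trimming, they collapse into a single sequence whose abundance reflects the template.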
With overlapping paired reads, length trimming as such is usually not
necessary because the reverse reads start at a primer-binding locus, and the
merged sequence therefore always ends at that locus. However, you should
still trim the primer sequences.
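As a rough illustration of primer trimming on merged reads, the sketch below removes a fixed number of bases from each end. It assumes that every merged read starts with the full forward primer and ends with the full reverse primer. The function name and the example sequence are hypothetical, and this is not the usearch implementation.

```python
def strip_primers(seq, fwd_primer_len, rev_primer_len):
    """Remove a fixed number of bases from the start and the end of a merged read.
    Assumes the primers are present in full at both ends (illustrative only)."""
    end = len(seq) - rev_primer_len
    return seq[fwd_primer_len:end]

# Hypothetical merged read: 4-base forward primer, 8-base insert, 2-base reverse primer.
merged = "AAAACCCCGGGGTT"
print(strip_primers(merged, 4, 2))
```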
Length trimming may be needed if you have unpaired reads which vary in
length in the raw data files and / or have lower quality towards the 3'
ends, as is often the case with 454 or Ion Torrent reads. You can choose
an appropriate trim length using the
fastq_eestats2 command.
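The trade-off behind choosing a trim length can be sketched in Python: a longer length keeps more of each sequence, but fewer reads are long enough and clean enough to pass. The sketch below counts how many reads would be retained at a given length and maximum expected number of errors, where the expected number of errors is the sum of the per-base error probabilities implied by the Phred scores. This is an illustration of the idea only, not the fastq_eestats2 implementation; the function names, quality values and thresholds are hypothetical.

```python
def expected_errors(quals, length):
    """Sum of error probabilities over the first `length` bases.
    `quals` is a list of integer Phred scores; P(error) = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals[:length])

def reads_passing(qual_lists, length, max_ee):
    """Count reads that are at least `length` long and have at most
    `max_ee` expected errors after truncation to `length`."""
    return sum(
        1 for quals in qual_lists
        if len(quals) >= length and expected_errors(quals, length) <= max_ee
    )

# Hypothetical quality scores for three reads.
quals = [
    [30] * 10,            # good quality throughout
    [30] * 5 + [10] * 5,  # quality drops towards the 3' end
    [30] * 6,             # short read
]

for length in (5, 10):
    print(length, reads_passing(quals, length, max_ee=0.1))
```

In this toy data all three reads pass at length 5, but only one passes at length 10, because one read is too short and another accumulates too many expected errors at its 3' end.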
You can trim to a fixed length using the fastx_truncate command. For example, a length of around 250 is often a good choice for 454 reads, which can be implemented like this:
usearch -fastx_truncate reads.fq -trunclen 250 -fastqout reads250.fq