See also
UNCROSS algorithm
UNCROSS paper
uncross command
Cross-talk example (GAIIx)
Cross-talk example (MiSeq)
Cross-talk errors assign reads to incorrect samples
In marker gene amplicon sequencing, samples are often multiplexed into a single
run by embedding index sequences into amplicons to identify the sample of
origin. Reads are assigned to samples (demultiplexed) according to their index
sequences. A cross-talk error occurs when a read is assigned to an incorrect
sample.
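As a minimal sketch of the demultiplexing step described above, the Python below assigns each read to a sample by looking up its index sequence. The index sequences, sample names, and the `demultiplex` helper are hypothetical and used only for illustration; real demultiplexers also handle index mismatches and quality filtering.

```python
from collections import defaultdict

# Hypothetical index-to-sample map; these sequences and names are made up.
INDEX_TO_SAMPLE = {"ACGTAC": "soil_1", "TGCATG": "gut_1"}

def demultiplex(reads):
    """Group (index, sequence) pairs by the sample their index maps to."""
    by_sample = defaultdict(list)
    for index, seq in reads:
        # Reads whose index is not in the map are kept apart, not guessed.
        sample = INDEX_TO_SAMPLE.get(index, "unassigned")
        by_sample[sample].append(seq)
    return dict(by_sample)

reads = [
    ("ACGTAC", "GGCTA"),
    ("TGCATG", "TTAGC"),
    ("ACGTAC", "CCGAT"),
    ("NNNNNN", "AAAAA"),
]
assigned = demultiplex(reads)
```

A cross-talk error in this picture is a read that carries a valid index for one sample but actually originated from another, so the lookup succeeds and the read is silently placed in the wrong sample.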
Illumina has a ~2% cross-talk error rate
The cross-talk error rate was estimated to be ~2% in twelve Illumina datasets,
including one single-indexed GAIIx run and eleven dual-indexed MiSeq runs, as
described in the UNCROSS paper. In a given OTU, the number of reads assigned to
a single sample could be inflated by up to ~0.5% of the total reads in that
OTU. Thus, if the OTU table shows that a given sample holds up to around 0.5%
of the reads in an OTU, the correct count for that sample could be zero, and
the non-zero count would then be a false-positive identification of the
species (or group of species) in the OTU. Cross-talk
thus tends to inflate estimates of richness and alpha diversity. Beta
diversity may also be inflated because samples may appear to share the same
spurious OTUs.
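The ~0.5% figure above can be turned into a simple screening check. The sketch below flags, for one OTU, every sample whose non-zero count is at most a chosen fraction of the OTU's total reads. This is an illustration of the arithmetic only, not the UNCROSS algorithm; the function name, the `max_fraction` parameter, and the example counts are assumptions.

```python
def flag_possible_cross_talk(otu_counts, max_fraction=0.005):
    """Return samples whose non-zero count is small enough to be cross-talk.

    otu_counts maps sample name -> read count for a single OTU.
    max_fraction is the assumed upper bound on inflation (0.5% by default).
    """
    total = sum(otu_counts.values())
    if total == 0:
        return []
    limit = max_fraction * total
    return sorted(s for s, n in otu_counts.items() if 0 < n <= limit)

# Made-up counts for one OTU across four samples (10,000 reads in total).
otu = {"A": 9950, "B": 30, "C": 20, "D": 0}
flagged = flag_possible_cross_talk(otu)
```

Here the limit is 50 reads, so samples B and C are flagged: their true counts could be zero. A flagged count is only a candidate, since a genuinely rare organism can produce the same small number.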
Filtering cross-talk
The UNCROSS algorithm uses simple heuristics that attempt to identify and
filter cross-talk in an OTU table. Cross-talk can be identified most
reliably in control samples such as a null sample (e.g. water) and designed
(mock) communities where the sequences are known.
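To illustrate why control samples are useful, the sketch below computes the fraction of a control sample's reads that fall in OTUs not expected in that control. For a mock community the expected set is the designed composition; for a null sample it is empty. This is a rough indicator under stated assumptions, not the estimation procedure of the UNCROSS paper: the table layout, names, and counts are hypothetical, and contaminants or spurious OTUs would be counted alongside cross-talk.

```python
def unexpected_read_fraction(otu_table, control, expected_otus):
    """Fraction of the control sample's reads found in unexpected OTUs.

    otu_table maps OTU name -> {sample name: read count}.
    expected_otus is the set of OTUs known to belong in the control.
    """
    total = sum(counts.get(control, 0) for counts in otu_table.values())
    if total == 0:
        return 0.0
    unexpected = sum(
        counts.get(control, 0)
        for otu, counts in otu_table.items()
        if otu not in expected_otus
    )
    return unexpected / total

# Made-up table: Otu3 is abundant in soil and not part of the mock design.
table = {
    "Otu1": {"mock": 600, "soil": 2},
    "Otu2": {"mock": 390, "soil": 0},
    "Otu3": {"mock": 10, "soil": 5000},
}
frac = unexpected_read_fraction(table, "mock", {"Otu1", "Otu2"})
```

In this example 10 of the 1,000 mock reads sit in Otu3, which is abundant in the soil sample, so cross-talk from soil is a plausible explanation for those reads.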