Misits tutorial: Fungal ITS from pine needles

See also
OTU / denoising tutorial home

This tutorial uses data from study PRJEB7970, deposited by CEH at the European Nucleotide Archive, derived from samples of Scots Pine (Pinus sylvestris) needles collected from forests and plantations in Scotland. Libraries were constructed using the fITS7 (forward) and ITS4 (reverse) primers described in Ihrmark et al. (2012), which target the 5.8S and LSU rRNA genes flanking the ITS2 region. Sequencing was done using MiSeq 2x300 PE. To reduce the dataset size for this tutorial, 5,000 reads were taken at random from each of the first ten pairs of FASTQ files.

Download tutorial scripts and precomputed results.
/downloads/misits_v10a.tar.gz

I'll assume your downloaded files are in ~/Downloads. If you downloaded to a different path, replace it as needed in the commands below.

Make a top-level directory for the tutorials (see tutorial directories for a description of the subdirectories) and extract the data files from the archive.

mkdir -p ~/tutorials
cd ~/tutorials
tar -zxvf ~/Downloads/misits_v10a.tar.gz

Set the $usearch environment variable to the path name of your usearch binary file.
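As a minimal sketch, assuming a bash shell and a binary saved at the hypothetical path ~/bin/usearch (substitute wherever your copy actually lives), the variable could be set and checked like this:

```shell
# Hypothetical location; replace with the actual path to your usearch binary.
export usearch="$HOME/bin/usearch"

# Optional sanity check: report whether the path points to an executable file.
if [ -x "$usearch" ]; then
    usearch_status="found"
else
    usearch_status="missing"
fi
echo "usearch binary $usearch_status: $usearch"
```

Adding the export line to ~/.bashrc makes the setting persist across new shell sessions.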

The run.bash script runs a basic OTU and denoising analysis. Run it from the scripts/ directory (commands below). Notice the dot and slash (./) before the script name. This tells the shell to look for the script file in your current directory (dot means current directory). Note that tutorial scripts always assume that scripts/ is your current directory.

cd ~/tutorials/misits/scripts
./run.bash

This should regenerate the pre-computed output files under ~/tutorials/misits (for example, in the out/ directory).

Exercises

See the misits/exercises subdirectory for solutions.

1. Write a script to create alignments of OTU sequences to the taxonomy database using the usearch_global command. Use the reference database in the sintax/ directory of the tutorial. Use these options: -id 0.9 -strand both, and use the -alnout option to create human-readable alignments. Are the OTU sequences on the plus strand, minus strand, or both?

2. Write a script that runs the sintax command to predict taxonomy for the OTU sequences. Use the -strand both option (why?). Use the sintax_summary command to generate a summary report at phylum rank. Which is the most common phylum in the OTUs?

3. Write a script that uses the cluster_fast command to cluster the ZOTU sequences at 97% identity. Use the -centroids option to create a FASTA file with the representative sequences. This is an alternative to cluster_otus for creating 97% OTUs. How many OTUs does each method create (hint: use grep -c "^>" to count the number of FASTA labels)? Explain why one method creates more OTUs (hint: what is the minimum unique sequence abundance for each method?).
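
The label-counting hint in exercise 3 can be tried on a toy file first. The sketch below builds a throwaway three-sequence FASTA file (the file and its contents are made up for illustration and are not tutorial data) and counts its header lines. Each FASTA record has exactly one line starting with ">", so the count equals the number of sequences even when a sequence wraps over several lines.

```shell
# Create a small stand-in FASTA file (hypothetical contents, not tutorial data).
demo_fasta="$(mktemp)"
printf '>Otu1\nACGT\n>Otu2\nGGCC\nTTAA\n>Otu3\nTGCA\n' > "$demo_fasta"

# Count the lines that start with ">", i.e. one per sequence label.
n_seqs=$(grep -c "^>" "$demo_fasta")
echo "$n_seqs sequences"   # prints "3 sequences"

# Clean up the temporary file.
rm -f "$demo_fasta"
```

The same grep command can then be pointed at the centroids file and the cluster_otus output to compare the two counts.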