USEARCH manual

Misits tutorial
MiSeq 2x300 PE reads, fungal ITS2

UPARSE pipeline from reads to OTU table
This tutorial uses data from study PRJEB7970, deposited by CEH at the European Nucleotide Archive. The reads derive from samples of Scots Pine (Pinus sylvestris) needles collected from forests and plantations in Scotland. Libraries were constructed using the fITS7 (forward) and ITS4 (reverse) primers described in Ihrmark et al. (2012), which target the 5.8S and LSU rRNA genes flanking the ITS2 region. Sequencing was done using MiSeq 2x300 PE. To reduce the dataset size for this tutorial, 5,000 reads were taken at random from the first ten pairs of FASTQ files.

Download this archive:
Tutorial files: misits.tar.gz

I'll assume your downloaded files are in ~/Downloads; if you downloaded to a different path, replace it as needed below.

Make a top-level directory for the tutorials, change to that directory and extract the tutorial files using tar. See tutorial directories for a description of the subdirectories.

Extract the data files:

mkdir -p ~/tutorials
cd ~/tutorials
tar -zxvf ~/Downloads/misits.tar.gz

Set the $usearch environment variable to the path name of your usearch binary file.
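As a minimal sketch of this step, the variable can be exported in the shell session that will run the tutorial scripts. The path $HOME/bin/usearch below is a hypothetical location, not one given by the tutorial; substitute the real path to your binary.

```shell
# Hypothetical location (an assumption); replace with the actual path
# to your usearch binary.
export usearch="$HOME/bin/usearch"

# Report whether the variable points at an executable file.
if [ -x "$usearch" ]; then
    echo "usearch found at $usearch"
else
    echo "usearch not found or not executable at $usearch" >&2
fi
```

The export only lasts for the current shell session, so it has to be repeated in each new terminal unless you add it to your shell startup file.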

Create the utax database by running the setup_utax.bash script, like this:

cd ~/tutorials/misits/scripts
./setup_utax.bash

Notice the dot and slash (./) before setup_utax.bash. This tells the shell to look for the command file (script or binary) in your current directory (dot means current directory). This is needed if the current directory is not in your PATH. Tutorial scripts always assume that they are being run like this, i.e. from inside the scripts/ subdirectory.

The setup_utax.bash script uses curl to fetch the data. Some systems don't have curl, in which case you can use wget. The script contains a wget command which is commented out, so it is a simple edit to uncomment it and comment out the curl command instead.
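If you are unsure which download tool your system has, a quick check along the following lines may help before you edit the script. This is only a sketch; the variable name fetcher is introduced here for illustration and is not used by the tutorial scripts.

```shell
# Detect which download tool is installed. "fetcher" is an illustrative
# variable name, not something the tutorial scripts read.
if command -v curl >/dev/null 2>&1; then
    fetcher=curl
elif command -v wget >/dev/null 2>&1; then
    fetcher=wget
else
    fetcher=none
fi
echo "Download tool available: $fetcher"
```

If this reports wget, switch the script to its wget command as described above; if it reports none, install one of the two tools first.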

The run_uparse.bash script executes the UPARSE pipeline. Run it like this:

cd ~/tutorials/misits/scripts
./run_uparse.bash

This should reproduce the pre-computed files in the misits/out directory.

The most important output files are the OTU tables, which are named otutab.txt (QIIME classic format), otutab.json (BIOM format) and otutab.mothur (mothur "shared" file format). You can use these to perform further analysis in QIIME or mothur.
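Before moving on to QIIME or mothur, it can be useful to sanity-check the QIIME classic format table from the shell. The sketch below assumes the usual layout of that format: tab-separated, a header line starting with '#' whose first field labels the OTU column, and one OTU per remaining row. The function names count_otus and count_samples are hypothetical helpers and are not part of usearch.

```shell
# Count OTU rows in a tab-separated OTU table, skipping '#' header
# and comment lines as well as blank lines.
count_otus() {
    grep -v '^#' "$1" | grep -c .
}

# Count sample columns from the header line, assuming the first
# tab-separated field is the OTU ID label.
count_samples() {
    head -n 1 "$1" | awk -F'\t' '{ print NF - 1 }'
}
```

Call each function with the path to the QIIME classic format table in the misits/out directory. The sample count should match the number of FASTQ pairs that went into the pipeline.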