USEARCH manual

Misop tutorial
MiSeq 2x250 PE reads, mouse feces and mock community samples

Part 1: UPARSE and UNOISE pipelines (reads to OTU table)
Part 2: Mock community analysis (stringent)
Part 3: Mock community analysis (less stringent)
Part 4: Read quality and error rate analysis

Part 1. UPARSE and UNOISE pipelines, from reads to OTU table
This tutorial uses data from the mothur MiSeq SOP (Kozich et al. 2013).

Download these two archives:
1. Tutorial files: misop_v9.tar.gz
2. Reads and reference data files: MiSeqSOPData.zip.

I'll assume your downloaded files are in ~/Downloads; if you downloaded to a different path, replace it as needed below.

Make a top-level directory for the tutorials, change to that directory, and extract the two archives into it (tar for the tutorial files, unzip for the reads and reference data). See tutorial directories for a description of the subdirectories.

Extract the data files from the archives:

mkdir -p ~/tutorials
cd ~/tutorials
tar -zxvf ~/Downloads/misop_v9.tar.gz
unzip ~/Downloads/MiSeqSOPData.zip

Make the misop/fq directory and move the FASTQ files into it:

cd ~/tutorials
mkdir misop/fq
mv MiSeqSOP/*.fastq misop/fq

Set the $usearch environment variable to the path name of your usearch binary file.
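
As a minimal sketch, the variable could be set as follows in a bash-like shell. The path $HOME/bin/usearch is a hypothetical location used only for illustration; substitute wherever your binary actually lives.

```shell
# Hypothetical install location; replace with the real path to your usearch binary.
export usearch="$HOME/bin/usearch"

# Sanity check: warn if the path does not point to an executable file.
if [ -x "$usearch" ]; then
    echo "usearch binary found: $usearch"
else
    echo "warning: $usearch is missing or not executable" >&2
fi
```

The export only lasts for the current shell session. To make it permanent, you could add the export line to your shell startup file (for example ~/.bashrc if you use bash).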

Install the sintax reference database by running the setup_sintax.bash script, like this:

cd ~/tutorials/misop/scripts
./setup_sintax.bash

Notice the dot and slash (./) before setup_sintax.bash. This tells the shell to look for the command file (script or binary) in your current directory (dot means current directory). This is needed if the current directory is not in your PATH. Tutorial scripts always assume that they are being run like this, i.e. from inside the scripts/ subdirectory.

The setup_sintax.bash script uses curl to fetch the data. Some systems don't have curl, in which case you can use wget. The script contains a wget command which is commented out, so it's a simple edit to comment out the curl command and uncomment the wget command instead.
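
If you are not sure which download tool your system has, a quick check along these lines may help before you edit the script. This is a sketch of my own, not part of setup_sintax.bash; the variable name fetcher is made up for the example.

```shell
# Report which download tool is on the PATH: curl first, then wget.
# "fetcher" is an illustrative variable name, not something the tutorial scripts use.
if command -v curl >/dev/null 2>&1; then
    fetcher=curl
elif command -v wget >/dev/null 2>&1; then
    fetcher=wget
else
    fetcher=none
fi
echo "Download tool available: $fetcher"
```

If this reports wget, switch the script to the wget command as described above. If it reports none, you need to install one of the two tools first.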

There are three scripts which run the UPARSE and UNOISE pipelines: run_mouse.bash, which processes the mouse feces samples; run_mock.bash, which processes the mock community sample; and run_mock1.bash, which processes the mock community sample keeping singleton uniques.

Most of the commands for UPARSE and UNOISE are the same, so I combined them into one script that does both.

Run the pipeline scripts like this:

cd ~/tutorials/misop/scripts
./run_mouse.bash
./run_mock.bash
./run_mock1.bash

This should reproduce the pre-computed files in the misop/out/mouse, misop/out/mock and misop/out/mock1 directories.
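
To confirm that each run wrote something, you could count the files in the three output directories. The helper below is a sketch under the assumption that the directory names are exactly mouse, mock and mock1 as listed above; check_outputs is a hypothetical function name, not a tutorial script.

```shell
# List each expected output directory under the given root and count its files.
# check_outputs is a hypothetical helper written for this example.
check_outputs() {
    root=$1
    for d in mouse mock mock1; do
        if [ -d "$root/$d" ]; then
            # $((n)) strips the padding that some wc implementations add.
            n=$(find "$root/$d" -type f | wc -l)
            echo "$d: $((n)) files"
        else
            echo "$d: missing"
        fi
    done
}
```

From inside the scripts/ subdirectory it would be called as check_outputs ../out. Note that this only checks that files exist, not that their contents match the pre-computed versions.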

For mouse, the most important output files are the OTU tables, which are named otutab.txt (QIIME classic format), otutab.json (BIOM format) and otutab.mothur (mothur "shared" file format) for UPARSE, and otutab_den.txt, otutab_den.json and otutab_den.mothur for UNOISE. You can use these to perform further analysis in QIIME or mothur. There isn't much point in making an OTU table for the mock reads because there is only one sample.
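
Before taking a table into QIIME or mothur, it can be useful to check its dimensions. The sketch below assumes the QIIME classic file is tab-separated text with one header line, an OTU identifier in the first column, and one column per sample; otutab_summary is a hypothetical helper name.

```shell
# Print the number of OTUs (rows) and samples (columns) in a tab-separated OTU table.
# Assumes line 1 is a header and column 1 holds the OTU identifiers.
otutab_summary() {
    awk -F'\t' 'NR == 1 { samples = NF - 1; next }
                NF > 0  { otus++ }
                END     { printf "%d OTUs, %d samples\n", otus, samples }' "$1"
}
```

From inside the scripts/ subdirectory, it could be run as otutab_summary ../out/mouse/otutab.txt for the UPARSE table, or with otutab_den.txt for the UNOISE table.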

Part 2. Mock community analysis