USEARCH v11

Misop tutorial: mouse gut and mock community


See also
 
OTU / denoising tutorial home
 
This tutorial uses data from the mothur MiSeq SOP (Kozich et al. 2013).

Download tutorial data, scripts and precomputed results:
   /downloads/misop_v10a.tar.gz

I'll assume your downloaded files are in ~/Downloads; if you downloaded them to a different directory, adjust the paths below as needed.

Make a top-level directory for the tutorials (see tutorial directories for a description of the subdirectories) and extract the data files from the archive.

mkdir -p ~/tutorials
cd ~/tutorials
tar -zxvf ~/Downloads/misop_v10a.tar.gz

Set the $usearch environment variable to the path name of your usearch binary file.
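A minimal sketch for bash (or another POSIX shell), assuming the tutorial scripts read $usearch from the environment. The path below is hypothetical; replace it with the location and file name of the binary you downloaded. Using export rather than a plain assignment makes the variable visible to child processes such as ./run.bash.

```shell
# Hypothetical path: replace with the real location of your usearch binary.
export usearch="$HOME/bin/usearch11"

# Optional check: warn, but do not stop, if the file is missing or not
# executable.
if [ ! -x "$usearch" ]; then
  echo "warning: $usearch not found or not executable" >&2
fi
```

To avoid repeating this in every new terminal, the export line can be added to a shell startup file such as ~/.bashrc. If the warning appears for a file that does exist, it may only need execute permission (chmod +x).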

The run.bash script runs a basic OTU and denoising analysis of the full dataset, which includes both the mouse gut and mock samples. The mouse gut samples have names like F3D141; the mock sample is called Mock. The run_mock.bash script runs the analysis on the mock sample only.

Run the scripts from the scripts/ directory with the commands below. Note the dot and slash (./) before each script name: this tells the shell to look for the script file in your current directory (a single dot means the current directory). The tutorial scripts always assume that scripts/ is your current directory.

cd ~/tutorials/misop/scripts
./run.bash
./run_mock.bash

Running these scripts should regenerate the precomputed files in the misop/out and misop/out_mock directories.
 
Exercises

See the misop/exercises subdirectory for solutions.

1. Write a script that runs the sintax command to predict taxonomy for the OTU sequences. Use the reference database in the sintax/ subdirectory of the tutorial. Look at the results for the top three OTUs (Otu1, Otu2 and Otu3). What is the phylum of each of these OTUs? What are their bootstrap confidence values at genus rank? Optional: BLAST these three OTU sequences at the NCBI BLAST web site. If 95% identity is roughly the threshold for assigning a genus, can you assign a genus to these sequences from the BLAST hits? (See: How to BLAST a 16S sequence.)

2. Write a script that runs the alpha_div command to calculate the richness and Shannon metrics. Use bits (base 2) for Shannon. What are the richness and Shannon diversity of the mock sample?

3. Write a script that runs the otutab_norm command to normalize all samples to 5,000 reads and then calculates richness and Shannon diversity on the normalized OTU table. Compare the richness and Shannon diversity of the mock sample to the values obtained in Exercise 2. Did they change? Explain why they did or did not change.
 
4. Look at the otus.uparseref file in the misop/out_mock directory. Which OTU does not match a reference sequence for the mock community? (Hint: see the uparseout option for a description of the file format.) Make a FASTA file containing this sequence and use the usearch_global command to find the same sequence in the OTU file for all samples combined (out/otus.fa). Which OTU in that file does it correspond to? What reason(s) could explain why this unexpected sequence is found in the reads for the mock sample?
 
5. Run the OTU sequence from Exercise 4 through NCBI BLAST (See: How to BLAST a 16S sequence). Which species (one or more) have BLAST hits with 100% identity? (You can find the OTU sequence in exercises/otu20.fa if needed.)

6. Write a script that runs the sintax_summary command at phylum rank. Which phylum is the most common among the OTUs?
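For Exercise 1, the sintax results can be read directly, but the sketch below pulls out just the fields the question asks about. It assumes tabbed output as written by the sintax command's -tabbedout option, where field 2 lists each rank with its bootstrap confidence in parentheses, e.g. g:Name(0.8500). sintax_pick is a hypothetical shell function, and the toy input (abbreviated to three ranks, with made-up taxon names) only demonstrates the parsing; it is not a tutorial result.

```shell
# sintax_pick FILE LABEL...  (hypothetical helper)
# Prints label, phylum, genus and genus bootstrap confidence for the
# requested labels, read from field 2 of sintax tabbed output.
sintax_pick() {
  _file=$1
  shift
  awk -F'\t' -v OFS='\t' -v want="$*" '
    BEGIN { n = split(want, w, " "); for (i = 1; i <= n; i++) keep[w[i]] = 1 }
    $1 in keep {
      phylum = genus = gconf = "NA"
      m = split($2, rank, ",")
      for (i = 1; i <= m; i++) {
        # Each entry looks like g:Name(0.8500)
        name = rank[i]; conf = rank[i]
        sub(/\(.*/, "", name); sub(/^[a-z]:/, "", name)
        sub(/^[^(]*\(/, "", conf); sub(/\)$/, "", conf)
        if (rank[i] ~ /^p:/) phylum = name
        if (rank[i] ~ /^g:/) { genus = name; gconf = conf }
      }
      print $1, phylum, genus, gconf
    }' "$_file"
}

# Toy input with made-up taxa (not real tutorial output).
printf '%s\t%s\t%s\t%s\n' \
  Otu1 'd:Bacteria(1.0000),p:ExamplePhylum(0.9900),g:ExampleGenus(0.4200)' + 'd:Bacteria,p:ExamplePhylum' \
  Otu9 'd:Bacteria(1.0000),p:OtherPhylum(0.9000),g:OtherGenus(0.9500)' + 'd:Bacteria,p:OtherPhylum,g:OtherGenus' \
  > toy.sintax

sintax_pick toy.sintax Otu1 Otu2 Otu3
```

On the toy input this prints a single line for Otu1, because Otu2 and Otu3 are not in it. On real output, pass the file you wrote with -tabbedout instead of toy.sintax; labels are matched against the whole of field 1, so they must be written exactly as they appear there.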
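For Exercise 2, a small awk sketch can serve as a cross-check on the alpha_div output, assuming the standard definitions: richness is the number of OTUs with a nonzero count in the sample, and Shannon diversity in bits is H = -sum(p_i * log2(p_i)), where p_i is OTU i's fraction of that sample's reads. alpha_sketch is a hypothetical helper; it assumes a tab-separated OTU table whose first line is a header (#OTU ID followed by the sample names). The toy table is made up.

```shell
# alpha_sketch FILE  (hypothetical helper)
# For each sample column of a tab-separated OTU table, prints
# sample name, richness and Shannon diversity in bits (log base 2).
alpha_sketch() {
  awk -F'\t' '
    NR == 1 { for (j = 2; j <= NF; j++) name[j] = $j; ncol = NF; next }
    {
      nrow++
      for (j = 2; j <= ncol; j++) { n[nrow, j] = $j + 0; total[j] += $j }
    }
    END {
      for (j = 2; j <= ncol; j++) {
        rich = 0; h = 0
        for (i = 1; i <= nrow; i++) {
          if (n[i, j] > 0) {
            rich++
            p = n[i, j] / total[j]
            h -= p * log(p) / log(2)   # log() is the natural log in awk
          }
        }
        printf "%s\t%d\t%.4f\n", name[j], rich, h
      }
    }' "$1"
}

# Made-up toy table: samples A, B, C; four OTUs.
printf '#OTU ID\tA\tB\tC\nOtu1\t1\t2\t4\nOtu2\t1\t2\t0\nOtu3\t1\t0\t0\nOtu4\t1\t0\t0\n' > toy_otutab.txt

alpha_sketch toy_otutab.txt
```

For sample A (four OTUs with equal counts) this gives richness 4 and H = 2 bits; B gives richness 2 and 1 bit; C, with a single OTU, gives richness 1 and 0 bits.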
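For Exercise 3, the sketch below shows one way a table can be normalized to a fixed number of reads per sample: each count is multiplied by size/total and rounded to the nearest integer. This is an assumption made for illustration, not a description of how otutab_norm works internally, so check the otutab_norm documentation. norm_sketch is a hypothetical helper, the toy counts are made up, and the sketch uses deterministic scaling rather than random subsampling.

```shell
# norm_sketch FILE SIZE  (hypothetical helper)
# Scales each sample column of a tab-separated OTU table so its counts
# sum to roughly SIZE, rounding each count to the nearest integer.
norm_sketch() {
  awk -F'\t' -v OFS='\t' -v size="$2" '
    NR == 1 { print; ncol = NF; next }   # header passes through unchanged
    {
      nrow++; label[nrow] = $1
      for (j = 2; j <= ncol; j++) { n[nrow, j] = $j + 0; total[j] += $j }
    }
    END {
      for (i = 1; i <= nrow; i++) {
        line = label[i]
        for (j = 2; j <= ncol; j++) {
          x = (total[j] > 0) ? n[i, j] * size / total[j] : 0
          line = line OFS int(x + 0.5)
        }
        print line
      }
    }' "$1"
}

# Made-up toy table: S1 has 20,000 reads, S2 has 2,000.
printf '#OTU ID\tS1\tS2\nOtu1\t19999\t1000\nOtu2\t1\t1000\n' > toy_counts.txt

norm_sketch toy_counts.txt 5000
```

On the toy table, S1 scales to 5000 and 0: its single Otu2 read becomes 0.25 before rounding, so under this scheme that OTU drops out of the sample. S2 is scaled up to 2500 and 2500.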
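For Exercise 4, one way to make a FASTA file holding a single sequence is to copy its record out of an existing FASTA file. fasta_pick is a hypothetical awk helper, not a usearch command; it assumes the label is the text after '>' up to the first ';' or whitespace. The toy file and labels are made up. On the real data you would pass the mock OTU FASTA file and the label you identified in otus.uparseref, then give the result to usearch_global as the query.

```shell
# fasta_pick FILE LABEL  (hypothetical helper)
# Prints the record(s) whose label matches LABEL exactly, including
# all sequence lines up to the next '>' header.
fasta_pick() {
  awk -v want="$2" '
    /^>/ {
      hdr = substr($0, 2)
      sub(/[; \t].*/, "", hdr)   # drop ";size=..." or a description
      keep = (hdr == want)
    }
    keep' "$1"
}

# Toy FASTA with a multi-line record and a label (Otu1) that is a
# prefix of another label (Otu10).
printf '>Otu1\nACGTACGT\nACGT\n>Otu2;size=5\nTTTTGGGG\n>Otu10\nCCCC\n' > toy.fa

fasta_pick toy.fa Otu1 > query.fa
cat query.fa
```

Matching the whole label, rather than a prefix, keeps Otu10 out of the result when asking for Otu1.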
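For Exercise 6, a quick cross-check on the sintax_summary result is to count OTUs per phylum directly from the sintax tabbed output. The sketch assumes sintax was run with -sintax_cutoff, so that field 4 holds only the ranks that passed the cutoff; OTUs with no phylum there are counted as unassigned. It counts OTUs, not reads, so its ranking may differ from a read-weighted summary. phylum_tally is a hypothetical helper and the toy input is made up.

```shell
# phylum_tally FILE  (hypothetical helper)
# Counts OTUs per phylum using field 4 of sintax tabbed output and
# prints "count<TAB>phylum", most common first.
phylum_tally() {
  awk -F'\t' '
    {
      p = "unassigned"
      m = split($4, rank, ",")
      for (i = 1; i <= m; i++)
        if (rank[i] ~ /^p:/) p = substr(rank[i], 3)
      count[p]++
    }
    END { for (p in count) print count[p] "\t" p }' "$1" | sort -k1,1nr -k2,2
}

# Made-up toy input: Otu4 has no phylum above the cutoff in field 4.
printf '%s\t%s\t%s\t%s\n' \
  Otu1 'd:Bacteria(1.0000),p:PhylumA(0.9900)' + 'd:Bacteria,p:PhylumA' \
  Otu2 'd:Bacteria(1.0000),p:PhylumB(0.9700)' + 'd:Bacteria,p:PhylumB' \
  Otu3 'd:Bacteria(1.0000),p:PhylumA(0.9500)' + 'd:Bacteria,p:PhylumA' \
  Otu4 'd:Bacteria(1.0000),p:PhylumB(0.5000)' + 'd:Bacteria' \
  > toy_tally.sintax

phylum_tally toy_tally.sintax
```

On the toy input PhylumA is counted twice; Otu4 has a phylum in field 2 but not in field 4, so it is counted as unassigned.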