UPARSE and UNOISE pipelines, from reads to OTU table
This tutorial uses data from study PRJEB7970, deposited by CEH at the
European Nucleotide Archive and derived from samples of Scots Pine (Pinus
sylvestris) needles collected from forests and plantations in Scotland.
Libraries were constructed using the fITS7 (forward) and ITS4 (reverse)
primers described in Ihrmark et al. (2012), which target the 5.8S and LSU
rRNA genes flanking the ITS2 region. Sequencing was done using MiSeq 2x300
PE. To reduce the dataset size for this tutorial, 5,000 reads were taken at
random from the first ten pairs of FASTQ files.
I'll assume your downloaded files are in ~/Downloads. If you downloaded to a
different path, replace it as needed below.
Make a top-level directory for the tutorials, change to that directory and
extract the tutorial data files using tar. See tutorial directories for a
description of the subdirectories.
Extract the data files:
  mkdir -p ~/tutorials
  cd ~/tutorials
  tar -zxvf ~/Downloads/misits.tar.gz
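If the archive is not where you expect, tar's error message can be a little cryptic. The extraction step above could be wrapped as follows. This is only a minimal sketch: the function name extract_tutorial is my own invention for illustration and is not part of the tutorial scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the tutorial scripts): extract a gzipped
# tar archive into a destination directory, with a clear error message if
# the archive is missing.
extract_tutorial() {
  local archive="$1" dest="$2"
  if [ ! -f "$archive" ]; then
    echo "Archive not found: $archive" >&2
    return 1
  fi
  # -C tells tar to change to the destination directory before extracting.
  mkdir -p "$dest" && tar -zxf "$archive" -C "$dest"
}
```

With the paths assumed in this tutorial, the call would be: extract_tutorial ~/Downloads/misits.tar.gz ~/tutorials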
Create the utax database by running the setup_utax.bash script, like this:
  cd ~/tutorials/misits/scripts
  ./setup_utax.bash
Notice the dot and slash (./) before setup_utax.bash. This tells the shell
to look for the command file (script or binary) in your current directory
(dot means current directory). This is needed if the current directory is
not in your PATH. Tutorial scripts always assume that they are being run
like this, i.e. from inside the scripts/ subdirectory.
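To see the difference the dot and slash makes, here is a small demonstration in a throwaway directory. The script name hello.bash is made up for this example. The first lookup reports "not found" only if your PATH does not already include the current directory, which is the usual case.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so nothing in the tutorial tree is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create a tiny executable script (hello.bash is a made-up name).
printf '#!/usr/bin/env bash\necho "hello from the current directory"\n' > hello.bash
chmod +x hello.bash

# Without ./ the shell searches only the directories listed in PATH.
if command -v hello.bash >/dev/null 2>&1; then
  bare_lookup="found"
else
  bare_lookup="not found"
fi
echo "lookup via PATH: $bare_lookup"

# With ./ the shell runs the file in the current directory directly.
dot_slash_output=$(./hello.bash)
echo "$dot_slash_output"
```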
The setup_utax.bash script uses curl to fetch the data. Some systems don't
have curl, in which case you can use wget. The script contains a wget
command which is commented out, so it's a simple edit to comment out the
curl command and uncomment the wget command instead.
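If you would rather not edit the script by hand, the choice between curl and wget can be made automatically. The following is a sketch of that idea, not the contents of setup_utax.bash: the function names pick_downloader and fetch are hypothetical, and the URL and output file are whatever you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the first available download tool.
pick_downloader() {
  if command -v curl >/dev/null 2>&1; then
    echo "curl"
  elif command -v wget >/dev/null 2>&1; then
    echo "wget"
  else
    echo "Neither curl nor wget is installed" >&2
    return 1
  fi
}

# Hypothetical helper: download URL ($1) to file ($2) with whichever tool exists.
fetch() {
  local url="$1" out="$2" tool
  tool=$(pick_downloader) || return 1
  case "$tool" in
    curl) curl -L -o "$out" "$url" ;;   # -L follows redirects, -o names the output file
    wget) wget -O "$out" "$url" ;;      # -O names the output file
  esac
}
```

Usage would be fetch followed by the URL and the output filename, for example fetch "$url" utax.fa.gz, where both values are placeholders.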
The run.bash script
executes the UPARSE and UNOISE pipelines. Most of the commands needed are
the same for UPARSE and UNOISE, so I combined them both into a single
script. Run it like this:
  cd ~/tutorials/misits/scripts
  ./run.bash
This should reproduce the
pre-computed files in the misits/out directory.
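One way to check that your run matches the pre-computed files is to keep a copy of them and compare afterwards. This sketch assumes you copied misits/out to another directory before running run.bash (out_precomputed below is a made-up name) and that run.bash writes into misits/out, which you should confirm by reading the script. Small differences do not necessarily indicate a problem, for example if your usearch version differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether two directory trees have identical
# contents. diff -r recurses into subdirectories; -q only reports whether
# files differ rather than printing the differences.
compare_dirs() {
  local reference="$1" fresh="$2"
  if diff -rq "$reference" "$fresh" >/dev/null 2>&1; then
    echo "identical"
  else
    echo "different"
  fi
}
```

For example: compare_dirs ~/tutorials/misits/out_precomputed ~/tutorials/misits/out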
The most important output files are the OTU tables. For UPARSE these are
named otutab.txt (QIIME classic format), otutab.json (BIOM format) and
otutab.mothur (mothur "shared" file format); for UNOISE they are
otutab_den.txt, otutab_den.json and otutab_den.mothur. You can use these to
perform further analysis in QIIME or mothur.
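Before moving on to QIIME or mothur, a quick sanity check of the tab-delimited table can be useful. The sketch below assumes the first line of the file is a header with the OTU identifier column followed by one column per sample, and that every other line is an OTU name followed by counts. The function name summarize_otutab is hypothetical. If your table has extra comment lines before the header, the numbers will be off.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count samples, OTUs and total reads in a tab-delimited
# OTU table whose first line is the header.
summarize_otutab() {
  awk -F'\t' '
    NR == 1 { samples = NF - 1; next }   # header: every column after the first is a sample
    NF > 1  { otus++; for (i = 2; i <= NF; i++) total += $i }
    END     { printf "samples=%d otus=%d reads=%d\n", samples, otus, total }
  ' "$1"
}
```

For example: summarize_otutab ~/tutorials/misits/out/otutab.txt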