I'll assume your downloaded files are in ~/Downloads. If you downloaded them to a different path, replace it as needed below.
Make a top-level directory for the tutorials, change to that directory and extract the data files from the archives, using tar for the tutorial files and unzip for the MiSeq SOP data. See tutorial directories for a description of the subdirectories.

    mkdir -p ~/tutorials
    cd ~/tutorials
    tar -zxvf ~/Downloads/misop.tar.gz
    unzip ~/Downloads/MiSeqSOPData.zip
Make the misop/fq directory and move the FASTQ files into it:

    cd ~/tutorials
    mkdir misop/fq
    mv MiSeqSOP/*.fastq misop/fq
Create the utax database by running the setup_utax.bash script, like this:

    cd ~/tutorials/misop/scripts
    ./setup_utax.bash

Notice the dot and slash (./) before setup_utax.bash. This tells the shell to look for the command file (script or binary) in your current directory (dot means the current directory). This is needed if the current directory is not in your PATH. The tutorial scripts always assume that they are run like this, i.e. from inside the scripts/ subdirectory.
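Whether you need the ./ prefix depends on your PATH. As a rough check, the shell function below reports whether a PATH-style string would let the shell find commands in the current directory. It is a sketch, not part of the tutorial: cwd_on_path is a hypothetical name, and it assumes the usual rules that an explicit "." entry, an empty entry (leading, trailing or doubled colon) or the absolute path of the current directory all count.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the tutorial scripts.
# Returns 0 if the PATH-style string in $1 (default: $PATH) would let the
# shell find commands in the current directory, 1 otherwise.
cwd_on_path() {
  local p="${1-$PATH}"
  # Wrap in colons so every entry, including the first and last, is
  # delimited on both sides.
  case ":$p:" in
    *:.:*) return 0 ;;        # explicit "." entry
    *::*) return 0 ;;         # empty entry also means the current directory
    *:"$PWD":*) return 0 ;;   # absolute path of the current directory
    *) return 1 ;;
  esac
}
```

For example, `cwd_on_path || echo "run tutorial scripts as ./script_name"` prints the reminder only when the current directory is not searched.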
The setup_utax.bash script uses curl to fetch the data. Some systems don't have curl, in which case you can use wget instead. The script contains a wget command that is commented out, so switching is a simple edit: comment out the curl command and uncomment the wget command.
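If you would rather not edit the script by hand, a common shell pattern is to check which downloader is installed and use that one. The sketch below is hypothetical (pick_downloader and fetch_url are illustrative names, not taken from setup_utax.bash): it prefers curl, matching the script's default, and falls back to wget only when curl is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of setup_utax.bash.

# Print the name of the first available downloader (curl, then wget);
# return 1 if neither is installed.
pick_downloader() {
  if command -v curl >/dev/null 2>&1; then
    echo curl
  elif command -v wget >/dev/null 2>&1; then
    echo wget
  else
    return 1
  fi
}

# Download URL $1 to file $2 using whichever tool pick_downloader finds.
fetch_url() {
  local url="$1" out="$2" tool
  tool=$(pick_downloader) || { echo "error: need curl or wget" >&2; return 1; }
  case "$tool" in
    curl) curl -fsSL -o "$out" "$url" ;;  # -f: fail on HTTP errors, -L: follow redirects
    wget) wget -q -O "$out" "$url" ;;     # -O: write to the named file
  esac
}
```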
There are three scripts which run UPARSE pipelines: run_mouse.bash, which processes the mouse feces samples; run_mock.bash, which processes the mock community sample; and run_mock1.bash, which processes the mock community while keeping singleton uniques. Run them like this:

    cd ~/tutorials/misop/scripts
    ./run_mouse.bash
    ./run_mock.bash
    ./run_mock1.bash
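Because the scripts must be run from inside scripts/, one way to run all three in sequence is a small wrapper like the following. It is a sketch, not part of the tutorial: run_pipelines is a hypothetical name, and the default directory assumes the layout used above. It works in a subshell so your own working directory is unchanged, and it stops at the first script that fails.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the tutorial.
# Runs the three UPARSE pipeline scripts in order from the scripts/
# directory ($1, default ~/tutorials/misop/scripts). Stops at the first failure.
run_pipelines() {
  local dir="${1:-$HOME/tutorials/misop/scripts}"
  (
    # Subshell: the cd does not change the caller's working directory.
    cd "$dir" || exit 1
    for script in run_mouse.bash run_mock.bash run_mock1.bash; do
      echo "=== $script ==="
      ./"$script" || { echo "failed: $script" >&2; exit 1; }
    done
  )
}
```

Call it as `run_pipelines`, or pass a different scripts directory as the first argument.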
This should reproduce the pre-computed files in the misop/out/mouse, misop/out/mock and misop/out/mock1 directories.
For mouse, the most important output files are the OTU tables, which are named otutab.txt (QIIME classic format), otutab.json (BIOM format) and otutab.mothur (mothur "shared" file format). You can use these for further analysis in QIIME or mothur. There isn't much point in making an OTU table for the mock reads because there is only one sample.
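To get a quick feel for an OTU table in QIIME classic format before loading it into QIIME or mothur, a short awk program can count the OTUs, samples and reads. This is a hedged sketch: otutab_summary is a hypothetical name, and it assumes a tab-separated table whose header line starts with #OTU ID followed by one column per sample, with integer counts and no taxonomy column (a taxonomy column would be counted as an extra sample).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the tutorial.
# Summarize a QIIME classic (tab-separated) OTU table given as $1.
# Assumes: header line "#OTU ID<TAB>sample1<TAB>sample2...", one OTU per row,
# integer counts, and no taxonomy column.
otutab_summary() {
  awk -F'\t' '
    /^#OTU ID/ { nsamples = NF - 1; next }  # header: first field is the OTU ID column
    /^#/       { next }                     # skip any other comment lines
    NF > 0     {
      notus++
      for (i = 2; i <= NF; i++) total += $i  # sum counts across samples
    }
    END { printf "OTUs: %d\tsamples: %d\treads: %d\n", notus, nsamples, total }
  ' "$1"
}
```

For example, assuming the mouse outputs are written to misop/out/mouse: `otutab_summary ~/tutorials/misop/out/mouse/otutab.txt`.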