mothur script for creating HMP mock community OTUs

Titanium reads were analyzed with mothur following the recommended procedure [1] at http://www.mothur.org/wiki/Schloss_SOP (downloaded Oct. 24th, 2012).

Commands were run with mothur v1.27.0, 64-bit, under Linux. The Windows version of mothur gave quite different results; these differences appear to be due to bugs, so the Linux results are reported. The flowgram file reads.sff was obtained from the SRA download files with the sff-dump utility. The oligos.txt file contained a single line specifying the V5 primer, with the label v35 that appears in the downstream file names:

forward CCGTCAATTCMTTTRAGT v35
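
The primer contains the IUPAC degenerate codes M (A or C) and R (A or G), so it stands for four concrete sequences. As a minimal illustration only (this is not part of the mothur run, and the function names expand_primer and primer_mismatches are hypothetical), the following Python sketch expands the primer and counts mismatches at the start of a read. It assumes a simple position-by-position comparison without indels, which may differ from how mothur applies pdiffs internally.

```python
from itertools import product

# IUPAC nucleotide codes used for degenerate primers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every concrete sequence a degenerate primer stands for."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer))]

def primer_mismatches(primer, read):
    """Count primer positions that the start of the read does not satisfy.

    Assumption: plain position-by-position comparison, no gaps.
    """
    prefix = read[:len(primer)]
    mismatches = sum(1 for p, b in zip(primer, prefix) if b not in IUPAC[p])
    # Each primer base missing from a too-short read counts as a mismatch.
    return mismatches + (len(primer) - len(prefix))

PRIMER = "CCGTCAATTCMTTTRAGT"  # primer from oligos.txt
variants = expand_primer(PRIMER)
for seq in variants:
    print(seq)
```

A read would pass a tolerance of two primer differences under this simplified model when primer_mismatches(PRIMER, read) is at most 2.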

The script was as follows:

sffinfo(sff=reads.sff, flow=T)
trim.flows(flow=reads.flow, oligos=oligos.txt, pdiffs=2, bdiffs=1, processors=4)
shhh.flows(file=reads.flow.files, processors=4)
trim.seqs(fasta=reads.v35.shhh.fasta, name=reads.v35.shhh.names, oligos=oligos.txt, pdiffs=2, bdiffs=1, maxhomop=8, minlength=200, flip=T, processors=4)
unique.seqs(fasta=reads.v35.shhh.trim.fasta, name=reads.v35.shhh.trim.names)
align.seqs(fasta=reads.v35.shhh.trim.unique.fasta, reference=silva.bacteria.fasta, processors=4)
screen.seqs(fasta=reads.v35.shhh.trim.unique.align, name=reads.v35.shhh.trim.unique.names, group=reads.v35.shhh.groups, end=27659, optimize=start, criteria=95, processors=4)
filter.seqs(fasta=reads.v35.shhh.trim.unique.good.align, vertical=T, trump=., processors=4)
unique.seqs(fasta=reads.v35.shhh.trim.unique.good.filter.fasta, name=reads.v35.shhh.trim.unique.good.names)
pre.cluster(fasta=reads.v35.shhh.trim.unique.good.filter.unique.fasta, name=reads.v35.shhh.trim.unique.good.filter.names, group=reads.v35.shhh.good.groups, diffs=2)
chimera.uchime(fasta=reads.v35.shhh.trim.unique.good.filter.unique.precluster.fasta, name=reads.v35.shhh.trim.unique.good.filter.unique.precluster.names, group=reads.v35.shhh.good.groups, processors=4)
remove.seqs(accnos=reads.v35.shhh.trim.unique.good.filter.unique.precluster.uchime.accnos, fasta=reads.v35.shhh.trim.unique.good.filter.unique.precluster.fasta, name=reads.v35.shhh.trim.unique.good.filter.unique.precluster.names, group=reads.v35.shhh.good.groups)
system(cp reads.v35.shhh.trim.unique.good.filter.unique.precluster.pick.names final.names)
system(cp reads.v35.shhh.trim.unique.good.filter.unique.precluster.pick.fasta final.fasta)
dist.seqs(fasta=final.fasta, cutoff=0.15, processors=4)
cluster(column=final.dist, name=final.names)
get.oturep(column=final.dist, name=final.names, fasta=final.fasta)
quit()
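
To sanity-check a run, the names file produced before clustering can be summarized outside mothur. The sketch below is not part of the original analysis. It assumes the usual two-column mothur names format (a representative read ID, a tab, then a comma-separated list of all read IDs it represents), and the helper name summarize_names is hypothetical.

```python
import io

def summarize_names(handle):
    """Return (unique_count, total_reads) for a mothur-style names file."""
    unique_count = 0
    total_reads = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        representative, members = line.split("\t")
        unique_count += 1
        total_reads += len(members.split(","))
    return unique_count, total_reads

# Tiny made-up example; for a real run, pass open("final.names") instead.
example = io.StringIO("readA\treadA,readB,readC\nreadD\treadD\n")
print(summarize_names(example))  # prints (2, 4)
```

Comparing the total read count with the number of reads in the original flowgram file gives a rough idea of how many reads were discarded by trimming, screening and chimera removal.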

Reference
1. Schloss, P. D., Gevers, D. & Westcott, S. L. Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE 6, e27310 (2011).