uc2otutab.py

This script is obsolete -- use the -otutabout or -biomout option of usearch_global.
See Mapping reads to OTUs for details.


Usage
python uc2otutab.py ucfile > tabfile

Description
Converts a .uc file to an OTU table. The .uc file can be generated by using usearch_global with the reads as a query and the OTU representative sequences as the database.
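As a rough illustration of the input, the sketch below pulls the read label and OTU label out of the hit records in a .uc file. It assumes the usual 10-field tabbed .uc layout, with the record type in the first field (H for a hit), the query label in field 9 and the target label in field 10. The function name iter_uc_hits and the example records are made up for this sketch and are not taken from uc2otutab.py.

```python
import io

def iter_uc_hits(handle):
    """Yield (query_label, target_label) for each hit record in a .uc stream."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        # Assumed layout: 10 tab-separated fields, record type first,
        # query (read) label in field 9, target (OTU) label in field 10.
        if len(fields) < 10 or fields[0] != "H":
            continue  # keep hits only; reads with no hit are ignored
        yield fields[8], fields[9]

# Two invented records: one hit and one read with no hit.
example = io.StringIO(
    "H\t0\t250\t98.4\t+\t0\t0\t250M\tread1;barcodelabel=SampleA;\tOTU_1\n"
    "N\t*\t*\t*\t.\t*\t*\t*\tread2;barcodelabel=SampleA;\t*\n"
)
hits = list(iter_uc_hits(example))
```

Only the hit record survives, so hits contains a single (read label, OTU label) pair.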

The OTU table is a tabbed text file which can be easily imported into a spreadsheet or parsed by a script.

Read labels must include barcodelabel=samplename; annotations. It is up to you to add these annotations. See here for an example of how to add sample labels to demultiplexed Illumina reads, i.e. reads that are already split into separate FASTQ files by barcode/sample identifier.
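One possible way to add the annotation to reads that are already split by sample is sketched below. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and keeps only the first word of the original label before appending the annotation; both are choices made for this sketch. The function name add_sample_label is hypothetical.

```python
def add_sample_label(fastq_lines, sample_name):
    """Append ';barcodelabel=<sample_name>;' to every FASTQ header line."""
    out = []
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 == 0:  # first line of each 4-line record is the header
            # Assumption: drop anything after the first whitespace so the
            # annotation is part of the first word of the label.
            label = line.split()[0]
            line = f"{label};barcodelabel={sample_name};"
        out.append(line)
    return out

# One invented record from a file that holds reads for SampleA only.
record = ["@M01:1:1 1:N:0:1", "ACGT", "+", "IIII"]
relabeled = add_sample_label(record, "SampleA")
```

Each per-sample FASTQ file would be processed with its own sample name before the reads are pooled.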

Database sequence labels are assumed to be OTU identifiers. The label format doesn't matter. I suggest using something like OTU_1, OTU_2...

OTU table format
The OTU table is formatted as a tabbed text file. Rows are OTUs, columns are samples, values are the numbers of reads assigned to each OTU. Here is a toy example.

The header row is "OTU" followed by all OTU identifiers, separated by tabs.

Following the header row is one row for each sample. The first field is the sample identifier, obtained from the read label (see below), then one value for each OTU in the order given by the header row. The value is an integer giving the number of reads assigned to that OTU in this sample.
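The header and row layout described in the two paragraphs above can be sketched as follows. This is not the code from uc2otutab.py; the function name format_otu_table is hypothetical, and sorting the identifiers alphabetically is an assumption made here to get a stable order.

```python
from collections import defaultdict

def format_otu_table(assignments):
    """Build tabbed text from (sample_id, otu_id) pairs, one pair per read."""
    counts = defaultdict(int)
    for sample_id, otu_id in assignments:
        counts[(sample_id, otu_id)] += 1
    # Assumption: alphabetical order (so OTU_10 sorts before OTU_2).
    samples = sorted({s for s, _ in counts})
    otus = sorted({o for _, o in counts})
    lines = ["\t".join(["OTU"] + otus)]  # header row
    for sample in samples:
        # One integer per OTU, in header order; 0 where no reads were assigned.
        row = [sample] + [str(counts.get((sample, otu), 0)) for otu in otus]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"

# Three invented read assignments.
pairs = [("SampleA", "OTU_1"), ("SampleA", "OTU_1"), ("SampleB", "OTU_2")]
table = format_otu_table(pairs)
```

With these three reads, the table has a header row plus one row each for SampleA and SampleB.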

The OTU identifier is simply the target sequence label from the database. It can be anything you like, but it is recommended to use a convention like OTU_nnn where nnn is an integer 1, 2 ... identifying the OTU. Sequential labels of this type can be assigned using the fasta_number.py script.

The sample identifier is obtained by parsing the read label looking for a field barcodelabel=Sample_id followed by a semi-colon. This field must be present in the read label. It can be added using the fastq_strip_barcode_relabel.py script or any other convenient method. If the barcodelabel= field is not present in the labels, you will get this error:

   barcodelabel= not found in read label

The most common reason for this error is using Illumina reads that were already demultiplexed, i.e. split into separate FASTQ files according to the barcode, by the Illumina machine software. See demultiplexed Illumina reads for discussion.
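The sample identifier lookup described above can be sketched as below. This is a minimal illustration, not the script's own code: the function name get_sample_id is hypothetical, and the sketch treats the label as a list of semi-colon separated fields.

```python
def get_sample_id(read_label):
    """Return the value of the barcodelabel= field in a read label."""
    prefix = "barcodelabel="
    # Assumption: fields in the label are separated by semi-colons.
    for field in read_label.split(";"):
        if field.startswith(prefix):
            return field[len(prefix):]
    # Same message as the error quoted above.
    raise ValueError("barcodelabel= not found in read label")

sample = get_sample_id("read1;barcodelabel=SampleA;")

# A label without the annotation triggers the error.
try:
    get_sample_id("read1")
    missing_raises = False
except ValueError:
    missing_raises = True
```

Running a check like this on a few read labels before mapping is a quick way to confirm that the annotations are in place.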