
fastq_strip_barcode_relabel.py

Usage
python fastq_strip_barcode_relabel.py reads primer barcodes label_prefix > outputfile

python fastq_strip_barcode_relabel2.py reads primer barcodes label_prefix > outputfile

Description
Strips the barcode and primer from each read and gives the read a new label containing either the barcode sequence (fastq_strip_barcode_relabel.py) or the barcode's FASTA label (fastq_strip_barcode_relabel2.py).

Generally used for 454 reads.

Assumes the read layout is <barcode><primer><gene>.

If your reads start with a control sequence (typically TCAG), prepend it to each barcode sequence in the barcodes file.
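To make the assumed layout concrete, here is a minimal sketch (not code from the scripts themselves) that prepends a control sequence to each barcode and then splits a read by position into barcode, primer and gene. The names add_control, split_read and CONTROL_SEQ are hypothetical, and the barcode and primer sequences are made up for illustration.

```python
# Hypothetical helpers (not part of the original scripts) illustrating the
# assumed read layout <barcode><primer><gene>.

CONTROL_SEQ = "TCAG"  # typical control sequence; an assumption, adjust as needed


def add_control(barcodes, control=CONTROL_SEQ):
    """Return a copy of {label: barcode} with the control sequence prepended."""
    return {label: control + seq for label, seq in barcodes.items()}


def split_read(seq, barcode_len, primer_len):
    """Split a read by position into (barcode, primer, gene)."""
    gene_start = barcode_len + primer_len
    return seq[:barcode_len], seq[barcode_len:gene_start], seq[gene_start:]


if __name__ == "__main__":
    barcodes = add_control({"BC01": "ACGAGTGCGT"})
    bc = barcodes["BC01"]  # "TCAGACGAGTGCGT", 14 bases including TCAG
    read = bc + "GTGCCAGC" + "TTACGGAGGG"  # made-up primer and gene
    print(split_read(read, len(bc), 8))
    # ('TCAGACGAGTGCGT', 'GTGCCAGC', 'TTACGGAGGG')
```

Because the control sequence becomes part of each barcode, it is removed along with the barcode when the read is stripped.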

The reads argument is a FASTQ file containing the reads.
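A minimal sketch of how the reads file could be parsed, assuming standard unwrapped 4-line FASTQ records (@label, sequence, +, quality). The function name read_fastq is hypothetical. It accepts any iterable of lines, so an open file handle works as well as a list.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (label, sequence, quality) from an iterable of FASTQ lines.

    Assumes unwrapped 4-line records: @label, sequence, +, quality.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return  # clean end of input
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        header, seq, plus, qual = (line.rstrip("\r\n") for line in record)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header[1:], seq, qual


if __name__ == "__main__":
    demo = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    for label, seq, qual in read_fastq(demo):
        print(label, seq, qual)  # read1 ACGT IIII
```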

The primer argument is the primer sequence. Wildcards such as N are allowed in the primer sequence. Up to 2 primer mismatches are allowed.
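The primer rule can be sketched as a position-by-position comparison in which a wildcard matches any base and at most 2 positions may differ. This is an assumed implementation, not the script's own: it supports the standard IUPAC ambiguity codes (the description above only confirms N), and it counts substitutions only, not insertions or deletions. The names IUPAC, count_primer_diffs and primer_matches are hypothetical.

```python
# Assumed wildcard table: standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

MAX_PRIMER_DIFFS = 2  # "up to 2 primer mismatches are allowed"


def count_primer_diffs(primer, seq):
    """Count positions where seq is not allowed by the (possibly degenerate) primer.

    Only the first len(primer) bases of seq are compared (substitutions only).
    """
    diffs = 0
    for p, s in zip(primer.upper(), seq.upper()):
        if s not in IUPAC.get(p, p):
            diffs += 1
    return diffs


def primer_matches(primer, seq, max_diffs=MAX_PRIMER_DIFFS):
    """True if seq starts with the primer, allowing up to max_diffs mismatches."""
    if len(seq) < len(primer):
        return False
    return count_primer_diffs(primer, seq) <= max_diffs


if __name__ == "__main__":
    primer = "GTGNCAGC"  # made-up primer with one N wildcard
    print(primer_matches(primer, "GTGACAGCTTAC"))  # True (N matches A)
    print(primer_matches(primer, "ATGACAGGTTAC"))  # True (2 mismatches)
    print(primer_matches(primer, "ATTACAGGTTAC"))  # False (3 mismatches)
```

In the assumed layout, seq here is the part of the read immediately after the barcode.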

The barcodes argument is the name of a FASTA file containing barcodes. No mismatches are allowed with the barcode.
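Reading the barcodes file and requiring an exact match can be sketched as follows. The function names are hypothetical. Trying longer barcodes first, so that a barcode which is a prefix of another cannot shadow it, is an assumption of this sketch and is not documented for the original scripts.

```python
def read_barcodes(lines):
    """Parse FASTA lines into {label: sequence}; multi-line sequences are joined."""
    barcodes = {}
    label = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                barcodes[label] = "".join(chunks).upper()
            label = line[1:].strip()
            chunks = []
        else:
            chunks.append(line)
    if label is not None:
        barcodes[label] = "".join(chunks).upper()
    return barcodes


def match_barcode(seq, barcodes):
    """Return (label, barcode) for the barcode that exactly prefixes seq, else None.

    Longer barcodes are tried first (an assumption of this sketch).
    """
    seq = seq.upper()
    for label, bc in sorted(barcodes.items(), key=lambda kv: len(kv[1]), reverse=True):
        if seq.startswith(bc):  # exact match: no mismatches allowed
            return label, bc
    return None


if __name__ == "__main__":
    fasta = [">BC01\n", "ACGAGTGCGT\n", ">BC02\n", "ACGCTCGACA\n"]
    barcodes = read_barcodes(fasta)
    print(match_barcode("ACGCTCGACAGTGCCAGC", barcodes))  # ('BC02', 'ACGCTCGACA')
```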

The label_prefix argument is a string of characters. Each read label is replaced by:

label_prefixN;barcodelabel=xxx;

where N is the read number, counting from 1 as the first read in the file, and xxx is the barcode sequence (fastq_strip_barcode_relabel.py) or the barcode's FASTA label (fastq_strip_barcode_relabel2.py).
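The label format can be illustrated with a short sketch. The function names make_label and relabel are hypothetical; the format string follows label_prefixN;barcodelabel=xxx; as described above.

```python
def make_label(prefix, read_number, barcode_field):
    """Return the new read label: <prefix><N>;barcodelabel=<barcode_field>;"""
    return f"{prefix}{read_number};barcodelabel={barcode_field};"


def relabel(prefix, barcode_fields):
    """Number reads from 1 in input order and build a label for each."""
    return [make_label(prefix, n, field)
            for n, field in enumerate(barcode_fields, start=1)]


if __name__ == "__main__":
    # fastq_strip_barcode_relabel.py style: xxx is the barcode sequence
    print(make_label("Sample", 1, "ACGAGTGCGT"))  # Sample1;barcodelabel=ACGAGTGCGT;
    # fastq_strip_barcode_relabel2.py style: xxx is the FASTA label
    print(make_label("Sample", 2, "BC01"))  # Sample2;barcodelabel=BC01;
```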

Output is written in FASTQ format.
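Writing a stripped read can be sketched as below, assuming the number of bases removed is len(barcode) + len(primer). The quality string is trimmed by the same amount so that it stays aligned with the sequence. The function name format_fastq is hypothetical, and writing a bare "+" separator line (rather than repeating the label) is a choice of this sketch.

```python
import sys


def format_fastq(label, seq, qual, trim):
    """Format one FASTQ record with the first `trim` bases removed.

    `trim` is assumed to be len(barcode) + len(primer); the quality string is
    cut by the same amount so it stays aligned with the sequence.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return f"@{label}\n{seq[trim:]}\n+\n{qual[trim:]}\n"


if __name__ == "__main__":
    record = format_fastq("Sample1;barcodelabel=ACGT;",
                          "ACGTGGCCTTAA", "IIIIHHHHGGGG", trim=8)
    sys.stdout.write(record)
```

Run as a script, this prints a four-line record: @Sample1;barcodelabel=ACGT;, TTAA, + and GGGG.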