The NCBI Sequence Read Archive (SRA, formerly the Short Read Archive) stores paired reads in at least two different formats: interleaved and concatenated. These formats are not documented, to the best of my knowledge, so you have to figure out which format you have by inspection. The fastx_info command is useful for a quick check. For example, if the median sequence length is 600, then you probably have 2x300 reads in concatenated format.
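As a rough illustration of the inspection step, the following Python sketch computes the median sequence length of a FASTQ file. It is not part of usearch; the function names are hypothetical, and it assumes that every record occupies exactly four lines (no wrapped sequences).

```python
import statistics


def read_lengths(fastq_lines):
    """Yield the sequence length of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.rstrip("\r\n"))


def median_read_length(fastq_lines):
    """Return the median sequence length over all records."""
    return statistics.median(read_lengths(fastq_lines))


# Usage (hypothetical file name):
# with open("reads.fastq") as f:
#     print(median_read_length(f))
```

A median close to twice the expected read length suggests concatenated format; a median close to the read length itself suggests interleaved format.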
The -mode option specifies the format. Valid values are interleaved and concatenated.
FASTQ output files for the R1 (forward) and R2 (reverse) reads are specified by the -output1 and -output2 options.
Interleaved format
With interleaved format, FASTQ records are R1, R2, R1, R2 etc. This is supported by the -interleaved option of fastq_mergepairs, so if you want to merge the pairs you may not need to run fastq_sra_splitpairs first.
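The R1, R2, R1, R2 ordering means that de-interleaving amounts to taking alternate records. A minimal Python sketch of the idea follows; it is not the usearch implementation, the function name is hypothetical, and each record is assumed to be already parsed into a (label, sequence, quality) tuple.

```python
def split_interleaved(records):
    """Split an R1, R2, R1, R2, ... list of records into (r1s, r2s)."""
    if len(records) % 2 != 0:
        # An odd count means at least one read has lost its mate.
        raise ValueError("odd number of records; file is not cleanly interleaved")
    r1s = records[0::2]  # records at even positions are forward reads
    r2s = records[1::2]  # records at odd positions are reverse reads
    return r1s, r2s
```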
Concatenated format
With concatenated format, the R1 and R2 sequences are combined into a single sequence with R2 immediately following R1 (as opposed to merged or assembled). There is no spacer or filler sequence separating the reads; they are simply concatenated. Sometimes the reads are truncated by quality filtering before they are concatenated, in which case they are pretty much useless because it is impossible to recover the original R1s and R2s. If some or all of the reads are full-length, then the R1s and R2s can be recovered by splitting the sequence (and quality scores) at the half-way point.
With concatenated format, the read length must be specified by the -readlength option, e.g. 250. If the length of the concatenated sequence is exactly 2x the read length (e.g. 500), then it is split at the midpoint. Otherwise it is discarded, because it is impossible to determine where the R1 ends and the R2 begins.
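The split-or-discard rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the usearch implementation: the function name is hypothetical, and the sketch leaves the R2 orientation unchanged.

```python
def split_concatenated(seq, qual, read_length):
    """Split a concatenated read at the midpoint.

    Returns ((r1_seq, r1_qual), (r2_seq, r2_qual)), or None if the record
    is not exactly twice the read length and must be discarded.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) != 2 * read_length:
        return None  # cannot tell where R1 ends and R2 begins
    r1 = (seq[:read_length], qual[:read_length])
    r2 = (seq[read_length:], qual[read_length:])
    return r1, r2
```

For 2x250 data, a 500-base record yields two 250-base reads, while a 480-base record (truncated before concatenation) returns None.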
Example
usearch -fastq_sra_splitpairs SRA457665.fastq -mode concatenated \
  -readlength 250 -output1 fwd.fq -output2 rev.fq