Toy datasets and runme.bash scripts

 
Reproduce the issue
If you have an issue with usearch that you want me to look at, please make a toy dataset that reliably reproduces the issue and a runme.bash script in a standalone directory that executes the usearch command(s) starting from clean data files (usually, FASTA).

Toy dataset
The toy dataset should be as small and simple as possible. This makes the problem easier to analyze and the dataset more convenient to share; ideally, it should be small enough to send as an email attachment.

Eliminate unnecessary commands
The example should be reduced to a single usearch command. Isolate the command that causes the problem by testing the commands one at a time.

Reduce data size
Reduce the data size as much as possible. This is helpful even if your dataset is small to start with. There are many ways to do this, e.g.:

-  The fastx_subsample command can be used to select a small subset from a large file.

-  Binary search: divide the input data into two halves and run usearch on each half separately. Keep a half that still shows the problem and repeat until splitting further makes the problem disappear.

-  Cluster the input at some low identity, say 80%, then try running usearch on the sequences in one of the clusters.
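One step of the binary search can be sketched in plain shell as follows. This is only an illustration: the file names input.fa, half1.fa and half2.fa are hypothetical, and the tiny FASTA file written at the top stands in for your own data. The split is done by record, so multi-line sequences stay together.

```shell
# Hypothetical example input; replace with your own FASTA file.
printf '>s1\nACGT\n>s2\nGGCC\nTTAA\n>s3\nAAAA\n>s4\nCCCC\n' > input.fa

# Count records (FASTA header lines start with '>').
n=$(grep -c '^>' input.fa)
half=$(( (n + 1) / 2 ))

# Write the first $half records to half1.fa and the rest to half2.fa.
awk -v half="$half" '
    /^>/ { rec++ }
    { print > (rec <= half ? "half1.fa" : "half2.fa") }
' input.fa
```

Run your usearch command on half1.fa and on half2.fa, keep whichever half still shows the problem, and split that file again.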

Create a standalone directory
Configure your toy dataset so that all the input files are in a single directory. The goal of this step is to enable a runme script that does not use path names, so that it can be installed anywhere in a file system.

Make a runme script
Make a script named runme.bash for the bash shell with the usearch command(s) that reproduce your issue. Ideally, the runme script should be a single usearch command line. If this is not possible, then the number of calls to usearch should be reduced as far as possible and other commands (e.g., standard Linux commands) should also be reduced as much as possible. There should be no paths in filenames used in the script.
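As a rough sketch, a runme.bash might look like the one written below, followed by a quick check that no line after the first contains a slash. The usearch command, its options and the file names input.fa and centroids.fa are placeholders for illustration only; substitute the command that reproduces your issue.

```shell
# Hypothetical runme.bash; the usearch command and file names are placeholders.
cat > runme.bash <<'EOF'
#!/bin/bash
usearch -cluster_fast input.fa -id 0.97 -centroids centroids.fa
EOF
chmod +x runme.bash

# Skip the "#!/bin/bash" line, then look for a slash anywhere else.
if tail -n +2 runme.bash | grep '/'; then
    path_check="found paths"
else
    path_check="no paths"
fi
echo "$path_check"
```

The check is deliberately crude: it flags any slash, so a legitimate slash in an option value would also be reported and should be reviewed by hand.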

Package the standalone directory
Make a standalone directory that contains: (1) the input files for usearch, typically this will be one or a few FASTA files, (2) the runme.bash script, and (3) the output files you get, so that I can verify I get the same results. Create a gzip compressed tar file using a command like this:

    tar -zcvf /tmp/toy.tz .

Note the period (.) at the end of the command line above, and note also that the toy.tz file should be created in /tmp or some other directory, not in the standalone directory.

Verify the toy works correctly
Create a new directory, extract toy.tz into that directory, and verify that running ./runme.bash reproduces your issue.
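The packaging and verification round trip can be sketched as below. To keep the sketch runnable without usearch, it builds a throwaway toy directory whose runme.bash is a stand-in that only counts FASTA records; the directory names toy and check and the file count.txt are assumptions made for the illustration.

```shell
# Build a hypothetical toy directory so the sketch is self-contained.
work=$(mktemp -d)
mkdir "$work/toy" "$work/check"
printf '>s1\nACGT\n' > "$work/toy/input.fa"
# Stand-in script: a real runme.bash would call usearch here.
printf '#!/bin/bash\ngrep -c "^>" input.fa > count.txt\n' > "$work/toy/runme.bash"
chmod +x "$work/toy/runme.bash"

# Package from inside the toy directory, writing the tarball elsewhere.
(cd "$work/toy" && tar -zcf "$work/toy.tz" .)

# Extract into a fresh directory and run the script there.
(cd "$work/check" && tar -zxf "$work/toy.tz" && ./runme.bash)
```

If the script only works in the original directory, it probably depends on a path or on a file that was not packaged.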

Send the toy to me
If the toy is small enough, send it as an email attachment. Otherwise, you can use a free file sharing service. Occasionally, email systems filter messages with attachments that contain executable files such as scripts, even if they are embedded in tarball files like toy.tz. As a precaution, it is a good idea to send a second email to confirm that the attachment has been sent.

Other information
Please also send the following information:

  • Usearch version information. This should include the "three-dot" version number e.g. 5.2.13, the "bitness" (32 or 64) and the binary format (win32, win64, linux or osx).
  • Platform. Send the output from uname -a.