Chromosomes and other long sequences

USEARCH is not designed for long database or query sequences. The maximum sequence length in USEARCH is 50k. Longer sequences can be handled by breaking them up into overlapping segments. I could add a feature to USEARCH to do this internally, which would simplify the task for the user by handling the bookkeeping details (re-mapping coordinates and joining alignments that span multiple segments). However, other tools such as LASTZ and BLAT already provide good speed/sensitivity alternatives to BLASTN for this type of task. Let
me know if you think a USEARCH implementation for long sequences would be useful.
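
The segmenting workaround mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of USEARCH: the function names `split_with_overlap` and `to_original_coords` are hypothetical, the 50,000 limit is taken from the "50k" figure above, and the overlap of 1,000 is an assumed value that should be at least as long as the longest alignment you need to recover intact.

```python
from typing import Iterator, Tuple

MAX_LEN = 50_000  # segment length limit, from the "50k" figure in the text
OVERLAP = 1_000   # assumed overlap between segments; not a USEARCH setting


def split_with_overlap(seq: str, max_len: int = MAX_LEN,
                       overlap: int = OVERLAP) -> Iterator[Tuple[int, str]]:
    """Yield (offset, segment) pairs that cover seq with overlapping windows.

    offset is the 0-based position of the segment's first letter in seq.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    if not seq:
        return
    step = max_len - overlap
    start = 0
    while True:
        yield start, seq[start:start + max_len]
        if start + max_len >= len(seq):
            break  # this window reached the end of the sequence
        start += step


def to_original_coords(offset: int, seg_start: int,
                       seg_end: int) -> Tuple[int, int]:
    """Map a 0-based, half-open interval in a segment back to the full sequence."""
    return offset + seg_start, offset + seg_end
```

Each segment can be written out as its own record with the offset stored in the label, so that hit coordinates can be shifted back afterwards with `to_original_coords`. Joining alignments that span two neighboring segments is not handled here and would need additional logic.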