Repository: Freie Universität Berlin, Math Department

SeArcH schemes for Approximate stRing mAtching

Gottlieb, Simon G and Reinert, Knut (2025) SeArcH schemes for Approximate stRing mAtching. NAR Genomics and Bioinformatics, 7 (1).

Full text not available from this repository.

Official URL: https://academic.oup.com/nargab/article/7/1/lqaf02...

Abstract

Finding approximate occurrences of a query in a text using a full-text index is a central problem in stringology with many applications, especially in bioinformatics. The recent work has shown significant speed-ups by combining bidirectional indices and employing variations of search schemes. Search schemes partition a query and describe how to search the resulting parts with a given error bound. The performance of search schemes can be approximated by the node count, which represents an upper bound of the number of search steps. Finding optimum search schemes is a difficult combinatorial optimization problem that becomes hard to solve for four and more errors. This paper improves on a few topics important to search scheme based searches: First, we show how search schemes can be used to model previously published approximate search strategies such as suffix filters, 01*0-seeds, or the pigeonhole principle. This unifies these strategies in the search scheme framework, makes them easily comparable and results in novel search schemes that allow for any number of errors. Second, we present a search scheme construction heuristic, which is on par with optimum search schemes and has a better node count than any known search scheme for equal or above four errors. Finally, using the different search schemes, we show that the node count measure is not an ideal performance metric and therefore propose an improved performance metric called the weighted node count, which approximates a search algorithm’s run time much more accurately.

Item Type:Article
Subjects:Mathematical and Computer Sciences > Computer Science
Divisions:Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code:3285
Deposited By: Anja Kasseckert
Deposited On:15 Sep 2025 12:55
Last Modified:15 Sep 2025 12:57

Repository Staff Only: item control page