Repository: Freie Universität Berlin, Math Department

TetRex: a novel algorithm for index-accelerated search of highly conserved motifs

Schwab, Remy Marcel and Gottlieb, Simon G and Reinert, Knut (2025) TetRex: a novel algorithm for index-accelerated search of highly conserved motifs. NAR Genomics and Bioinformatics, 7 (2).

Full text not available from this repository.

Official URL: https://academic.oup.com/nargab/article/7/2/lqaf03...

Abstract

The scale of modern datasets has necessitated innovations to solve even the most traditional and fundamental of computational problems. Set membership and set cardinality are both examples of simple queries that, for large enough datasets, quickly become challenging using traditional approaches. Interestingly, we find a need for these innovations within the field of biology. Despite the vast differences among living organisms, there exist functions so critical to life that they are conserved in the genomes and proteomes across seemingly unrelated species. Regular expressions (regexes) can serve as a convenient way to represent these conserved sequences compactly. However, despite the strong theoretical foundation and maturity of tools available, the state-of-the-art regex search falls short of what is necessary to quickly scan an enormous database of biological sequences. In this work, we describe a novel algorithm for regex search that reduces the search space by leveraging a recently developed compact data structure for set membership, the hierarchical interleaved bloom filter. We show that the runtime of our method combined with a traditional search outperforms state-of-the-art tools.

Item Type:Article
Subjects:Mathematical and Computer Sciences > Computer Science
Divisions:Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code:3284
Deposited By: Anja Kasseckert
Deposited On:15 Sep 2025 15:39
Last Modified:15 Sep 2025 15:39

Repository Staff Only: item control page