Repository: Freie Universität Berlin, Math Department

Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries

Mehringer, Svenja and Seiler, Enrico and Droop, Felix and Darvish, Mitra and Rahn, René and Vingron, Martin and Reinert, Knut (2023) Hierarchical Interleaved Bloom Filter: enabling ultrafast, approximate sequence queries. Genome Biology, 24 (131). ISSN 1474-760X

Full text not available from this repository.

Official URL: https://doi.org/10.1186/s13059-023-02971-4

Abstract

We present a novel data structure for searching sequences in large databases: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it could serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size, and search time while achieving a comparable or better accuracy compared to other state-of-the-art tools (Mantis and Bifrost). The HIBF builds an index up to 211 times faster, using up to 14 times less space, and can answer approximate membership queries faster by a factor of up to 129. This can be considered a quantum leap that opens the door to indexing complete sequence archives like the European Nucleotide Archive or even larger metagenomics data sets.

Item Type: Article
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 2846
Deposited By: Anja Kasseckert
Deposited On: 05 Sep 2022 10:58
Last Modified: 10 Jul 2023 13:06
