Mehringer, Svenja and Seiler, Enrico and Droop, Felix and Darvish, Mitra and Rahn, René and Vingron, Martin and Reinert, Knut (2022) Hierarchical Interleaved Bloom Filter: Enabling ultrafast, approximate sequence queries. bioRxiv.
Full text not available from this repository.
Official URL: https://doi.org/10.1101/2022.08.01.502266
Abstract
Searching sequences in large, distributed databases is the most widely used bioinformatics analysis. This basic task is in dire need of solutions that cope with the exponential growth of sequence repositories and perform approximate queries very fast. In this paper, we present a novel data structure: the Hierarchical Interleaved Bloom Filter (HIBF). It is extremely fast and space efficient, yet so general that it has the potential to serve as the underlying engine for many applications. We show that the HIBF is superior in build time, index size and search time while achieving comparable or better accuracy than other state-of-the-art tools (Mantis and Bifrost). The HIBF builds an index up to 211 times faster, uses up to 14 times less space, and answers approximate membership queries up to 129 times faster. This can be considered a quantum leap that opens the door to indexing complete sequence archives like the European Nucleotide Archive or even larger metagenomics data sets.
Item Type: | Article |
---|---|
Subjects: | Mathematical and Computer Sciences > Computer Science |
Divisions: | Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group |
ID Code: | 2846 |
Deposited By: | Anja Kasseckert |
Deposited On: | 05 Sep 2022 10:58 |
Last Modified: | 05 Sep 2022 10:58 |