Repository: Freie Universität Berlin, Math Department

EPR-Dictionaries: A Practical and Fast Data Structure for Constant Time Searches in Unidirectional and Bidirectional FM Indices

Pockrandt, Christopher and Ehrhardt, Marcel and Reinert, Knut (2017) EPR-Dictionaries: A Practical and Fast Data Structure for Constant Time Searches in Unidirectional and Bidirectional FM Indices. In: Research in Computational Molecular Biology. RECOMB 2017. Lecture Notes in Computer Science (LNCS), 10229 . Springer, Cham, pp. 190-206. ISBN 978-3-319-56970-3 (elektronisch)

Full text not available from this repository.

Official URL: http://doi.org/10.1007/978-3-319-56970-3_12

Abstract

The unidirectional FM index was introduced by Ferragina and Manzini in 2000 and allows to search a pattern in the index in one direction. The bidirectional FM index (2FM) was introduced by Lam et al. in 2009. It allows to search for a pattern by extending an infix of the pattern arbitrarily to the left or right. If σ is the size of the alphabet then the method of Lam et al. can conduct one step in time O(σ) while needing space O(σ⋅n) using constant time rank queries on bit vectors. Schnattinger and colleagues improved this time to O(logσ) while using O(logσ⋅n) bits of space for both, the FM and 2FM index. This is achieved by the use of binary wavelet trees. In this paper we introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in O(1) time per step while using O(logσ⋅n)+o(logσ⋅σ⋅n) bits of space. This is done by replacing the binary wavelet tree by a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition we compared our implementation with other freely available implementations of bidirectional indices and show that we are between ≈2.2−4.2 times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices.

Item Type:Book Section
Subjects:Mathematical and Computer Sciences > Computer Science
Divisions:Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code:2118
Deposited By: Anja Kasseckert
Deposited On:12 Oct 2017 12:37
Last Modified:12 Oct 2017 12:37

Repository Staff Only: item control page