Pockrandt, Christopher and Ehrhardt, Marcel and Reinert, Knut
(2017)
*EPR-Dictionaries: A Practical and Fast Data Structure for Constant Time Searches in Unidirectional and Bidirectional FM Indices.*
In:
Research in Computational Molecular Biology. RECOMB 2017.
Lecture Notes in Computer Science (LNCS), 10229
.
Springer, Cham, pp. 190-206.
ISBN 978-3-319-56970-3 (elektronisch)

Full text not available from this repository.

Official URL: http://doi.org/10.1007/978-3-319-56970-3_12

## Abstract

The unidirectional FM index was introduced by Ferragina and Manzini in 2000 and allows to search a pattern in the index in one direction. The bidirectional FM index (2FM) was introduced by Lam et al. in 2009. It allows to search for a pattern by extending an infix of the pattern arbitrarily to the left or right. If σ is the size of the alphabet then the method of Lam et al. can conduct one step in time O(σ) while needing space O(σ⋅n) using constant time rank queries on bit vectors. Schnattinger and colleagues improved this time to O(logσ) while using O(logσ⋅n) bits of space for both, the FM and 2FM index. This is achieved by the use of binary wavelet trees. In this paper we introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in O(1) time per step while using O(logσ⋅n)+o(logσ⋅σ⋅n) bits of space. This is done by replacing the binary wavelet tree by a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition we compared our implementation with other freely available implementations of bidirectional indices and show that we are between ≈2.2−4.2 times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices.

Item Type: | Book Section |
---|---|

Subjects: | Mathematical and Computer Sciences > Computer Science |

Divisions: | Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group |

ID Code: | 2118 |

Deposited By: | Anja Kasseckert |

Deposited On: | 12 Oct 2017 12:37 |

Last Modified: | 12 Oct 2017 12:37 |

Repository Staff Only: item control page