Repository: Freie Universität Berlin, Math Department

Scalable string similarity search/join with approximate seeds and multiple backtracking

Siragusa, E. and Weese, D. and Reinert, K. (2013) Scalable string similarity search/join with approximate seeds and multiple backtracking. In: Proceedings of the Joint EDBT/ICDT 2013 Workshops, Genoa, Italy.

Full text not available from this repository.

Official URL: http://doi.acm.org/10.1145/2457317.2457386

Abstract

We present in this paper scalable algorithms for optimal string similarity search and join. Our methods are variations of those applied in Masai [15], our recently published tool for mapping high-throughput DNA sequencing data with unpreceded speed and accuracy. The key features of our approach are filtration with approximate seeds and methods for multiple backtracking. Approximate seeds, compared to exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds. Combined together, these two methods significantly speed up string similarity search and join operations. Our tool is implemented in C++ and OpenMP using the SeqAn library. The source code is distributed under the BSD license and can be freely downloaded from http://www.seqan.de/projects/edbt2013.

Item Type:Conference or Workshop Item (Paper)
Uncontrolled Keywords:approximate seeds, backtracking, banded Myers bit-vector, banded dynamic programming, filtration, radix tree, suffix tree
Subjects:Mathematical and Computer Sciences > Computer Science
Divisions:Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code:1225
Deposited By: Anja Kasseckert
Deposited On:28 Mar 2013 12:24
Last Modified:10 Aug 2013 11:41

Repository Staff Only: item control page