Repository: Freie Universität Berlin, Math Department

A simple refined DNA minimizer operator enables 2-fold faster computation

Pan, Chenxu and Reinert, Knut and Valencia, Alfonso (2024) A simple refined DNA minimizer operator enables 2-fold faster computation. Bioinformatics, 40 (2). ISSN 1367-4811

Full text not available from this repository.


Motivation: The minimizer concept is a data structure for sequence sketching. The standard canonical minimizer selects a subset of k-mers from the given DNA sequence by comparing the forward and reverse k-mers in a window simultaneously according to a predefined selection scheme. It is widely employed in sequence analysis tasks such as read mapping and assembly. k-mer density, k-mer repetitiveness (e.g. k-mer bias), and computational efficiency are three critical measurements for minimizer selection schemes. However, there are trade-offs among the different minimizer variants. Generality, effectiveness, and efficiency are always requirements for high-performance minimizer algorithms.

Results: We propose a simple minimizer operator as a refinement of the standard canonical minimizer. It takes only a few operations to compute, yet it can improve the k-mer repetitiveness, especially for the lexicographic order. It also applies to other selection schemes based on total orders (e.g. random orders). Moreover, it is computationally efficient, and its density is close to that of the standard minimizer. The refined minimizer may benefit high-performance applications such as binning and read mapping.

Availability and implementation: The source code of the benchmark in this work is available at the GitHub repository
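For readers unfamiliar with the term, the standard canonical minimizer described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses the lexicographic order, takes the window size `w` to mean the number of consecutive k-mers per window, breaks ties by choosing the leftmost k-mer, and measures density as selected positions divided by the total number of k-mers. All function names are hypothetical. It is not the refined operator proposed in the article, whose definition is not given in this record.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmer(kmer: str) -> str:
    """Pick the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def canonical_minimizers(seq: str, k: int, w: int) -> List[Tuple[int, str]]:
    """Select one canonical k-mer per window of w consecutive k-mers.

    Returns (position, canonical k-mer) pairs; a pair chosen by
    several consecutive windows is reported only once.
    """
    if k <= 0 or w <= 0:
        raise ValueError("k and w must be positive")
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    # Canonical form of every k-mer, computed once up front.
    canon = [canonical_kmer(seq[i:i + k]) for i in range(n_kmers)]
    selected: List[Tuple[int, str]] = []
    for start in range(n_kmers - w + 1):
        window = canon[start:start + w]
        # min() over offsets returns the leftmost smallest k-mer (tie-break assumption).
        offset = min(range(w), key=lambda j: window[j])
        pick = (start + offset, window[offset])
        if not selected or selected[-1] != pick:
            selected.append(pick)
    return selected


def density(seq: str, k: int, w: int) -> float:
    """Fraction of k-mer positions that are selected as minimizers."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return 0.0
    return len(canonical_minimizers(seq, k, w)) / n_kmers
```

Calling `canonical_minimizers("ACGTACGT", 3, 2)` returns the selected positions together with their canonical k-mers. This simple version rescans each window for clarity rather than speed.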

Item Type: Article
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 3139
Deposited By: Anja Kasseckert
Deposited On: 18 Apr 2024 10:31
Last Modified: 18 Apr 2024 11:45
