Repository: Freie Universität Berlin, Math Department

ganon2: up-to-date and scalable metagenomics analysis

Piro, Vitor C. and Reinert, Knut (2023) ganon2: up-to-date and scalable metagenomics analysis. bioRxiv preprint.

Full text not available from this repository.

Official URL: https://doi.org/10.1101/2023.12.07.570547

Abstract

The fast growth of public sequence repositories greatly contributes to the success of metagenomics applications. However, these repositories are growing at a much faster pace than the resources to use them properly. This challenges current methods, which struggle to take full advantage of the massive and fast data generation. We propose a generational leap in performance and usability with ganon2, a sequence classification method that performs taxonomic binning and profiling for metagenomics analysis. It indexes large datasets with a small memory footprint while maintaining fast, sensitive, and precise classification results. This is possible thanks to the Hierarchical Interleaved Bloom Filter data structure paired with minimizers, along with several other improvements and optimizations. Based on the full NCBI RefSeq and its subsets, ganon2 indices are on average 50% smaller than those of state-of-the-art methods, providing a high compression rate for large and diverse genomic reference sets. Using 16 simulated samples from various studies, including the CAMI 1+2 challenges, ganon2 achieved up to 0.17 higher median F1-Score in taxonomic binning. In profiling, the median F1-Score improves by up to 0.32 while keeping a balanced L1-norm error in the abundance estimation. ganon2 is one of the fastest tools evaluated and enables the use of larger, more diverse and up-to-date reference sets in daily microbiome analysis, improving the resolution of results. The code is open-source and available with documentation at https://github.com/pirovc/ganon
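The abstract attributes ganon2's small indices to the Hierarchical Interleaved Bloom Filter paired with minimizers. As a rough illustration of the underlying idea only, the Python sketch below builds a flat (non-hierarchical) interleaved Bloom filter in which each row stores one bit per reference bin side by side, so a single query returns a bitmask of candidate bins. The lexicographic minimizer ordering, the SHA-256-derived hash positions, the class and function names, and all parameter values are simplifying assumptions made here for illustration; this is not ganon2's implementation or API.

```python
import hashlib


def minimizers(seq: str, k: int, w: int) -> set[str]:
    """Smallest k-mer in each window of w consecutive k-mers.

    Assumption: plain lexicographic order is used for simplicity; real tools
    typically order k-mers by a hash and use canonical (strand-independent) k-mers.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    if len(kmers) < w:
        return {min(kmers)}
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}


class InterleavedBloomFilter:
    """num_bins Bloom filters sharing num_rows rows (hypothetical, flat layout).

    Row r holds bit r of every bin next to each other, so one AND over the
    rows selected by the hash functions answers the query for all bins at once.
    """

    def __init__(self, num_bins: int, num_rows: int, num_hashes: int) -> None:
        self.num_bins = num_bins
        self.num_rows = num_rows
        self.num_hashes = num_hashes
        self.rows = [0] * num_rows  # each int is a num_bins-bit vector

    def _positions(self, item: str) -> list[int]:
        # Deterministic hash positions; one seed per hash function (assumption).
        positions = []
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_rows)
        return positions

    def insert(self, item: str, bin_id: int) -> None:
        for r in self._positions(item):
            self.rows[r] |= 1 << bin_id

    def query(self, item: str) -> int:
        """Bitmask of bins that may contain item (false positives possible, no false negatives)."""
        mask = (1 << self.num_bins) - 1
        for r in self._positions(item):
            mask &= self.rows[r]
        return mask

    def count(self, items: set[str]) -> list[int]:
        """Number of query items that hit each bin."""
        counts = [0] * self.num_bins
        for item in items:
            mask = self.query(item)
            for b in range(self.num_bins):
                if (mask >> b) & 1:
                    counts[b] += 1
        return counts


# Toy usage: two made-up "reference genomes", one per bin.
K, W = 5, 4
references = {0: "ACGTACGTTAGCCGATAGGCTTACG", 1: "TTGACCGATTACAGGATCCATGCAA"}
ibf = InterleavedBloomFilter(num_bins=2, num_rows=1024, num_hashes=3)
for bin_id, genome in references.items():
    for m in minimizers(genome, K, W):
        ibf.insert(m, bin_id)

read = references[0][3:20]  # a read sampled from bin 0
read_minimizers = minimizers(read, K, W)
counts = ibf.count(read_minimizers)
print(counts)
```

Because every window of k-mers in the read is also a window in the reference it was sampled from, all of the read's minimizers hit bin 0. Hits in the other bin can come from shared minimizers or Bloom filter false positives, which is why a classifier would typically require a minimum number of hits before assigning a read to a bin.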

Item Type: Article
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 3142
Deposited By: Anja Kasseckert
Deposited On: 18 Apr 2024 10:54
Last Modified: 18 Apr 2024 10:54