Repository: Freie Universität Berlin, Math Department

Reference and taxonomy based methods for classification and abundance estimation of organisms in metagenomic samples

Piro, Vitor C (2018) Reference and taxonomy based methods for classification and abundance estimation of organisms in metagenomic samples. PhD thesis, Freie Universität Berlin.

Full text not available from this repository.


Metagenomics provides the means to study the vast and still mostly unknown microbial world, which comprises at least half of Earth's genetic diversity. Computational metagenomics enables such discoveries through the analysis of large amounts of data, which high-throughput technologies generate at a fast pace. Reference-based methods are commonly used to study environmental samples based on a set of previously assembled reference sequences, which are often linked to a taxonomic classification. Finding the origin of each sequenced fragment and profiling an environmental sample as a whole are the main goals of binning and taxonomic profiling tools, respectively. In this thesis I present three methods in computational metagenomics. Sets of curated reference sequences, jointly with a taxonomic classification, are employed to characterize community samples. The main goal of these contributions is to advance the state of the art in taxonomic profiling and binning with fast, sensitive and precise methods. First I present ganon, a sequence classification tool for metagenomics that works with a very large number of reference sequences. Ganon provides an efficient method to index sequences and to keep those indices updated in a very short time. In addition, ganon performs taxonomic binning with strongly improved precision compared to currently available methods. For general profiling of metagenomic samples and abundance estimation I introduce DUDes. Rather than predicting strains in the sample based only on relative abundances, DUDes first identifies possible candidates by comparing the strength of mapped reads in each node of the taxonomic tree in an iterative top-down manner. This technique works in the opposite direction to the lowest common ancestor approach. Lastly, I present MetaMeta, a pipeline to execute metagenome analysis tools and integrate their results.
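The contrast between DUDes' top-down search and the lowest common ancestor (LCA) approach can be illustrated with a toy example. This is a hypothetical simplification and not DUDes' actual algorithm. The taxonomy, the read hits, the function names and the `ratio` dominance rule are all assumptions made for illustration. `lca` assigns each read bottom-up to the deepest node shared by all of its reference hits. `top_down` starts at the root and keeps descending while one child's read support clearly exceeds that of its siblings.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the thesis.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "E. coli": "Proteobacteria",
    "S. enterica": "Proteobacteria",
    "B. subtilis": "Firmicutes",
}


def lineage(node):
    """Return the path from `node` up to the root, `node` first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca(hits):
    """Bottom-up: deepest node shared by the lineages of all hits of one read."""
    common = set(lineage(hits[0]))
    for hit in hits[1:]:
        common &= set(lineage(hit))
    # The deepest shared node is the one with the longest lineage.
    return max(common, key=lambda n: len(lineage(n)))


def children(node):
    return [child for child, parent in PARENT.items() if parent == node]


def support(node, read_hits):
    """Number of reads with at least one hit inside the subtree of `node`."""
    return sum(any(node in lineage(h) for h in hits) for hits in read_hits)


def top_down(read_hits, ratio=2.0):
    """Top-down: descend from the root while the best-supported child has
    at least `ratio` times the support of its strongest sibling
    (a hypothetical stand-in for DUDes' own comparison of nodes)."""
    node = "root"
    while True:
        scores = sorted(
            ((support(c, read_hits), c) for c in children(node)), reverse=True
        )
        if not scores or scores[0][0] == 0:
            return node  # leaf reached, or no read supports any child
        runner_up = scores[1][0] if len(scores) > 1 else 0
        if scores[0][0] < ratio * runner_up:
            return node  # no child dominates clearly: stop here
        node = scores[0][1]


# Each read lists the reference taxa it maps to (invented example data).
reads = [["E. coli"], ["E. coli", "S. enterica"], ["E. coli"], ["B. subtilis"]]
print([lca(hits) for hits in reads])  # ['E. coli', 'Proteobacteria', 'E. coli', 'B. subtilis']
print(top_down(reads))  # E. coli
```

In this toy run, LCA pushes the ambiguous read, which hits both E. coli and S. enterica, up to Proteobacteria. The top-down descent instead weighs the support of the whole sample at each level and still reaches E. coli.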
MetaMeta is a method to combine and enhance results from multiple taxonomic binning and profiling tools and, at the same time, a pipeline to easily execute tools and analyze environmental data. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy installation, visualization and parallelization of state-of-the-art tools. Using the same input data, MetaMeta provides more sensitive and reliable results, with the presence of each identified organism supported by several methods. These three projects introduce new methodologies and improved results over similar methods, constituting valuable contributions to the characterization of communities in a reference- and taxonomy-based manner.
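The integration idea requires each reported organism to be supported by several methods. It can be sketched as a simple consensus over per-tool profiles. This is a hypothetical majority-style rule chosen for illustration and not MetaMeta's actual integration procedure. The function name `merge_profiles`, the `min_tools` threshold, the mean-abundance rule and the example profiles are all assumptions.

```python
def merge_profiles(profiles, min_tools=2):
    """Merge taxonomic profiles (taxon -> relative abundance) from several tools.

    Hypothetical consensus rule: keep a taxon only if at least `min_tools`
    profiles report it with non-zero abundance, use the mean abundance over
    those profiles, and renormalize the kept taxa to sum to 1.
    """
    reported = {}
    for profile in profiles:
        for taxon, abundance in profile.items():
            if abundance > 0:
                reported.setdefault(taxon, []).append(abundance)
    # Drop taxa with too little support across tools.
    kept = {t: sum(a) / len(a) for t, a in reported.items() if len(a) >= min_tools}
    total = sum(kept.values())
    return {t: a / total for t, a in kept.items()} if total > 0 else {}


# Invented outputs of three hypothetical tools for the same sample.
tool_a = {"E. coli": 0.6, "B. subtilis": 0.4}
tool_b = {"E. coli": 0.5, "B. subtilis": 0.3, "S. enterica": 0.2}
tool_c = {"E. coli": 0.7, "S. aureus": 0.3}
merged = merge_profiles([tool_a, tool_b, tool_c])
print({t: round(a, 3) for t, a in merged.items()})
# {'E. coli': 0.632, 'B. subtilis': 0.368}
```

Raising `min_tools` trades sensitivity for precision. Taxa reported by a single tool are discarded, which here removes S. enterica and S. aureus.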

Item Type: Thesis (PhD)
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 2538
Deposited By: Anja Kasseckert
Deposited On: 24 Mar 2021 12:47
Last Modified: 24 Mar 2021 12:47
