Repository: Freie Universität Berlin, Math Department

Whole Genome Shotgun Sequencing Based Taxonomic Profiling Methods for Comparative Study of Microbial Communities

Dadi, Temesgen Hailemariam (2019) Whole Genome Shotgun Sequencing Based Taxonomic Profiling Methods for Comparative Study of Microbial Communities. PhD thesis, Freie Universität Berlin.

Full text not available from this repository.

Official URL: https://refubium.fu-berlin.de/handle/fub188/24522

Abstract

Microorganisms, typically occurring as large, species-diverse communities, are a ubiquitous part of nature. These communities are a vital part of their environment, influencing it through various layers of interaction. Host-associated microbial communities are particularly scrutinized for their influence on the host’s health. Additionally, there is a growing interest in microbial communities due to their role in livestock, agriculture, waste treatment, mining, and biotechnology. Metagenomics is a relatively young scientific field that aims to study such microbial communities based on genetic material recovered directly from an environment. Advances in DNA sequencing have enabled us to perform taxonomic profiling, i.e. to identify microbial species quantitatively and qualitatively at increasing depth. In whole genome shotgun sequencing (WGS), environmental DNA is taken directly from an environment and sequenced after being fragmented, without PCR amplification. Taxonomic profiling methods based on such sequencing data introduce less PCR bias compared to their amplicon-based counterparts such as 16S rDNA-based profiling methods. However, the challenges posed by the enormous size and redundancy of databases and the high degree of homology among reference genomes of microorganisms put WGS methods at a disadvantage. In this thesis, we present and discuss two separate computational methods that address these two challenges. The first method is a taxonomic profiler that leverages coverage landscapes created by mapping sequencing reads across reference genomes to address the challenge posed by homologous regions of genomes. By carefully evaluating the coverage profile of reference genomes, we drop spurious references from consideration. This filtration strategy results in more reads mapping uniquely to the remaining reference genomes, improving both the resolution and accuracy of the taxonomic profiling process.
We have also shown that this method improves the quality of relative abundances assigned to each detected member organism. The second method is a distributed read mapper which addresses the issue of large and frequently changing databases by systematically partitioning them into smaller bins. It reduces the time and computational resources required to build indices from such large databases by orders of magnitude, and updates can be performed in a few minutes, compared to days with earlier methods. To achieve a competitive mapping speed while maintaining many small indices, we implemented a novel, fast and lightweight filtering data structure called the interleaved Bloom filter. With that, we are able to achieve the described improvements in index building and updating time without compromising read-mapping speed.
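
The filtering idea behind an interleaved Bloom filter can be illustrated with a minimal sketch. It is not the thesis implementation: the class name, the BLAKE2-based hashing, the parameter values, and the k-mer counting threshold are all assumptions chosen for illustration. One Bloom filter is kept per bin, and the bits of all bins for the same hash position are stored together, so a single lookup per hash function answers "which bins may contain this k-mer?" for every bin at once.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter (illustrative; names are hypothetical)."""

    def __init__(self, num_bins, bits_per_bin, num_hashes=3):
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        # rows[i] is an integer whose bit b holds position i of bin b's filter,
        # i.e. the per-bin Bloom filters are interleaved position by position.
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        # Assumption: salted BLAKE2b stands in for the hash functions.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.bits_per_bin

    def insert(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self.rows[pos] |= 1 << bin_id

    def query(self, kmer):
        # AND the rows of all hash positions; bit b of the result is set
        # if bin b may contain the k-mer (false positives are possible).
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask


def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def candidate_bins(ibf, read, k, min_hits):
    """Return bins in which at least min_hits k-mers of the read may occur."""
    counts = [0] * ibf.num_bins
    for kmer in kmers(read, k):
        mask = ibf.query(kmer)
        for b in range(ibf.num_bins):
            if (mask >> b) & 1:
                counts[b] += 1
    return [b for b, c in enumerate(counts) if c >= min_hits]


# Made-up reference sequences, one per bin.
references = [
    "ACGTACGTTAGCTAGCTAGG",
    "TTGACCGATCGGATCCGATA",
    "GGCATCGATTACGCCGTAAC",
    "CATGCATTTGACGGACTAGA",
]
K = 4
ibf = InterleavedBloomFilter(num_bins=len(references), bits_per_bin=1024)
for bin_id, ref in enumerate(references):
    for kmer in kmers(ref, K):
        ibf.insert(kmer, bin_id)

read = references[2][4:16]  # a read drawn from bin 2
hits = candidate_bins(ibf, read, K, min_hits=len(read) - K + 1)
```

In this sketch, only the bins listed in `hits` would need to be searched by a full read mapper. Because a Bloom filter has no false negatives, the bin the read was drawn from is always among the candidates; other bins can appear occasionally as false positives.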

Item Type: Thesis (PhD)
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 2524
Deposited By: Anja Kasseckert
Deposited On: 24 Mar 2021 11:32
Last Modified: 24 Mar 2021 11:32
