Repository: Freie Universität Berlin, Math Department

Development of Bioinformatic Tools for Retroviral Analysis from High Throughput Sequence Data

Löber, Ulrike (2019) Development of Bioinformatic Tools for Retroviral Analysis from High Throughput Sequence Data. PhD thesis, Freie Universität Berlin.

Full text not available from this repository.

Official URL:


For hundreds of millions of years, retroviruses have been integrating into the genomes of vertebrates. This thesis contributes to the development of new methods for the retrieval, characterization and comparison of viruses that have integrated into the genome (endogenous retroviruses, or ERVs) and of their integration sites in host genomes. The koala retrovirus is an outstanding study subject because it is currently transitioning from an exogenous to an endogenous retrovirus. Over the past decades, high-throughput sequencing (HTS) has allowed scientists to investigate genomic data at high coverage and low cost. However, new sequencing technologies have also made it possible to produce vast amounts of data, so the bottleneck has shifted from data production to the analysis of so-called “big data”. Consequently, new algorithms and pipelines need to be established to process biological data. Solutions for the automated handling of short-read HTS data exist for many problems and can be improved and extended. Recent improvements in HTS that yield longer sequence fragments have helped solve problems connected to short-read sequencing, but they have also produced new challenges for genomic data processing. In this thesis, I present pipelines to comprehensively profile endogenous retroviruses from short-read HTS data of museum koala samples (ancient DNA) and describe a new method to amplify retroviral integration sites for long-read HTS. The thesis is divided into five sections. In the first part (chapter 1), I describe the biological problem, the evolution of sequencing technologies, the information-technology problems that result from it, and proposed solutions. In the second chapter, I compare three different target enrichment techniques for retrieving retroviral integration sites from museum koala samples and present the computational pipeline I developed for this purpose.
In chapter 3, I describe a method (sonication inverse polymerase chain reaction, or sonication inverse PCR) for the target enrichment of long sequence fragments that exploits the capacities of third-generation sequencing technologies. I also establish an analysis pipeline for processing sonication inverse PCR products and discuss the remaining problems caused by artificial read structures. In chapter 4, the method described in chapter 3 is used to profile koala retrovirus integrations, and I report the striking discovery of a new retroviral recombinant in koalas. Finally, I discuss our findings, compare short- and long-read HTS technologies, and outline further applications and remaining computational problems. Overall, this thesis contributes to the automated computational processing of HTS data from target enrichment techniques to profile endogenous retroviruses in host genomes.

Item Type: Thesis (PhD)
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 2533
Deposited By: Anja Kasseckert
Deposited On: 24 Mar 2021 12:21
Last Modified: 24 Mar 2021 12:21
