Repository: Freie Universität Berlin, Math Department

A combinatorial approach to RNA sequence-structure alignments

Bauer, Markus Johann (2008) A combinatorial approach to RNA sequence-structure alignments. PhD thesis, Freie Universität Berlin.

Full text not available from this repository.

Official URL: https://refubium.fu-berlin.de/handle/fub188/4820

Abstract

Until a couple of years ago the scientific mainstream held that genetic information, stored as DNA strands, is transcribed to RNA, and RNA sequences are in turn translated to proteins, the actual functional units in the cell. RNA was generally believed to be a helper molecule in the cell until the beginning of the new millennium. This view changed. We see the potential of RNA as one of the key cellular players. In this thesis we present a novel framework for computing sequence-structure alignments of RNA sequences. Our contribution is twofold: first, we give a graph-theoretic model for the computation of multiple sequence-structure alignments. We phrase the model as an integer linear program (ILP) and show how we can relax the ILP such that we are able to compute optimal or near-optimal solutions for the original problem. In a subsequent step, we augment the initial model with stacking energies. Stacking base pairs greatly contribute to the energetic stability of the overall structure and should therefore be additionally rewarded. We extend the original ILP such that stacking energies are incorporated. Second, we give extensive computational results on real data from the RFAM database. We compare the performance of truly multiple sum-of-pairs sequence-structure alignments to heuristic sequence-structure alignments. We show that the objective function value of the sum-of-pairs model is generally higher compared to the heuristically inferred alignments. At the same time, we sketch the computational limits for the sum-of-pairs multiple sequence-structure model. The computational costs for computing exact multiple sequence-structure alignments are generally very high. To validate our approach on a larger test set, we run two implementations that take two sequences as their input. LaRA and SLaRA---based on the initial and the stack model---compute all pairwise sequence-structure alignments and use the external program TCOFFEE to infer a consistency-based multiple sequence-structure alignment. Additionally, we run the progressive versions PLaRA and PSLaRA on the same input data set. Our experiments on the BRAliBase benchmark set show that our tools are top-ranked for all input classes. Furthermore, our implementations need less running time compared to similar approaches. Subsequently, we compare two different algorithms for computing the optimal value of the Lagrangian dual and show that in our test setting the conceptually easier subgradient method is superior to the bundle method. Finally, we incorporate our Lagrangian relaxation approach into a branch-and-bound framework. We show for which instances we are able to compute provably optimal solutions and compare our results with previously published results of a branch-and-bound approach for the related quadratic knapsack problem.

Item Type:Thesis (PhD)
Subjects:Mathematical and Computer Sciences > Computer Science
Divisions:Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code:2545
Deposited By: Anja Kasseckert
Deposited On:24 Mar 2021 13:12
Last Modified:24 Mar 2021 13:12

Repository Staff Only: item control page