Andreotti, Sandro (2015) Linear Programming and Integer Linear Programming in Bioinformatics. PhD thesis, Freie Universität Berlin.
Full text not available from this repository.
Official URL: https://refubium.fu-berlin.de/handle/fub188/13001
Abstract
A wide range of important problems related to bioinformatics and computational biology are optimization problems asking for a solution that minimizes or maximizes a certain objective function. Often, these problems are combinatorial optimization problems that can be formulated as integer linear programs. While for some of these problems polynomial time algorithms are known, for many other problems it is unlikely that such algorithms exist. However, much work has been dedicated to develop algorithms that are capable of solving many interesting integer linear programming problems on real live instances with acceptable memory and running time requirements. These algorithms are implemented in a variety of free or commercial solver software packages. In situations where the performance of general purpose solvers is insufficient, often problem specific integer linear programming techniques can be applied that take advantage of knowledge about the particular structure of the integer linear programming formulation to solve the problem in a much more time- or space-efficient way. In this thesis we present our algorithmic approaches to three relevant bioinformatic problems, each involving certain linear programming and integer linear programming techniques. The first problem is the de novo peptide sequencing problem, which consists in identifying a peptide's sequence solely from its tandem mass spectrum without any additional information stored in genome databases or protein databases. This problem can be formulated as a graph theoretical problem asking for the computation of a longest antisymmetric path in a directed acyclic graph. The particular structure of the associated integer linear programming formulation facilitates the application of a technique called Lagrangian relaxation, which yields an algorithm that outperforms state-of-the-art commercial integer linear programming solvers by orders of magnitude. The second problem is the isoform inference and abundance estimation problem from RNA-Seq data. This problem consists in predicting a set of expressed RNA isoforms, i.e., full length RNA transcripts corresponding to alternative splice variants, together with an estimate of their individual expression levels. We apply a linear programming technique called delayed column generation, which allows us to increase the search space without explicitly enumerating the potentially huge set of candidate isoforms. As a consequence, our approach allows for the identification of isoforms that otherwise could not be recovered due to incomplete read coverage. A central component of our delayed column generation algorithm is an integer linear programming formulation. The third problem is the duplication-loss alignment problem, which asks for a labeled alignment of two genome sequences that implies the minimal number of loss and duplication events in the evolutionary history from an unknown nearest common ancestor. In a labeled alignment, every unaligned gene must be labeled either as a loss or as the product of a duplication event. Once an optimal labeled alignment has been computed, a common ancestor genome with minimal implied evolutionary operations can be derived in a straight forward way. In our approach we identified problem specific cutting planes and developed efficient separation algorithms to obtain a branch and cut algorithm that is several orders of magnitude faster than existing approaches based on integer linear programming.
Item Type: | Thesis (PhD) |
---|---|
Subjects: | Mathematical and Computer Sciences > Computer Science |
Divisions: | Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group |
ID Code: | 2531 |
Deposited By: | Anja Kasseckert |
Deposited On: | 24 Mar 2021 12:15 |
Last Modified: | 24 Mar 2021 12:15 |
Repository Staff Only: item control page