Computational methods for Quantitative Peptide Mass Spectrometry

Schulz-Trieglaff, Ole (2009) Computational methods for Quantitative Peptide Mass Spectrometry. PhD thesis, Freie Universität Berlin.

Full text not available from this repository.

Official URL: https://refubium.fu-berlin.de/handle/fub188/5515

Abstract

This thesis presents algorithms for the analysis of liquid chromatography-mass spectrometry (LC-MS) data. Mass spectrometry is a technology that can be used to determine the identities and abundances of the compounds in complex samples. In combination with liquid chromatography, it has become a popular method in proteomics, the large-scale study of proteins and peptides in living systems. This area of research has gained considerable interest in recent years because proteins control fundamental reactions in the cell. Consequently, a deeper knowledge of their function is expected to be crucial for the development of new drugs and the cure of diseases.

The data sets obtained from an LC-MS experiment are large and highly complex. The outcome of such an experiment is called an LC-MS map, which is a collection of mass spectra. Besides the signals of interest, these spectra contain a large amount of noise and other disturbances. For this reason, algorithms for the low-level processing of LC-MS data are becoming increasingly important. These algorithms are the focus of this text.

Our novel contributions are threefold. First, we introduce SweepWavelet, an algorithm for the efficient detection and quantification of peptides from LC-MS data. The quantification of proteins and peptides using mass spectrometry is of high interest for biomedical research and for the pharmaceutical industry. Peptide detection and quantification is usually among the first steps in an LC-MS data analysis pipeline, and all subsequent steps depend on its quality. Our approach was among the first to address this problem in a sound computational framework. It consists of three steps: first, we apply a tailored wavelet function that filters mass spectra for the isotope peaks of peptides. Second, we use a method inspired by the sweep-line paradigm, which makes use of the redundant information in LC-MS data to determine the mass, charge, retention time and abundance of all peptides. Finally, we apply a flexible peptide signal model to filter the extracted signals for false positives.

The second part of this thesis deals with the benchmarking of LC-MS signal detection algorithms. This is a non-trivial task because it is difficult to establish a ground truth using real-world samples: which sample compounds become visible in an LC-MS data set is not known in advance. We therefore use annotated data and simulations to assess the performance of currently available algorithms. To simulate benchmark data, we developed simulation software called LC-MSsim. It incorporates computational models for retention time prediction, peptide detectability, isotope patterns and elution peaks. Using this software, we can simulate all steps in an LC-MS experiment and obtain a list of the positions, charges and abundances of all peptide signals contained in the resulting LC-MS map. This gives us a ground truth against which we can match the results of a signal detection algorithm. In this thesis, we use it for the benchmarking of quantification algorithms, but its scope is wider and it can also be used to evaluate other algorithms. To our knowledge, LC-MSsim is the first software that can simulate the full LC-MS data acquisition process.

The third contribution of this thesis is a statistical framework for the quality assessment of quantitative LC-MS experiments. Whereas quality assessment and control are already widespread in the field of gene expression analysis, our work is the first to address this problem for LC-MS data. We use methods from robust statistics to detect outlier LC-MS maps in large-scale quantitative experiments. Our approach introduces the notion of quality descriptors to derive an abstract representation of an LC-MS map and applies a robust principal component analysis based on projection pursuit. We show that it is sensible to use robust statistics for this problem and evaluate our method on simulated maps and on data from three real-world LC-MS studies.
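
As a rough illustration of why robust statistics suit outlier-map detection, the sketch below compares classical and robust scores on a single quality descriptor. It is not the method of the thesis, which applies a robust principal component analysis based on projection pursuit to several descriptors; here a much simpler median/MAD score stands in for it. The descriptor (number of peptide features detected per map), the data values, the function names and the cutoff of 3 are all assumptions made for this example.

```python
import statistics

# Hypothetical quality descriptor: number of peptide features detected per map.
# The values are made up for illustration; the last map is deliberately aberrant.
feature_counts = [5100, 4950, 5020, 4980, 5050, 4900, 5000, 1200]


def classical_scores(values):
    """Distance from the mean in units of the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd for v in values]


def robust_scores(values):
    """Distance from the median in units of the scaled median absolute deviation."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    # The factor 1.4826 makes the MAD comparable to a standard deviation
    # when the data are normally distributed.
    return [abs(v - median) / (1.4826 * mad) for v in values]


def flag_outliers(scores, cutoff=3.0):
    """Indices of maps whose score exceeds the cutoff (the cutoff is an assumption)."""
    return [i for i, s in enumerate(scores) if s > cutoff]


classical_flags = flag_outliers(classical_scores(feature_counts))
robust_flags = flag_outliers(robust_scores(feature_counts))
print("classical:", classical_flags)
print("robust:   ", robust_flags)
```

In this toy data the aberrant map inflates the mean and the standard deviation so much that its own classical score stays below the cutoff; with eight values, a score based on the sample standard deviation cannot exceed 7/√8 ≈ 2.47. The median and the MAD are barely affected by the single bad map, so the robust score flags it.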

Item Type:	Thesis (PhD)
Subjects:	Mathematical and Computer Sciences > Computer Science
Divisions:	Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code:	2534
Deposited By:	Anja Kasseckert
Deposited On:	24 Mar 2021 12:24
Last Modified:	24 Mar 2021 12:24
