Repository: Freie Universität Berlin, Math Department

# Algorithms for the Analysis of Biomolecular Simulations: Ensemble Averages, Marginal Distributions, Clustering and Markov Models

Keller, B. (2009) Algorithms for the Analysis of Biomolecular Simulations: Ensemble Averages, Marginal Distributions, Clustering and Markov Models. PhD thesis, ETH Zürich.

Full text not available from this repository.

## Abstract

The dynamics of biomolecules, in particular the folding of peptides and proteins, is a highly complex process. The temporal and configurational resolution of experiments is typically too poor to yield a detailed picture of this process. The network of forces that enters the equations of motion governing this process, on the other hand, is too complex to be analyzed directly. Numerical integration of the equations of motion, however, provides a means to obtain a detailed trajectory of a biomolecule. In particular, numerical integration of the classical equations of motion on an atomic level, known as molecular dynamics (MD), has proved a very powerful tool for the elucidation of the dynamics of biomolecules. One reason for its success is that MD can not only be used to generate trajectories but is at the same time an efficient method to sample the equilibrium distribution of the configurations of a molecule. Once the equilibrium distribution is known, most macroscopic properties can be calculated by averaging.

But the equilibrium distribution also represents the gateway to a more detailed understanding of the dynamics of the molecule. The metastable states of the molecule correspond to regions of high probability density in configurational space and can, in principle, be extracted directly from the equilibrium distribution. The equilibrium distribution also contains information on how the degrees of freedom of a molecule interact with each other. Analyzing these dependences, one can understand how the conformations of a molecule, and ultimately also its dynamics, arise from its structural properties.

The development of algorithms used in MD and the tremendous increase in computer power over the last thirty years have generated a need for tools to categorize and concisely represent the enormous amount of data that modern MD simulations can generate. Markov models of the dynamics of biomolecules provide such a tool.
Using these models, the dynamics of the relevant degrees of freedom, even of large molecules, can be represented by a square matrix with a dimension on the order of 1000. The equilibrium distribution emerges naturally as the first eigenvector of this matrix, and metastable states can be extracted simply by grouping states according to their kinetic proximity.

In this thesis, methods to analyze the equilibrium distribution of biomolecules are discussed and tested. Chapter 1 shows how the equations of motion for a single system are linked to the equations of motion of the probability distribution and how the equilibrium distribution arises from these. We show how thermodynamic properties can be calculated from the equilibrium distribution and discuss the thermodynamics of protein folding. In the last part of chapter 1 we give a short overview of the technical details of molecular simulation and of the historical development of this method.

Chapter 2 treats the construction of stochastic Markov models from deterministic simulations. Emphasis is placed on the assumptions that are made when mapping the equations of motion onto the central quantity of Markov models, the transition matrix. Using a simple two-bit model and simulations of butane, we illustrate in which cases the assumptions are violated and in which cases they are fulfilled. We also review the mathematical properties of transition matrices and discuss their physical interpretation.

Chapter 3 discusses the categorization of an equilibrium distribution in terms of metastable states. The metastable states of a small $\beta$-peptide are identified using a kinetic cluster algorithm (which is based on a Markov model of the peptide) and compared to the results of geometric cluster algorithms (which are based on a data set that represents the equilibrium distribution).
We find that geometric cluster algorithms that use a density estimate rather than a cutoff as their cluster criterion have the best chance of reliably identifying the metastable states of a molecule.

In chapter 4 we establish a link between the equilibrium distribution of the same $\beta$-peptide (and some structurally related peptides) and the features of its atomic configuration. To this end, we test to what degree the conformations of its dihedral angles depend on each other and show how the marginal distributions of the backbone dihedral angles arise from the atomic configuration of the residues. In particular, we show that the structural features that stabilize the folded conformation in $\beta$-peptides are not the same as in natural peptides.

Chapter 5 examines the calculation of ensemble averages of properties that depend non-linearly on the configuration of the system, using $^3J$-coupling constants as an example. When comparing MD data to experimental results, errors in the equilibrium distribution, which can be due to force-field or sampling errors, are often corrected by restraining the simulation to the experimental result. This approach is valid as long as the underlying dynamics is not substantially distorted. We discuss in detail why, in the case of a non-linear dependence between the configuration and a given property, common restraining methods lead to unrealistic dynamics and, based on that, propose a modification of these methods.

The last chapter of this thesis gives an outlook on the development of analysis tools for MD data. We discuss a personal choice of questions and challenges the field will face in the coming years and also try to envision how sophisticated analysis tools could help to improve our understanding of biomolecular processes.
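
The Markov-model idea summarized in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes a trajectory that has already been discretized into integer state labels, and the function names, the lag parameter, and the toy trajectory are all hypothetical. Transitions are counted at a fixed lag, the counts are row-normalized into a transition matrix, and the equilibrium distribution is approximated as the left eigenvector with eigenvalue 1 by power iteration.

```python
from collections import Counter


def transition_matrix(traj, n_states, lag=1):
    """Estimate a row-stochastic transition matrix by counting jumps at a given lag."""
    counts = Counter(zip(traj[:-lag], traj[lag:]))
    matrix = []
    for i in range(n_states):
        row = [counts[(i, j)] for j in range(n_states)]
        total = sum(row)
        if total == 0:
            raise ValueError(f"state {i} has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=10_000, tol=1e-12):
    """Approximate the left eigenvector with eigenvalue 1 by power iteration."""
    n = len(matrix)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical toy trajectory over two states (illustration only, not MD data).
toy_traj = [0, 0, 1, 0, 1, 1, 0, 0]
T = transition_matrix(toy_traj, n_states=2)
pi = stationary_distribution(T)
print("transition matrix:", T)
print("stationary distribution:", pi)
```

Power iteration is used here only to keep the sketch dependency-free. For matrices of the size mentioned in the abstract, a numerical eigensolver would be the more usual choice.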

Item Type: Thesis (PhD)

Subjects: Physical Sciences > Physics > Mathematical & Theoretical Physics > Computational Physics; Physical Sciences > Chemistry > Physical Chemistry; Biological Sciences > Molecular Biology

ID Code: 844

Deposited By: BioComp Admin

Deposited On: 12 Mar 2010 11:29

Last Modified: 12 Mar 2010 11:30
