Repository: Freie Universität Berlin, Math Department

Annotated Alignments

Bais, Abha Singh (2008) Annotated Alignments. PhD thesis, Freie Universität Berlin.

Full text not available from this repository.

Official URL: http://dx.doi.org/10.17169/refubium-4552

Abstract

Elucidating the mechanisms of transcriptional regulation relies heavily on annotating, in DNA sequences, the binding sites of DNA-binding proteins called transcription factors. Because binding sites that are conserved across species are more likely to be functional, the standard approach is to use cross-species comparisons and focus the search on conserved regions. Computational methods that annotate conserved binding sites usually perform the alignment and binding-site annotation steps separately and combine the results at the end. If the binding-site descriptions are weak or the sequence similarity is low, the local gap structure of the alignment makes the conserved sites hard to detect. In this thesis, I introduce a novel method that integrates the two axes of sequence conservation and binding-site annotation in a simultaneous approach, yielding annotated alignments: pairwise alignments in which some parts are annotated as putative conserved transcription factor binding sites. Standard pairwise alignments are extended with additional states for binding-site profiles. I propose a statistical framework that estimates profile-related parameters from desired type I and type II error rates; this forms the core of the tool SimAnn. As an extension, I use existing probabilistic models to show how the framework can be adapted to account for position-specific evolutionary characteristics of binding sites during parameter estimation; this underlies the tool eSimAnn. Through simulations and real data analysis, I study how a simultaneous approach, as opposed to a multi-step one, influences the resulting predictions. The simultaneous approach allows the alignment structure to be rearranged locally so that binding sites come out perfectly aligned. This removes the need for post-processing steps to handle errors in pre-computed alignments, as is usually done in multi-step approaches.
Additionally, the parameter-estimation framework is applicable to any novel profile of interest. The simultaneous approach stands out especially in cases with poor sequence conservation or low profile quality. As a by-product of the analysis, I also formulate the annotated alignment problem as an extended pair hidden Markov model and illustrate the correspondence between the various theoretical concepts.
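The idea of extending a pairwise alignment with an extra state for a binding-site profile can be illustrated with a toy dynamic program. The sketch below is not SimAnn's algorithm or scoring scheme (the full text is not available from this repository); it assumes a global alignment with linear gap costs, a hypothetical consensus-derived log-odds profile (`consensus_profile`), and a single hypothetical `site_cost` parameter standing in for the profile-related parameters the thesis estimates from type I and type II error targets. A "site" move aligns a profile-length window of both sequences without gaps and scores it with the profile rather than with substitution scores, which lets the alignment rearrange locally around a conserved site.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical profile format: one dict of log-odds scores per motif position.
Profile = List[Dict[str, float]]


def consensus_profile(motif: str, hit: float = 1.0, miss: float = -1.0) -> Profile:
    """Toy profile: `hit` for the consensus base, `miss` for any other base."""
    return [{b: (hit if b == c else miss) for b in "ACGT"} for c in motif]


def profile_score(profile: Profile, window: str) -> float:
    """Sum of per-position scores of `window` (assumed to be over ACGT)."""
    return sum(col[base] for col, base in zip(profile, window))


def annotated_align(x: str, y: str, profile: Profile, match: float = 1.0,
                    mismatch: float = -1.0, gap: float = -2.0,
                    site_cost: float = 4.0
                    ) -> Tuple[float, str, str, List[Tuple[int, int]]]:
    """Global alignment with an extra ungapped 'site' move of profile width.

    Returns (score, aligned_x, aligned_y, sites). Annotated windows are
    lower-cased; `sites` holds their 0-based (start_in_x, start_in_y).
    `site_cost` is a hypothetical penalty, not a parameter from the thesis.
    """
    w, n, m = len(profile), len(x), len(y)
    neg = float("-inf")
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []  # (value, move); max() keeps the first of equal values
            if i > 0 and j > 0:
                sub = match if x[i - 1] == y[j - 1] else mismatch
                cands.append((score[i - 1][j - 1] + sub, "M"))
            if i > 0:
                cands.append((score[i - 1][j] + gap, "X"))  # gap in y
            if j > 0:
                cands.append((score[i][j - 1] + gap, "Y"))  # gap in x
            if i >= w and j >= w:
                # Site state: both windows scored by the profile, no gaps inside.
                site = (score[i - w][j - w] + profile_score(profile, x[i - w:i])
                        + profile_score(profile, y[j - w:j]) - site_cost)
                cands.append((site, "S"))
            score[i][j], back[i][j] = max(cands, key=lambda c: c[0])

    # Traceback from (n, m); segments are collected end-to-start.
    segs_x: List[str] = []
    segs_y: List[str] = []
    sites: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "M":
            segs_x.append(x[i - 1])
            segs_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "X":
            segs_x.append(x[i - 1])
            segs_y.append("-")
            i -= 1
        elif move == "Y":
            segs_x.append("-")
            segs_y.append(y[j - 1])
            j -= 1
        else:  # "S": the whole window is aligned and annotated as a site
            segs_x.append(x[i - w:i].lower())
            segs_y.append(y[j - w:j].lower())
            i, j = i - w, j - w
            sites.append((i, j))
    # Each segment is already in forward order, so reverse the list of segments.
    return (score[n][m], "".join(reversed(segs_x)), "".join(reversed(segs_y)),
            sites[::-1])
```

For example, with `consensus_profile("TATA")` and `site_cost=2.0`, aligning `GGTATAGG` with `CCTATACC` annotates the shared `TATA` window (score 2.0, aligned as `GGtataGG` over `CCtataCC`, site at `(2, 2)`), whereas a very large `site_cost` effectively switches the site state off and returns the plain alignment score 0.0.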

Item Type: Thesis (PhD)
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 2851
Deposited By: Anja Kasseckert
Deposited On: 05 Sep 2022 13:41
Last Modified: 05 Sep 2022 13:41
