LUP Student Papers

LUND UNIVERSITY LIBRARIES

Exploration of an all-atom thermodynamic model to predict site-specific evolutionary rates in proteins

Dias Correia de Oliveira, Fábio LU (2019) FYTM03 20191
Computational Biology and Biological Physics - Undergoing reorganization
Abstract
Understanding the patterns of evolutionary sequence divergence is fundamental for comparative analyses such as phylogenetics and genomics. The rate at which different sites of a protein sequence evolve depends on many factors, and the causes of rate variation among sites are highly convoluted. Inference methods have been developed to estimate site-specific evolutionary rates from sequence alignments. Moreover, several molecular traits have been found to correlate with site-specific rates, among them solvent accessibility, packing density and protein function. Correlations between rates and predictor variables make it possible to identify factors that influence rate variation, but they do not explain mechanistically why a given site is variable or conserved. Fortunately, protein evolution is amenable to the development of fundamental theory, and mechanistic biophysical models have therefore been proposed to explain the observed rates. These models are essentially based on protein stability, which is reasonable because stability is related to all the molecular features that correlate with evolutionary rates.
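Correlations between site rates and predictor variables are commonly measured with a rank statistic. The sketch below assumes Spearman's rank correlation is the statistic of interest, although the abstract does not name one. It computes the correlation in plain Python, giving tied values their average rank. The solvent-accessibility and rate values in the example are made up for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-site data: relative solvent accessibility and inferred rates.
    rsa = [0.05, 0.40, 0.10, 0.80, 0.65]
    rates = [0.2, 1.1, 0.3, 1.9, 1.4]
    print(spearman(rsa, rates))  # prints 1.0: both lists have the same ordering
```

With real data, `scipy.stats.spearmanr` gives the same statistic together with a p-value.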

Norn et al. developed an all-atom thermodynamic model to predict site-specific evolutionary rates in proteins. The model has been shown to closely recapitulate the average amino acid substitution rate behaviour, but it does not achieve the same accuracy for site-specific rates. Several reasons have been put forward to explain this weak correlation, two of which are of particular interest: the propagation of errors in the stability predictions, and the fact that the model relies on a single protein structure to extrapolate site rates that emerge across a whole protein phylogeny. The results obtained in this thesis support the hypothesis of Norn et al. that propagated stability prediction errors lower the correlation but are not enough to explain why it is weak on average. In addition, a weighted average of the site rates predicted for the proteins in a given phylogeny is shown to recapitulate the empirically inferred site rates of that phylogeny better. This opens the door to further adaptation of the model of Norn et al. to phylogenetic analysis.
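The abstract reports that a weighted average of the site rates predicted from each protein in a phylogeny recapitulates the inferred site rates better. It does not say how the weights are chosen. The sketch below therefore takes the weights as an input, for example hypothetical sequence or tree-based weights. It assumes each protein's predicted rates have already been mapped onto alignment columns, with `None` marking a gap. At each column, the weights are renormalised over the proteins that contribute a value.

```python
from typing import Optional, Sequence


def weighted_site_rates(
    rates_per_protein: Sequence[Sequence[Optional[float]]],
    weights: Sequence[float],
) -> list[Optional[float]]:
    """Column-wise weighted average of predicted site rates across proteins.

    rates_per_protein[p][c] is the rate predicted from protein p's structure
    for alignment column c, or None if protein p has a gap there.
    Returns None for columns where no protein contributes a value.
    """
    if len(rates_per_protein) != len(weights):
        raise ValueError("need exactly one weight per protein")
    n_cols = len(rates_per_protein[0])
    averaged: list[Optional[float]] = []
    for c in range(n_cols):
        num = 0.0
        den = 0.0
        for rates, w in zip(rates_per_protein, weights):
            r = rates[c]
            if r is not None:
                num += w * r
                den += w
        averaged.append(num / den if den > 0 else None)
    return averaged


if __name__ == "__main__":
    # Two hypothetical proteins, three alignment columns; protein 2 has a gap.
    per_protein = [[0.4, 1.2, 2.0], [0.6, None, 1.0]]
    print(weighted_site_rates(per_protein, weights=[1.0, 3.0]))
```

The averaged profile could then be correlated with the empirically inferred rates, in the same way as a single-structure prediction.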
Popular Abstract
Proteins are fascinating: they are made up of a simple sequence of amino acids that folds into a complex functional structure. There is, however, a lot of redundancy – different sequences can give rise to the same functional structure. Organisms of different species possess analogous proteins that perform the same task and therefore have a similar structure but different sequences. Often these analogous proteins share a common ancestor from which they diverged under different evolutionary pressures.

For proteins to evolve, the units that compose them, the amino acids, must change over time. Depending on their position in the protein, they evolve at different rates. To find out which sites in a protein are more conserved and which are more variable, one can build multiple sequence alignments. Such alignments reveal regions of the proteins where the amino acid sequence is the same. The conservation of specific amino acids at the same positions in all aligned proteins indicates that they are important. There are two main reasons for a region to be highly conserved: the region may be critical for the stability of the functional structure, or it may be directly involved in the function of the protein, for example in binding a specific substrate. The problem with this approach is that it offers no way to distinguish between the two factors.
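A simple way to quantify how conserved each alignment column is would be its Shannon entropy. A column in which every sequence has the same amino acid scores 0, and more variable columns score higher. The sketch below uses entropy as an illustrative conservation score. It is not necessarily the measure used in the thesis, which works with site rates inferred from the alignment. Gap characters (`-`, an assumption about the alignment format) are ignored.

```python
from collections import Counter
from math import log2


def column_entropies(alignment: list[str], gap: str = "-") -> list[float]:
    """Shannon entropy (bits) of each alignment column, ignoring gaps.

    Lower entropy means a more conserved column; an all-gap column scores 0.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    entropies = []
    for c in range(length):
        residues = [seq[c] for seq in alignment if seq[c] != gap]
        n = len(residues)
        counts = Counter(residues)
        entropies.append(sum(-(k / n) * log2(k / n) for k in counts.values()) if n else 0.0)
    return entropies


if __name__ == "__main__":
    # Hypothetical toy alignment of three short homologous sequences.
    toy = ["MKVL", "MKIL", "MRV-"]
    print(column_entropies(toy))
```

As the paragraph above points out, a low score only says that a column is conserved. It cannot say whether stability or function is the reason.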

With an atomistic model of a protein, one can calculate how changing its amino acid sequence affects its stability. Sites where mutations decrease stability are predicted to be conserved, whereas sites where mutations increase stability or have a neutral effect are predicted to be highly variable. However, the atomistic model does not explain all the variation in rates that can be inferred from sequence alignments. There are two main reasons for this: the calculated stability effects are not as accurate as needed, and the model does not account for the fact that different proteins respond differently to the same mutations. The main aim of this thesis is to explore the impact of these two factors. Improving this model of site-specific rates is therefore of great importance. Used together with sequence alignments, it provides insight into whether the conservation of certain amino acids comes from stability, from function or from both. This in turn accelerates the identification of functional sites, which is important for the development of new drugs targeting these sites.
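The reasoning above can be illustrated with a toy stability-threshold model. Sites where mutations are destabilising end up conserved, while sites where mutations are neutral or stabilising end up variable. In this model, a substitution is tolerated only if the mutant remains more stable than a cut-off. This is a deliberately simplified, swapped-in model, not the all-atom thermodynamic model of Norn et al. The ΔΔG values, the wild-type stability of 5 kcal/mol and the 1 kcal/mol threshold are all hypothetical. In practice the ΔΔG values would come from a structure-based stability predictor and would cover all 19 substitutions per site.

```python
from statistics import mean


def accepted_fraction(ddgs, stability_wt, threshold):
    """Fraction of substitutions at one site that keep the protein above a stability threshold.

    ddgs: destabilisation (kcal/mol) caused by each substitution; positive = less stable.
    stability_wt: wild-type folding stability as a positive number (kcal/mol).
    threshold: minimum stability a mutant needs to stay folded (hypothetical cut-off).
    """
    accepted = sum(1 for ddg in ddgs if stability_wt - ddg >= threshold)
    return accepted / len(ddgs)


def relative_site_rates(ddgs_per_site, stability_wt=5.0, threshold=1.0):
    """Toy relative rate per site, normalised so that the mean over sites is 1."""
    raw = [accepted_fraction(d, stability_wt, threshold) for d in ddgs_per_site]
    avg = mean(raw)
    if avg == 0:
        raise ValueError("no substitution is accepted at any site")
    return [r / avg for r in raw]


if __name__ == "__main__":
    # Hypothetical ddG values (kcal/mol) for a few substitutions at three sites.
    sites = [
        [0.5, 1.0, 4.5, 6.0],   # partly constrained site
        [0.1, -0.3, 2.0, 3.9],  # tolerant, surface-like site
        [3.0, 4.5, 5.0, 7.0],   # strongly constrained, buried-like site
    ]
    print(relative_site_rates(sites))
```

The strongly destabilised site receives the lowest relative rate, mirroring the qualitative argument in the text. A hard threshold is only one possible choice; smoother mappings from stability to fitness are also common.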
author
Dias Correia de Oliveira, Fábio LU
course
FYTM03 20191
type
H2 - Master's Degree (Two Years)
language
English
id
8996177
date added to LUP
2019-10-08 09:31:51
date last changed
2019-10-08 09:31:51
@misc{8996177,
  abstract     = {{Understanding the patterns of evolutionary sequence divergence is fundamental for comparative analyses such as phylogenetics and genomics. The rate at which different sites of a protein sequence evolve depends on many factors, and the causes of rate variation among sites are highly convoluted. Inference methods have been developed to estimate site-specific evolutionary rates from sequence alignments. Moreover, several molecular traits have been found to correlate with site-specific rates, among them solvent accessibility, packing density and protein function. Correlations between rates and predictor variables make it possible to identify factors that influence rate variation, but they do not explain mechanistically why a given site is variable or conserved. Fortunately, protein evolution is amenable to the development of fundamental theory, and mechanistic biophysical models have therefore been proposed to explain the observed rates. These models are essentially based on protein stability, which is reasonable because stability is related to all the molecular features that correlate with evolutionary rates. 

Norn et al. developed an all-atom thermodynamic model to predict site-specific evolutionary rates in proteins. The model has been shown to closely recapitulate the average amino acid substitution rate behaviour, but it does not achieve the same accuracy for site-specific rates. Several reasons have been put forward to explain this weak correlation, two of which are of particular interest: the propagation of errors in the stability predictions, and the fact that the model relies on a single protein structure to extrapolate site rates that emerge across a whole protein phylogeny. The results obtained in this thesis support the hypothesis of Norn et al. that propagated stability prediction errors lower the correlation but are not enough to explain why it is weak on average. In addition, a weighted average of the site rates predicted for the proteins in a given phylogeny is shown to recapitulate the empirically inferred site rates of that phylogeny better. This opens the door to further adaptation of the model of Norn et al. to phylogenetic analysis.}},
  author       = {{Dias Correia de Oliveira, Fábio}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Exploration of an all-atom thermodynamic model to predict site-specific evolutionary rates in proteins}},
  year         = {{2019}},
}