
An evolutionary basis for protein design and structure prediction

Norn, Christoffer LU (2019)
Abstract (Swedish)
Some 4 billion years ago, nature began the experiment that led to you, that bird on the branch, that branch, and everything else alive that you see around you. The generating process of all this splendid life is, of course, evolution. Evolution works not only on the macroscopic scale (think eyes, muscles, wings) but also on the molecular scale (think molecular antennas, nanoscale muscles, nanoscale motors). On the molecular scale, almost everything with a function is made of proteins and by proteins. Understanding how proteins work is, therefore, a major goal in science. Proteins also play a crucial role in society: our ability to engineer and design them is a key reason why many cancers are no longer a death sentence, why we can cold-wash laundry (sparing the environment millions of tonnes of CO2 each year), and why we have essential research tools for understanding the mechanics of life. However, proteins are not easily engineered. They do not have the same elegance as the famous DNA double helix. Instead, they are, in the words of Max Perutz, one of the discoverers of the structure of hemoglobin (the protein that transports oxygen throughout your body), "hideous and visceral-looking" objects.

To improve our ability to design proteins, we looked to nature for advice. Imagine visiting the butterfly collection at the Geological Museum at the University of Copenhagen, with the goal of altering the wing of a butterfly to improve its flight capabilities. You are met with a menagerie of thousands of wing shapes and body sizes. Which shape gives the fastest flight? Do you just "average" the wing shapes, or do you take the most common shape? Neither. You need to know how flight capability affects survival, and you need a model of how survival shapes the observed butterfly diversity. In this thesis, we pursued a similar path, but for proteins. Proteins are not kept on display in traditional museums, but are stored in digital libraries accessible to anyone with an internet connection. We found that most of the variation in proteins could be explained by their stability (in the butterfly analogy, that flight capability is a major determinant of survival). Stability is an essential property that protein engineers seek to optimize. We further found that stability could be predicted from the observed diversity.

Using the above knowledge and structure-based models of protein stability, we designed a type of protein called antibodies. Antibodies are the reason that your body, most of the time, can defend itself against the festering of bacteria, viruses, fungi, and cancers. They are also the reason that the biopharmaceutical industry earns some 200 billion dollars each year. We designed new antibodies that could bind two protein targets, and developed a new method that could predict the structure of antibodies.
Abstract
The sequence diversity of protein families is a result of the biophysical selection pressures that shaped their evolutionary history. Among the dominant pressures is selection for protein thermostability, which in itself is an attractive target in protein engineering because of its importance for various biopharmaceutical properties, the performance of industrial enzymes, and the ability to design new protein functions.

In the first part of this thesis, we use models of evolutionary dynamics and biophysical fitness functions to derive the relationship between amino acid frequencies at sites in proteins and the stability effects of mutations. This analysis suggests that a commonly applied assumption (that amino acid frequencies are Boltzmann distributed) is inaccurate, and we provide a new relation consistent with the current understanding of evolutionary dynamics and protein fitness. Next, we study the extent to which the evolutionary pattern of amino acid substitutions can be explained by protein stability, as predicted using all-atom models of protein energetics. We show that at least 65% of the substitution pattern can be explained by thermostability. With the same model, we show that functional sites (e.g. active sites or binding sites) can be predicted when the apparent evolutionary site rate deviates significantly from that of a stability-only null model of evolution. Finally, we study how the strength of selective pressure affects the evolutionary behavior of proteins, again using the same models, but this time generating evolutionary trajectories. We find that both the energetic coupling between amino acids (coevolution) and the detriment of mutations increase as the strength of selection increases.
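
For readers unfamiliar with the Boltzmann assumption mentioned above, the following minimal Python sketch illustrates it. It shows only the commonly applied assumption that the thesis argues is inaccurate, not the new relation derived in the thesis, which this abstract does not state. The function names, the value of `RT`, and the per-site stability values are hypothetical and chosen purely for illustration.

```python
import math

# Assumed thermal energy in kcal/mol (roughly R*T near 298 K); illustrative only.
RT = 0.593


def boltzmann_frequencies(ddg, rt=RT):
    """Map per-amino-acid stability effects (kcal/mol) to site frequencies.

    Under the Boltzmann assumption, the frequency of amino acid a at a site
    is proportional to exp(-ddG_a / RT), where a lower ddG is more stable.
    """
    weights = {aa: math.exp(-value / rt) for aa, value in ddg.items()}
    total = sum(weights.values())
    return {aa: weight / total for aa, weight in weights.items()}


def implied_ddg(freqs, reference, rt=RT):
    """Invert the assumption: ddG of each amino acid relative to a reference."""
    ref_freq = freqs[reference]
    return {aa: -rt * math.log(freq / ref_freq) for aa, freq in freqs.items()}


# Hypothetical stability effects for a single site (not data from the thesis).
site_ddg = {"L": 0.0, "I": 0.4, "V": 0.8, "D": 3.0}

freqs = boltzmann_frequencies(site_ddg)
recovered = implied_ddg(freqs, reference="L")

for aa in site_ddg:
    print(f"{aa}: frequency={freqs[aa]:.3f}, implied ddG={recovered[aa]:.2f}")
```

Under this assumption the log-ratio of two amino acid frequencies at a site is read directly as a stability difference, which is why the inversion recovers the input values exactly. The thesis replaces this mapping with one derived from models of evolutionary dynamics.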

Antibodies are a key molecular component of the adaptive immune system of vertebrates and an important class of biopharmaceutical molecules. In the second part of the thesis, we predict and design the structure of antibodies by using energetics derived from sequence alignments and following the evolutionarily encoded modular segmentation of the molecule. Through multiple design and test iterations, we were able to design antibodies that express stably and, in some cases, bind target antigens. The developed structure prediction algorithm performs as well as other methods, is in some cases more accurate, and produces models with lower chemical strain. We use the structure prediction method to study a tumor-associated carbohydrate-binding antibody.

Finally, we also review the literature on the design of symmetrical protein self-assembly, and study the dynamical properties of a partially disordered chaperone protein, calreticulin.
author
  • Norn, Christoffer
supervisor
opponent
  • Professor Pollock, David, University of Colorado, USA
organization
publishing date
type
Thesis
publication status
published
subject
keywords
Protein evolution, Biophysics, Protein design, Protein structure prediction
pages
252 pages
publisher
Lunds universitet
defense location
Kemicentrum, Sal B, Naturvetarvägen 14, Lund
defense date
2019-02-08 13:00
ISBN
978-91-7422-619-5
978-91-7422-618-8
language
English
LU publication?
yes
id
e8d4b28e-ab4a-4432-891d-1d71f5f20b7e
date added to LUP
2019-01-13 20:39:32
date last changed
2019-01-17 12:44:18
@phdthesis{e8d4b28e-ab4a-4432-891d-1d71f5f20b7e,
  abstract     = {The sequence diversity of protein families is a result of the biophysical selection pressures that shaped their evolutionary history. Among the dominant pressures is selection for protein thermostability, which in itself is an attractive target in protein engineering because of its importance for various biopharmaceutical properties, the performance of industrial enzymes, and the ability to design new protein functions. In the first part of this thesis, we use models of evolutionary dynamics and biophysical fitness functions to derive the relationship between amino acid frequencies at sites in proteins and the stability effects of mutations. This analysis suggests that a commonly applied assumption (that amino acid frequencies are Boltzmann distributed) is inaccurate, and we provide a new relation consistent with the current understanding of evolutionary dynamics and protein fitness. Next, we study the extent to which the evolutionary pattern of amino acid substitutions can be explained by protein stability, as predicted using all-atom models of protein energetics. We show that at least 65\% of the substitution pattern can be explained by thermostability. With the same model, we show that functional sites (e.g. active sites or binding sites) can be predicted when the apparent evolutionary site rate deviates significantly from that of a stability-only null model of evolution. Finally, we study how the strength of selective pressure affects the evolutionary behavior of proteins, again using the same models, but this time generating evolutionary trajectories. We find that both the energetic coupling between amino acids (coevolution) and the detriment of mutations increase as the strength of selection increases. Antibodies are a key molecular component of the adaptive immune system of vertebrates and an important class of biopharmaceutical molecules. In the second part of the thesis, we predict and design the structure of antibodies by using energetics derived from sequence alignments and following the evolutionarily encoded modular segmentation of the molecule. Through multiple design and test iterations, we were able to design antibodies that express stably and, in some cases, bind target antigens. The developed structure prediction algorithm performs as well as other methods, is in some cases more accurate, and produces models with lower chemical strain. We use the structure prediction method to study a tumor-associated carbohydrate-binding antibody. Finally, we also review the literature on the design of symmetrical protein self-assembly, and study the dynamical properties of a partially disordered chaperone protein, calreticulin.},
  author       = {Norn, Christoffer},
  isbn         = {978-91-7422-619-5},
  keyword      = {Protein evolution,Biophysics,Protein design,Protein structure prediction},
  language     = {eng},
  pages        = {252},
  publisher    = {Lunds universitet},
  school       = {Lund University},
  title        = {An evolutionary basis for protein design and structure prediction},
  year         = {2019},
}