Lund University Publications

Prediction, Design and Determination of Protein Structures

Rämisch, Sebastian LU (2015)
Abstract
The three-dimensional structure of a protein is encoded in its amino acid sequence. Modern structure prediction algorithms make it possible to predict the structure of small proteins using sequence information alone. We used the Fold-and-Dock algorithm, which is part of the Rosetta macromolecular modeling suite, for de novo structure prediction of coiled-coil proteins. Members of this protein family consist of alpha-helices that assemble into symmetric complexes by winding around each other. Although the sequences of different coiled-coils follow a similar general pattern, the number of helices in a complex ranges from two to five. Remarkably, minor modifications of the sequence can change the oligomeric state of a coiled-coil. We tested different approaches to predict the oligomeric state of homomeric coiled-coils by comparing the energies of computational models of several alternate complexes. Comparing the free energies of structural models of different sizes is highly challenging. Our results show that an accurate comparison of different oligomers must consider the free energy of forming a helix. We were able to predict the lowest free energy oligomer in up to 23 out of 33 tested coiled-coils. Additionally, we found that parallel dimeric coiled-coils frequently show significant backbone asymmetries. To be able to accurately predict the structures of this sub-class, we introduced a new Fold-and-Dock version, which now allows for the prediction of asymmetric complexes.



Subsequently, we used models of coiled-coils, generated by de novo structure prediction, to test whether they are accurate enough to be used for solving X-ray structures by molecular replacement. To this end, we implemented the program CCsolve, which combines existing crystallographic software for fully automated phasing from de novo models, model building and structure refinement, optimized for homomeric coiled-coils. In our benchmark set of 24 coiled-coil structures, only two structures failed; the average difference between the previously reported Rfree values and those obtained by de novo phasing using CCsolve was 0.01. The successfully solved structures had data resolutions of up to 2.5 Å and a C-alpha r.m.s.d. of up to 3.3 Å between the initial model and the crystal structure.



Improved force-fields for structure prediction have made it possible to find sequences that would fold into new protein structures. We developed a general method to design new repeat proteins, which can serve as binding scaffolds for developing new bio-sensors or inhibitors. Currently, there are methods to engineer binding proteins for biochemical or medicinal applications, such as antibody design or sequence-based design of repeat proteins. However, those methods lack the possibility to adjust the shape of a binder to the target structure. Using state-of-the-art protein design methods implemented in Rosetta, we designed leucine-rich repeat (LRR) proteins with a geometry tailored towards the specific application in question. The method utilizes the variety of LRRs with known structure. A single self-compatible repeat is identified that can be re-designed to form a structure with a predefined geometry. LRR proteins form curved, elongated structures with a significant helical twist. As a proof of principle, we designed an LRR protein that displays a high curvature but no helical twist. The resulting proteins can be expressed as monomers with terminal capping repeats. When expressed without caps, two monomers can self-assemble into a planar ring structure that has not been observed in nature.



To be used in ‘real-world’ applications, designed proteins must exhibit good biophysical properties. Experimental techniques like directed evolution make it possible to optimize certain features of a protein by screening many different protein variants. Currently, there is no established method available that can be used to screen large numbers of proteins in a high-throughput manner. We developed a fluorescence-based assay that allows for a fast comparison of proteins with different biophysical properties. The method monitors the endogenous stress response in E. coli. By comparing the signal from cell cultures expressing point mutants of a test protein with different thermostabilities, we found a good correlation between Tm and the expression of the stress-induced chaperone DnaK. We used a plasmid system that can harbor a gene for overexpression together with multiple reporter genes. This setup should enable the parallel detection of multiple stress-induced proteins upon overexpression of a protein that is to be optimized.
Abstract (Swedish)
Popular Abstract in English

Life on earth has evolved countless forms, beautiful shapes and colors as well as astonishingly complex systems. Remarkably, all living creatures consist of cells, which themselves are built from the same basic set of molecules, most importantly nucleic acids (RNA and DNA), lipids and proteins. The latter are large macromolecules that fulfill a vast number of tasks; they are best described as the workhorses of the cell. Most commonly known are enzymes, which catalyze chemical reactions, and hormones, which are used to send signals between distant parts of the body. Muscle contraction and reading the genetic code from DNA are also processes performed by specific proteins.



The word protein describes a class of molecules that have a common basic chemical structure, but each protein has a unique three-dimensional structure. This three-dimensional structure has been optimized over the course of evolution to suit the specific task of the protein. Each protein can be described as a chain of small units, called amino acids. All living cells share the same set of 20 amino acids, each of which has distinct properties. This chain folds into a defined compact structure, which is encoded by the sequence of amino acids.



The question of how the amino acid sequence dictates the structure of a protein is the focus of many scientists. Sophisticated computer simulations have made it possible to demonstrate how the amino acids pack against each other, much like in a three-dimensional jigsaw puzzle. Today, it is possible to use simulations for predicting the structure of small proteins when the amino acid sequence is known.



In this work, we both used and extended a state-of-the-art simulation program, called Rosetta, to predict how certain proteins come together to form a protein complex. How many single proteins are needed to form such a complex is also encoded in the amino acid sequence. We succeeded in predicting this number in 70% of the cases we tested.



In another project, we developed a way to use the predicted complexes to guide the determination of the three-dimensional structure of those proteins. Solving a protein structure is often a hugely laborious and complicated enterprise, which frequently fails. Having correctly predicted structures of protein complexes available makes this task significantly easier and faster. The method we developed might help other researchers to solve such structures faster in the future.



We further used the Rosetta program to design proteins that do not exist in nature. Designing proteins in the computer is possible because the task is the inverse of predicting a structure: one defines a structure and asks which amino acid sequence would form it. Using computer simulations, we designed proteins from a specific class, which is known to be well suited for binding other proteins. This way, we obtained a generic scaffold that may be used in the future to develop medications or new diagnostic tools. The key feature of the method we developed is that new scaffolds can be designed with a shape that is tailored to the protein they are supposed to bind.



In the fourth project, we focused on the stability of proteins. Novel proteins designed for applications need to be very stable. Stability can often be increased by changing only a few amino acids in the sequence. We successfully tested a way to easily compare the stabilities of different variants of a protein. The method allows for comparing stabilities while the proteins are being produced by bacteria. The noninvasive character makes the assessment fast, so that it might be possible to compare huge numbers of variants in a very short time. This way, proteins could be stabilized to make them available as medication or to create new biomaterials.
author
supervisor
opponent
  • Dr. Höcker, Birte, Max Planck Institute for Developmental Biology, Spemannstr. 35, 72076 Tübingen, Germany
organization
publishing date
type
Thesis
publication status
published
subject
keywords
protein design, Rosetta, Fold-and-Dock, leucine rich repeat proteins, coiled-coils, molecular replacement, ab initio phasing, temperature stability, fluorescent proteins
pages
128 pages
publisher
Department of Biochemistry and Structural Biology, Lund University
defense location
Kemicentrum, Hall B
defense date
2015-03-06 10:15:00
ISBN
978-91-7422-389-7
language
English
LU publication?
yes
id
de7ad549-94da-4a26-90bb-bdd4dbb6230f (old id 5044774)
date added to LUP
2016-04-04 11:43:21
date last changed
2018-11-21 21:06:45
@phdthesis{de7ad549-94da-4a26-90bb-bdd4dbb6230f,
  abstract     = {{The three-dimensional structure of a protein is encoded in its amino acid sequence. Modern structure prediction algorithms make it possible to predict the structure of small proteins using sequence information alone. We used the Fold-and-Dock algorithm, which is part of the Rosetta macromolecular modeling suite, for de novo structure prediction of coiled-coil proteins. Members of this protein family consist of alpha-helices that assemble into symmetric complexes by winding around each other. Although the sequences of different coiled-coils follow a similar general pattern, the number of helices in a complex ranges from two to five. Remarkably, minor modifications of the sequence can change the oligomeric state of a coiled-coil. We tested different approaches to predict the oligomeric state of homomeric coiled-coils by comparing the energies of computational models of several alternate complexes. Comparing the free energies of structural models of different sizes is highly challenging. Our results show that an accurate comparison of different oligomers must consider the free energy of forming a helix. We were able to predict the lowest free energy oligomer in up to 23 out of 33 tested coiled-coils. Additionally, we found that parallel dimeric coiled-coils frequently show significant backbone asymmetries. To be able to accurately predict the structures of this sub-class, we introduced a new Fold-and-Dock version, which now allows for the prediction of asymmetric complexes.

Subsequently, we used models of coiled-coils, generated by de novo structure prediction, to test whether they are accurate enough to be used for solving X-ray structures by molecular replacement. To this end, we implemented the program CCsolve, which combines existing crystallographic software for fully automated phasing from de novo models, model building and structure refinement, optimized for homomeric coiled-coils. In our benchmark set of 24 coiled-coil structures, only two structures failed; the average difference between the previously reported Rfree values and those obtained by de novo phasing using CCsolve was 0.01. The successfully solved structures had data resolutions of up to 2.5 Å and a C-alpha r.m.s.d. of up to 3.3 Å between the initial model and the crystal structure.

Improved force-fields for structure prediction have made it possible to find sequences that would fold into new protein structures. We developed a general method to design new repeat proteins, which can serve as binding scaffolds for developing new bio-sensors or inhibitors. Currently, there are methods to engineer binding proteins for biochemical or medicinal applications, such as antibody design or sequence-based design of repeat proteins. However, those methods lack the possibility to adjust the shape of a binder to the target structure. Using state-of-the-art protein design methods implemented in Rosetta, we designed leucine-rich repeat (LRR) proteins with a geometry tailored towards the specific application in question. The method utilizes the variety of LRRs with known structure. A single self-compatible repeat is identified that can be re-designed to form a structure with a predefined geometry. LRR proteins form curved, elongated structures with a significant helical twist. As a proof of principle, we designed an LRR protein that displays a high curvature but no helical twist. The resulting proteins can be expressed as monomers with terminal capping repeats. When expressed without caps, two monomers can self-assemble into a planar ring structure that has not been observed in nature.

To be used in ‘real-world’ applications, designed proteins must exhibit good biophysical properties. Experimental techniques like directed evolution make it possible to optimize certain features of a protein by screening many different protein variants. Currently, there is no established method available that can be used to screen large numbers of proteins in a high-throughput manner. We developed a fluorescence-based assay that allows for a fast comparison of proteins with different biophysical properties. The method monitors the endogenous stress response in E. coli. By comparing the signal from cell cultures expressing point mutants of a test protein with different thermostabilities, we found a good correlation between Tm and the expression of the stress-induced chaperone DnaK. We used a plasmid system that can harbor a gene for overexpression together with multiple reporter genes. This setup should enable the parallel detection of multiple stress-induced proteins upon overexpression of a protein that is to be optimized.}},
  author       = {{Rämisch, Sebastian}},
  isbn         = {{978-91-7422-389-7}},
  keywords     = {{protein design; Rosetta; Fold-and-Dock; leucine rich repeat proteins; coiled-coils; molecular replacement; ab initio phasing; temperature stability; fluorescent proteins}},
  language     = {{eng}},
  publisher    = {{Department of Biochemistry and Structural Biology, Lund University}},
  school       = {{Lund University}},
  title        = {{Prediction, Design and Determination of Protein Structures}},
  year         = {{2015}},
}