LUP Student Papers

LUND UNIVERSITY LIBRARIES

Predicting Protein Stability with Machine Learning

Carlsson, Lucas LU (2023) KFKM05 20231
Biophysical Chemistry
Abstract
Protein stability is a property of high importance in a variety of fields. It determines whether a protein adopts its native fold and can play a role in certain diseases, such as Parkinson's and Alzheimer's disease. It is also of interest in industrial settings, where the stability of enzymes is optimised for particular physicochemical environments. Recent developments in machine learning have yielded novel methods able to predict protein characteristics with surprising accuracy solely from sequence information. However, few such models have been proposed for predicting protein stability. The aim of this project was to create a model able to predict protein stability from sequence information by utilising a protein language model based on multiple sequence alignments. Different models were developed to predict two quantities relating to protein stability: the Gibbs free energy of unfolding and the heat denaturation temperature. Due to limited training data, the performance of the models predicting Gibbs free energy was poor. One of the models predicting heat denaturation temperature proved more promising, with higher performance than a previously published model trained on similar data. However, its ability to predict conventionally obtained heat denaturation temperatures was poor.
Popular Abstract
Can computers find better proteins?

Finding more efficient enzymes is critical for the chemical industry to become sustainable. To find better enzymes efficiently, new tools are needed that reliably predict protein properties.

The chemical industry is searching for new green technology to replace catalysts that are often toxic and made of scarce minerals. A promising alternative is catalytic proteins, so-called enzymes, which can be produced renewably and pose no toxic threat. However, to be a viable alternative, enzymes must become more effective and able to withstand harsher conditions.

To design enhanced or entirely new enzymes, accurate and fast predictions of the protein’s properties are needed. One such highly sought-after property is stability at high temperatures, since higher temperatures usually speed up chemical reactions. Machine learning techniques, which have proved powerful in numerous fields, are a promising approach for developing such predictive models.

At the Department of Biophysical Chemistry at Lund University, it was tested whether such a model, one that predicts the thermal stability of proteins, could be created. The study concluded that the tested models could detect some features of proteins. To be more useful, however, they must be trained on more high-quality data.

The limiting factor during development was therefore the amount of relevant, available data. This is a common problem in machine learning, where models are created by training on data: the computer is taught to make predictions by learning from a dataset. The rule of thumb is that the more data a model is trained on, the better it performs.

But what would be the benefit of using computers over good old laboratory work? While it yields reliable values, laboratory work is usually costly and time-consuming and requires well-trained personnel. If a reasonably accurate predictive model can be constructed, an estimate of a protein’s stability can be obtained in a few minutes or hours instead of weeks. The benefit is obvious when the stabilities of many proteins are of interest.

Furthermore, the technology would let researchers quickly evaluate a large number of enzyme variants at an early stage and select only the promising ones for further investigation.

Accurate models of protein stability are not only useful for the chemical industry, but could also be used to better understand the proteins found in nature and how some mutations can lead to certain diseases. Diseases such as Alzheimer’s and Parkinson’s have been linked to proteins losing their natural structure and instead forming protein aggregates. By better understanding the properties of these proteins, researchers hope to find new treatments for these diseases, which affect roughly 30 million people worldwide, about the population of a country such as Nepal or Venezuela.

By solving the protein stability problem, multiple fields could undergo a revolution, leading to a more sustainable and healthier world. However, if computers are to be the ones to solve it, more experimental data is needed.
author
Carlsson, Lucas LU
course
KFKM05 20231
year
2023
type
H3 - Professional qualifications (4 Years - )
keywords
Protein Stability, Biophysical Chemistry, Thermodynamics, Machine Learning, Deep Learning, Protein Language Model, Biochemistry, Multiple Sequence Alignment, Bioinformatics
language
English
id
9139458
date added to LUP
2023-10-03 08:47:58
date last changed
2023-10-03 08:47:58
@misc{9139458,
  abstract     = {{Protein stability is a property of high importance in a variety of fields. It determines whether a protein adopts its native fold and can play a role in certain diseases, such as Parkinson's and Alzheimer's disease. It is also of interest in industrial settings, where the stability of enzymes is optimised for particular physicochemical environments. Recent developments in machine learning have yielded novel methods able to predict protein characteristics with surprising accuracy solely from sequence information. However, few such models have been proposed for predicting protein stability. The aim of this project was to create a model able to predict protein stability from sequence information by utilising a protein language model based on multiple sequence alignments. Different models were developed to predict two quantities relating to protein stability: the Gibbs free energy of unfolding and the heat denaturation temperature. Due to limited training data, the performance of the models predicting Gibbs free energy was poor. One of the models predicting heat denaturation temperature proved more promising, with higher performance than a previously published model trained on similar data. However, its ability to predict conventionally obtained heat denaturation temperatures was poor.}},
  author       = {{Carlsson, Lucas}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Predicting Protein Stability with Machine Learning}},
  year         = {{2023}},
}