Lund University Publications

PON-Fold : Prediction of Substitutions Affecting Protein Folding Rate

Yang, Yang LU; Chong, Zhang and Vihinen, Mauno LU (2023) In International Journal of Molecular Sciences 24(16).
Abstract

Most proteins fold into characteristic three-dimensional structures. The rate of folding and unfolding varies widely and can be affected by variations in proteins. We developed a novel machine-learning-based method for the prediction of the folding rate effects of amino acid substitutions in two-state folding proteins. We collected a data set of experimentally defined folding rates for variants and used them to train a gradient boosting algorithm starting with 1161 features. Two predictors were designed. The three-class classifier had, in blind tests, specificity and sensitivity ranging from 0.324 to 0.419 and from 0.256 to 0.451, respectively. The other tool was a regression predictor that showed a Pearson correlation coefficient of 0.525. The error measures, mean absolute error and mean squared error, were 0.581 and 0.603, respectively. One of the previously presented tools could be used for comparison with the blind test data set; our method, called PON-Fold, showed superior performance on all measures used. The applicability of the tool was tested by predicting all possible substitutions in a protein domain. Predictions for different conformations of proteins, open and closed forms of a protein kinase, and apo and holo forms of an enzyme indicated that the choice of the structure had a large impact on the outcome. PON-Fold is freely available.
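For readers unfamiliar with the measures quoted in the abstract, the following is a minimal sketch of how they are commonly computed. It is not the PON-Fold code: the function names, the class labels ("slower", "faster", "no_change"), and the toy values are hypothetical, and the one-vs-rest treatment of the three classes is an assumption about how per-class sensitivity and specificity might be obtained.

```python
import math


def pearson(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (norm_o * norm_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def sensitivity_specificity(true_labels, predicted_labels, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    tp = fn = fp = tn = 0
    for t, p in zip(true_labels, predicted_labels):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data, not taken from the publication.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.5]
true_cls = ["slower", "slower", "faster", "no_change", "no_change", "faster"]
pred_cls = ["slower", "faster", "faster", "no_change", "slower", "faster"]

r = pearson(obs, pred)
mae = mean_absolute_error(obs, pred)
mse = mean_squared_error(obs, pred)
sens, spec = sensitivity_specificity(true_cls, pred_cls, positive="slower")
```

Treating each class in turn as the positive class yields one sensitivity and one specificity value per class, which is one way a range of values like those reported for a three-class classifier can arise.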

author
Yang, Yang; Chong, Zhang and Vihinen, Mauno
publishing date
2023
type
Contribution to journal
publication status
published
subject
keywords
amino acid substitution, folding rate, machine learning, protein folding, protein unfolding, variation interpretation
in
International Journal of Molecular Sciences
volume
24
issue
16
article number
13023
publisher
MDPI AG
external identifiers
  • pmid:37629203
  • scopus:85169095321
ISSN
1661-6596
DOI
10.3390/ijms241613023
language
English
LU publication?
yes
id
5ddb6f9e-38c0-4ff2-8155-a51ec0d06c4e
date added to LUP
2023-10-31 15:08:43
date last changed
2024-04-19 03:14:02
@article{5ddb6f9e-38c0-4ff2-8155-a51ec0d06c4e,
  abstract     = {{Most proteins fold into characteristic three-dimensional structures. The rate of folding and unfolding varies widely and can be affected by variations in proteins. We developed a novel machine-learning-based method for the prediction of the folding rate effects of amino acid substitutions in two-state folding proteins. We collected a data set of experimentally defined folding rates for variants and used them to train a gradient boosting algorithm starting with 1161 features. Two predictors were designed. The three-class classifier had, in blind tests, specificity and sensitivity ranging from 0.324 to 0.419 and from 0.256 to 0.451, respectively. The other tool was a regression predictor that showed a Pearson correlation coefficient of 0.525. The error measures, mean absolute error and mean squared error, were 0.581 and 0.603, respectively. One of the previously presented tools could be used for comparison with the blind test data set; our method, called PON-Fold, showed superior performance on all measures used. The applicability of the tool was tested by predicting all possible substitutions in a protein domain. Predictions for different conformations of proteins, open and closed forms of a protein kinase, and apo and holo forms of an enzyme indicated that the choice of the structure had a large impact on the outcome. PON-Fold is freely available.}},
  author       = {{Yang, Yang and Chong, Zhang and Vihinen, Mauno}},
  issn         = {{1661-6596}},
  keywords     = {{amino acid substitution; folding rate; machine learning; protein folding; protein unfolding; variation interpretation}},
  language     = {{eng}},
  number       = {{16}},
  publisher    = {{MDPI AG}},
  journal      = {{International Journal of Molecular Sciences}},
  title        = {{PON-Fold : Prediction of Substitutions Affecting Protein Folding Rate}},
  url          = {{http://dx.doi.org/10.3390/ijms241613023}},
  doi          = {{10.3390/ijms241613023}},
  volume       = {{24}},
  year         = {{2023}},
}