Lund University Publications

PON-Sol2 : Prediction of effects of variants on protein solubility

Yang, Yang LU; Zeng, Lianjie and Vihinen, Mauno LU (2021) In International Journal of Molecular Sciences 22(15).
Abstract

Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein–solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made soluble again by dilution. Solubility is a central protein property and, when reduced, can lead to disease. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on protein solubility. The method is a machine learning tool that uses a gradient boosting algorithm. It was trained on a large dataset of variants with different outcomes, after features had been selected from a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior to that of previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.

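The two headline figures in the abstract, the correct prediction rate and the GC2 score, are both derived from a three-class confusion matrix (solubility increased, decreased, or unchanged). The Python sketch below shows one common way to compute them. It assumes the generalized squared correlation definition of GC2 and a row-wise rescaling as the normalization step. The function names and the example counts are hypothetical and are not taken from the paper or from the PON-Sol2 code.

```python
from typing import List

Matrix = List[List[float]]


def normalize_rows(cm: Matrix) -> Matrix:
    """Rescale each row (true class) to the same total.

    Assumption: "normalized" means every class is given equal weight,
    which compensates for an imbalanced dataset.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    target = n / k  # every class gets the same number of (virtual) cases
    out = []
    for row in cm:
        s = sum(row)
        out.append([v * target / s if s else 0.0 for v in row])
    return out


def correct_prediction_rate(cm: Matrix) -> float:
    """Fraction of cases on the diagonal of the confusion matrix."""
    n = sum(sum(row) for row in cm)
    if n == 0:
        raise ValueError("empty confusion matrix")
    return sum(cm[i][i] for i in range(len(cm))) / n


def gc2(cm: Matrix) -> float:
    """Generalized squared correlation for a K x K confusion matrix.

    GC2 = sum_ij (z_ij - e_ij)^2 / e_ij / (N * (K - 1)),
    where e_ij = (row total i) * (column total j) / N.
    It ranges from 0 (no better than chance) to 1 (perfect prediction).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    if n == 0 or k < 2:
        raise ValueError("need a non-empty matrix with at least two classes")
    row_tot = [sum(row) for row in cm]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (cm[i][j] - expected) ** 2 / expected
    return chi2 / (n * (k - 1))


if __name__ == "__main__":
    # Hypothetical counts: rows are true classes, columns are predictions.
    counts = [[30, 10, 5], [20, 100, 30], [5, 15, 40]]
    balanced = normalize_rows(counts)
    print("normalized CPR:", round(correct_prediction_rate(balanced), 3))
    print("normalized GC2:", round(gc2(balanced), 3))
```

Computing both measures on the rescaled matrix keeps a large "no effect" class from dominating the score, which is the usual reason for reporting normalized values on imbalanced three-class data.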
author: Yang, Yang; Zeng, Lianjie and Vihinen, Mauno
publishing date: 2021-08
type: Contribution to journal
publication status: published
keywords: Artificial intelligence, Machine learning, Mutation, PON-Sol2, Prediction, Protein solubility prediction, Variation, Variation interpretation
in: International Journal of Molecular Sciences
volume: 22
issue: 15
article number: 8027
publisher: MDPI AG
external identifiers:
  • scopus:85111125476
  • pmid:34360790
ISSN: 1661-6596
DOI: 10.3390/ijms22158027
language: English
LU publication?: yes
id: ba875afa-1663-4a9c-9641-3b0190a22280
date added to LUP: 2022-03-22 17:15:46
date last changed: 2024-04-11 19:25:49
@article{ba875afa-1663-4a9c-9641-3b0190a22280,
  abstract     = {{Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein–solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made again soluble by dilution. Solubility is a central protein property and when reduced can lead to diseases. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on the protein solubility. The method is a machine learning tool utilizing gradient boosting algorithm and was trained on a large dataset of variants with different outcomes after the selection of features among a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior in comparison to previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.}},
  author       = {{Yang, Yang and Zeng, Lianjie and Vihinen, Mauno}},
  issn         = {{1661-6596}},
  keywords     = {{Artificial intelligence; Machine learning; Mutation; PON-Sol2; Prediction; Protein solubility prediction; Variation; Variation interpretation}},
  language     = {{eng}},
  month        = {{08}},
  number       = {{15}},
  publisher    = {{MDPI AG}},
  journal      = {{International Journal of Molecular Sciences}},
  title        = {{PON-Sol2 : Prediction of effects of variants on protein solubility}},
  url          = {{http://dx.doi.org/10.3390/ijms22158027}},
  doi          = {{10.3390/ijms22158027}},
  volume       = {{22}},
  year         = {{2021}},
}