Structure-based prediction of the effects of a missense variant on protein stability
(2013) In Amino Acids 44(3). p.847-855
- Abstract
- Predicting the effects of amino acid substitutions on protein stability provides invaluable information for protein design, the assignment of biological function, and for understanding disease-associated variations. To understand the effects of substitutions, computational models are preferred to time-consuming and expensive experimental methods. Several methods have been proposed for this task including machine learning-based approaches. However, models trained using limited data have performance problems and many model parameters tend to be over-fitted. To decrease the number of model parameters and to improve the generalization potential, we calculated the amino acid contact energy change for point variations using a structure-based coarse-grained model. Based on the structural properties including contact energy (CE) and further physicochemical properties of the amino acids as input features, we developed two support vector machine classifiers. M47 predicted the stability of variant proteins with an accuracy of 87 % and a Matthews correlation coefficient of 0.68 for a large dataset of 1925 variants, whereas M8 performed better when a relatively small dataset of 388 variants was used for 20-fold cross-validation. The performance of the M47 classifier on all six tested contingency table evaluation parameters is better than that of existing machine learning-based models or energy function-based protein stability classifiers.
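The abstract reports accuracy and a Matthews correlation coefficient (MCC), both of which are derived from a binary contingency table. As a minimal sketch of how these two measures are computed, the Python below applies the standard formulas. The function names and the counts are hypothetical illustrations, not the paper's data or the authors' code.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a 2x2 contingency table.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, tn, fp, fn = 40, 45, 5, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy = {acc:.2f}, MCC = {mcc:.2f}")
```

With these made-up counts the accuracy is 0.85 and the MCC is about 0.70. Unlike accuracy, the MCC uses all four cells of the table, so it stays informative when the two classes are unevenly represented.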
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/3576931
- author
- Yang, Yang ; Chen, Biao ; Tan, Ge ; Vihinen, Mauno (LU) ; Shen, Bairong
- organization
- publishing date
- 2013
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Amino acid mutation, Physicochemical properties, Residue-residue contact energy, Support vector machine, Protein stability prediction
- in
- Amino Acids
- volume
- 44
- issue
- 3
- pages
- 847 - 855
- publisher
- Springer
- external identifiers
- wos:000314760700004
- scopus:84878362869
- pmid:23064876
- ISSN
- 0939-4451
- DOI
- 10.1007/s00726-012-1407-7
- language
- English
- LU publication?
- yes
- id
- c1c0e178-9d9b-4c12-a683-e66ebeb5201e (old id 3576931)
- date added to LUP
- 2016-04-01 11:01:25
- date last changed
- 2022-04-28 03:47:41
@article{c1c0e178-9d9b-4c12-a683-e66ebeb5201e,
  abstract  = {{Predicting the effects of amino acid substitutions on protein stability provides invaluable information for protein design, the assignment of biological function, and for understanding disease-associated variations. To understand the effects of substitutions, computational models are preferred to time-consuming and expensive experimental methods. Several methods have been proposed for this task including machine learning-based approaches. However, models trained using limited data have performance problems and many model parameters tend to be over-fitted. To decrease the number of model parameters and to improve the generalization potential, we calculated the amino acid contact energy change for point variations using a structure-based coarse-grained model. Based on the structural properties including contact energy (CE) and further physicochemical properties of the amino acids as input features, we developed two support vector machine classifiers. M47 predicted the stability of variant proteins with an accuracy of 87 \% and a Matthews correlation coefficient of 0.68 for a large dataset of 1925 variants, whereas M8 performed better when a relatively small dataset of 388 variants was used for 20-fold cross-validation. The performance of the M47 classifier on all six tested contingency table evaluation parameters is better than that of existing machine learning-based models or energy function-based protein stability classifiers.}},
  author    = {{Yang, Yang and Chen, Biao and Tan, Ge and Vihinen, Mauno and Shen, Bairong}},
  issn      = {{0939-4451}},
  keywords  = {{Amino acid mutation; Physicochemical properties; Residue-residue contact energy; Support vector machine; Protein stability prediction}},
  language  = {{eng}},
  number    = {{3}},
  pages     = {{847--855}},
  publisher = {{Springer}},
  journal   = {{Amino Acids}},
  title     = {{Structure-based prediction of the effects of a missense variant on protein stability}},
  url       = {{http://dx.doi.org/10.1007/s00726-012-1407-7}},
  doi       = {{10.1007/s00726-012-1407-7}},
  volume    = {{44}},
  year      = {{2013}},
}