ProTstab - Predictor for cellular protein stability

Yang, Yang (LU); Ding, Xuesong; Zhu, Guanchen; Niroula, Abhishek (LU); Lv, Qiang and Vihinen, Mauno (LU) (2019) In BMC Genomics 20(1).
Abstract

Background: Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties has not kept pace with the increase in new sequence data, and therefore even basic properties are unknown for the vast majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples. Results: We took advantage of results from a recently developed cellular stability method, which is based on limited proteolysis and mass spectrometry, and developed a machine learning method using gradient boosting of regression trees. The ProTstab method has high performance and is well suited for large-scale prediction of protein stabilities. Conclusions: The Pearson correlation coefficient was 0.793 in 10-fold cross-validation and 0.763 in an independent blind test. The corresponding mean absolute errors were 0.024 and 0.036, respectively. Comparison with a previously published method indicated that ProTstab has superior performance. We used the method to predict the stabilities of all the remaining proteins in the human proteome and then correlated the predicted stabilities with the chain lengths of protein isoforms and with protein localizations.
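
The two performance measures quoted in the abstract, the Pearson correlation coefficient and the mean absolute error, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the example numbers are hypothetical, chosen only to show how each measure is computed from paired measured and predicted values.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired values."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


if __name__ == "__main__":
    # Hypothetical stability values, made up for illustration only.
    measured = [0.50, 0.55, 0.60, 0.65]
    predicted = [0.52, 0.54, 0.63, 0.64]
    print("Pearson r:", pearson_r(measured, predicted))
    print("MAE:", mean_absolute_error(measured, predicted))
```

In a 10-fold cross-validation, measures like these would typically be computed on the held-out predictions; how the authors aggregated them across folds is not stated in this record.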

type
Contribution to journal
publication status
published
keywords
Machine learning, Prediction, Protein stability, Proteome properties
in
BMC Genomics
volume
20
issue
1
article number
804
publisher
BioMed Central
external identifiers
  • pmid:31684883
  • scopus:85074550799
ISSN
1471-2164
DOI
10.1186/s12864-019-6138-7
language
English
LU publication?
yes
id
56dd729e-bc70-4655-bd11-58d0b56fab5c
date added to LUP
2019-11-18 13:46:52
date last changed
2020-01-13 02:31:56
@article{56dd729e-bc70-4655-bd11-58d0b56fab5c,
  abstract     = {Background: Stability is one of the most fundamental intrinsic characteristics of proteins and can be determined with various methods. Characterization of protein properties has not kept pace with the increase in new sequence data, and therefore even basic properties are unknown for the vast majority of identified proteins. There have been some attempts to develop predictors for protein stabilities; however, they have suffered from small numbers of known examples. Results: We took advantage of results from a recently developed cellular stability method, which is based on limited proteolysis and mass spectrometry, and developed a machine learning method using gradient boosting of regression trees. The ProTstab method has high performance and is well suited for large-scale prediction of protein stabilities. Conclusions: The Pearson correlation coefficient was 0.793 in 10-fold cross-validation and 0.763 in an independent blind test. The corresponding mean absolute errors were 0.024 and 0.036, respectively. Comparison with a previously published method indicated that ProTstab has superior performance. We used the method to predict the stabilities of all the remaining proteins in the human proteome and then correlated the predicted stabilities with the chain lengths of protein isoforms and with protein localizations.},
  author       = {Yang, Yang and Ding, Xuesong and Zhu, Guanchen and Niroula, Abhishek and Lv, Qiang and Vihinen, Mauno},
  issn         = {1471-2164},
  language     = {eng},
  month        = {11},
  number       = {1},
  publisher    = {BioMed Central},
  journal      = {BMC Genomics},
  title        = {ProTstab - Predictor for cellular protein stability},
  url          = {https://doi.org/10.1186/s12864-019-6138-7},
  doi          = {10.1186/s12864-019-6138-7},
  volume       = {20},
  year         = {2019},
}