Lund University Publications

Using predicted shape string to enhance the accuracy of γ-turn prediction

Zhu, Yaojuan; Li, Tonghua; Li, Dapeng; Zhang, Yun LU; Xiong, Wenwei; Sun, Jiangming LU; Tang, Zehui and Chen, Guanyan (2012) In Amino Acids 42(5). p.1749-1755
Abstract

Numerous methods for predicting γ-turns in proteins have been developed. However, the results they generally provided are not very good, with a Matthews correlation coefficient (MCC) ≤0.18. Here, an attempt has been made to develop a method to improve the accuracy of γ-turn prediction. First, we employ the geometric mean metric as optimal criterion to evaluate the performance of support vector machine for the highly imbalanced γ-turn dataset. This metric tries to maximize both the sensitivity and the specificity while keeping them balanced. Second, a predictor to generate protein shape string by structure alignment against the protein structure database has been designed and the predicted shape string is introduced as new variable for γ-turn prediction. Based on this perception, we have developed a new method for γ-turn prediction. After training and testing the benchmark dataset of 320 nonhomologous protein chains using a fivefold cross-validation technique, the present method achieves excellent performance. The overall prediction accuracy Q_total can achieve 92.2% and the MCC is 0.38, which outperform the existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting protein tight turns and it is reasonable to use the dihedral angle information as a variable for machine learning to predict protein folding. The dataset used in this work and the software to generate predicted shape string from structure database can be obtained from anonymous ftp site ftp://cheminfo.tongji.edu.cn/GammaTurnPrediction/ freely.
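The abstract's point about the geometric mean on imbalanced data can be illustrated with a minimal sketch. It assumes the usual definitions (G-mean as the square root of sensitivity times specificity, and the standard MCC formula). The function name `confusion_metrics` and all counts below are hypothetical illustrations, not values or code from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal is zero; 0.0 is used here by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "mcc": mcc,
        "accuracy": accuracy,
    }


# Made-up counts: 1000 residues, of which 50 are in the minority (turn) class.
# A classifier that never predicts a turn still scores high accuracy.
trivial = confusion_metrics(tp=0, fp=0, tn=950, fn=50)

# A classifier that recovers most turns at the cost of some false positives.
balanced = confusion_metrics(tp=40, fp=95, tn=855, fn=10)

for name, m in (("trivial", trivial), ("balanced", balanced)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this toy comparison the all-negative classifier has the higher accuracy but a G-mean of zero, which is why a criterion that rewards both sensitivity and specificity is a sensible choice when positives are rare.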

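author
Zhu, Yaojuan; Li, Tonghua; Li, Dapeng; Zhang, Yun; Xiong, Wenwei; Sun, Jiangming; Tang, Zehui and Chen, Guanyan
publishing date
2012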
type
Contribution to journal
publication status
published
subject
keywords
γ-Turn prediction, Imbalanced data, Shape string, Support vector machine (SVM)
in
Amino Acids
volume
42
issue
5
pages
1749 - 1755
publisher
Springer
external identifiers
  • pmid:21424809
  • scopus:84862772398
ISSN
0939-4451
DOI
10.1007/s00726-011-0889-z
language
English
LU publication?
no
id
b9102a91-048a-4465-921e-00981f82bfaf
date added to LUP
2023-04-24 16:40:22
date last changed
2024-02-03 12:33:51
@article{b9102a91-048a-4465-921e-00981f82bfaf,
  abstract     = {{<p>Numerous methods for predicting γ-turns in proteins have been developed. However, the results they generally provided are not very good, with a Matthews correlation coefficient (MCC) ≤0.18. Here, an attempt has been made to develop a method to improve the accuracy of γ-turn prediction. First, we employ the geometric mean metric as optimal criterion to evaluate the performance of support vector machine for the highly imbalanced γ-turn dataset. This metric tries to maximize both the sensitivity and the specificity while keeping them balanced. Second, a predictor to generate protein shape string by structure alignment against the protein structure database has been designed and the predicted shape string is introduced as new variable for γ-turn prediction. Based on this perception, we have developed a new method for γ-turn prediction. After training and testing the benchmark dataset of 320 nonhomologous protein chains using a fivefold cross-validation technique, the present method achieves excellent performance. The overall prediction accuracy Q<sub>total</sub> can achieve 92.2% and the MCC is 0.38, which outperform the existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting protein tight turns and it is reasonable to use the dihedral angle information as a variable for machine learning to predict protein folding. The dataset used in this work and the software to generate predicted shape string from structure database can be obtained from anonymous ftp site ftp://cheminfo.tongji.edu.cn/GammaTurnPrediction/ freely.</p>}},
  author       = {{Zhu, Yaojuan and Li, Tonghua and Li, Dapeng and Zhang, Yun and Xiong, Wenwei and Sun, Jiangming and Tang, Zehui and Chen, Guanyan}},
  issn         = {{0939-4451}},
  keywords     = {{γ-Turn prediction; Imbalanced data; Shape string; Support vector machine (SVM)}},
  language     = {{eng}},
  number       = {{5}},
  pages        = {{1749--1755}},
  publisher    = {{Springer}},
  journal      = {{Amino Acids}},
  title        = {{Using predicted shape string to enhance the accuracy of γ-turn prediction}},
  url          = {{http://dx.doi.org/10.1007/s00726-011-0889-z}},
  doi          = {{10.1007/s00726-011-0889-z}},
  volume       = {{42}},
  year         = {{2012}},
}