Lund University Publications

PEPRF : Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-based Features via Random Forest

Wu, Chuanyan ; Lin, Bentao ; Shi, Kai ; Zhang, Qingju ; Gao, Rui ; Yu, Zhiguo ; De Marinis, Yang LU ; Zhang, Yusen and Liu, Zhi Ping (2021) In Current Bioinformatics 16(9). p.1161-1168
Abstract

Background: Essential proteins play an important role in the processes of life and can be identified by experimental methods and computational approaches. Experimental approaches identify essential proteins with high accuracy but are time- and resource-consuming. Objective: Herein, we present a computational model (PEPRF) to identify essential proteins based on machine learning. Methods: Different features of proteins were extracted. Topological features based on the Protein-Protein Interaction (PPI) network were extracted. Based on the protein sequence, graph theory-based features, information-based features, composition and physicochemical features, etc., were extracted. In total, 282 features were constructed. To select the features that contributed most to the identification, a ReliefF-based feature selection method was adopted to measure the weights of these features. Results: As a result, 212 features were curated to train random forest classifiers. Finally, PEPRF achieved an AUC of 0.71 and an accuracy of 0.742. Conclusion: Our results show that PEPRF may be applied as an efficient tool to identify essential proteins.
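
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how accuracy and AUC are commonly computed for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers are unrelated to the PEPRF results.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one
    (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = essential protein, 0 = non-essential protein.
# The scores stand in for classifier outputs and are made up.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.6, 0.7, 0.4, 0.3, 0.2]

toy_accuracy = accuracy(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
print(f"accuracy = {toy_accuracy:.3f}, AUC = {toy_auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two numbers can differ for the same classifier.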

type
Contribution to journal
publication status
published
subject
keywords
Feature extraction, Graph energy, PEPRF, Random forest classifier, ReliefF-based feature selection, Rma Essential protein prediction
in
Current Bioinformatics
volume
16
issue
9
pages
8 pages
publisher
Bentham Science Publishers
external identifiers
  • scopus:85122849381
ISSN
1574-8936
DOI
10.2174/1574893616666210617162258
language
English
LU publication?
yes
id
0f8f3e1d-d942-4104-9866-83b4a625fc99
date added to LUP
2022-02-18 10:47:38
date last changed
2024-05-07 01:49:26
@article{0f8f3e1d-d942-4104-9866-83b4a625fc99,
  abstract     = {{<p>Background: Essential proteins play an important role in the processes of life and can be identified by experimental methods and computational approaches. Experimental approaches identify essential proteins with high accuracy but are time- and resource-consuming. Objective: Herein, we present a computational model (PEPRF) to identify essential proteins based on machine learning. Methods: Different features of proteins were extracted. Topological features based on the Protein-Protein Interaction (PPI) network were extracted. Based on the protein sequence, graph theory-based features, information-based features, composition and physicochemical features, etc., were extracted. In total, 282 features were constructed. To select the features that contributed most to the identification, a ReliefF-based feature selection method was adopted to measure the weights of these features. Results: As a result, 212 features were curated to train random forest classifiers. Finally, PEPRF achieved an AUC of 0.71 and an accuracy of 0.742. Conclusion: Our results show that PEPRF may be applied as an efficient tool to identify essential proteins.</p>}},
  author       = {{Wu, Chuanyan and Lin, Bentao and Shi, Kai and Zhang, Qingju and Gao, Rui and Yu, Zhiguo and De Marinis, Yang and Zhang, Yusen and Liu, Zhi Ping}},
  issn         = {{1574-8936}},
  keywords     = {{Feature extraction; Graph energy; PEPRF; Random forest classifier; ReliefF-based feature selection; Rma Essential protein prediction}},
  language     = {{eng}},
  month        = {{11}},
  number       = {{9}},
  pages        = {{1161--1168}},
  publisher    = {{Bentham Science Publishers}},
  series       = {{Current Bioinformatics}},
  title        = {{PEPRF : Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-based Features via Random Forest}},
  url          = {{http://dx.doi.org/10.2174/1574893616666210617162258}},
  doi          = {{10.2174/1574893616666210617162258}},
  volume       = {{16}},
  year         = {{2021}},
}