
Lund University Publications

LUND UNIVERSITY LIBRARIES

PON-P3: Accurate Prediction of Pathogenicity of Amino Acid Substitutions

Kabir, Muhammad; Ahmed, Saeed; Zhang, Haoyang; Rodríguez-Rodríguez, Ignacio; Najibi, Seyed Morteza and Vihinen, Mauno (2025) In International Journal of Molecular Sciences 26(5).
Abstract
Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology (AMP) guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, based on a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors that utilize large language models or structural predictions. It also outperformed methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories: benign, pathogenic, and variants of uncertain significance (VUSs). When binary test data were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations with patient data, those methods overpredict both pathogenic and benign cases. We used PON-P3 to predict all possible amino acid substitutions in all human proteins encoded by MANE transcripts. The method was also used to predict all unambiguous VUSs (i.e., those without conflicting interpretations) in ClinVar. Of these, 12.9% were predicted to be pathogenic and 49.9% benign.
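The abstract describes a three-way output (benign, pathogenic, VUS) rather than the usual binary call. The sketch below illustrates one generic way such an output can be produced, assuming a model emits a pathogenicity score in [0, 1] and that scores falling between two cut-offs are reported as VUS. The function names and the threshold values BENIGN_MAX and PATHOGENIC_MIN are hypothetical and are not taken from PON-P3; this is not the published method's decision rule.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Hypothetical cut-offs chosen for illustration only; PON-P3's real
# decision rule and thresholds are not described in this record.
BENIGN_MAX = 0.3
PATHOGENIC_MIN = 0.7
LABELS = ("benign", "pathogenic", "VUS")


def classify(score: float,
             benign_max: float = BENIGN_MAX,
             pathogenic_min: float = PATHOGENIC_MIN) -> str:
    """Map an assumed pathogenicity score in [0, 1] to one of three labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if benign_max >= pathogenic_min:
        raise ValueError("benign_max must be below pathogenic_min")
    if score <= benign_max:
        return "benign"
    if score >= pathogenic_min:
        return "pathogenic"
    # Scores in the middle band are left undecided rather than forced
    # into a binary call.
    return "VUS"


def class_fractions(scores: Iterable[float]) -> dict[str, float]:
    """Return the share of each label among the scored variants."""
    labels = [classify(s) for s in scores]
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: counts[label] / len(labels) for label in LABELS}


if __name__ == "__main__":
    made_up_scores = [0.05, 0.2, 0.45, 0.6, 0.8, 0.95]  # invented example data
    for s in made_up_scores:
        print(s, classify(s))
    print(class_fractions(made_up_scores))
```

Widening the gap between the two cut-offs moves more cases into the VUS band and keeps fewer confident calls. For the ClinVar run summarized in the abstract, the 12.9% pathogenic and 49.9% benign predictions leave 37.2% of those VUSs without a pathogenic or benign call.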
author: Kabir, Muhammad; Ahmed, Saeed; Zhang, Haoyang; Rodríguez-Rodríguez, Ignacio; Najibi, Seyed Morteza and Vihinen, Mauno
publishing date: 2025
type: Contribution to journal
publication status: published
in: International Journal of Molecular Sciences
volume: 26
issue: 5
article number: 2004
publisher: MDPI AG
external identifiers:
  • pmid:40076632
  • scopus:86000790431
ISSN: 1422-0067
DOI: 10.3390/ijms26052004
language: English
LU publication: yes
id: 3998a62c-2af8-4bd8-891f-008d1de51ec8
date added to LUP: 2025-03-02 13:34:29
date last changed: 2025-06-09 04:01:14
@article{3998a62c-2af8-4bd8-891f-008d1de51ec8,
  abstract     = {{Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Genetics and the Association for Molecular Pathology guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors, which utilize large language models or structural predictions. PON-P3 was better than methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories as benign, pathogenic, and variants of uncertain significance (VUSs). When binary test data were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations, with patient data, those methods overpredict both pathogenic and benign cases. We predicted with PON-P3 all possible amino acid substitutions in all human proteins encoded from MANE transcripts. The method was also used to predict all unambiguous VUSs (i.e., without conflicts) in ClinVar. A total of 12.9% were predicted to be pathogenic, and 49.9% were benign.}},
  author       = {{Kabir, Muhammad and Ahmed, Saeed and Zhang, Haoyang and Rodríguez-Rodríguez, Ignacio and Najibi, Seyed Morteza and Vihinen, Mauno}},
  issn         = {{1422-0067}},
  language     = {{eng}},
  number       = {{5}},
  publisher    = {{MDPI AG}},
  journal      = {{International Journal of Molecular Sciences}},
  pages        = {{2004}},
  title        = {{PON-P3: Accurate Prediction of Pathogenicity of Amino Acid Substitutions}},
  url          = {{https://doi.org/10.3390/ijms26052004}},
  doi          = {{10.3390/ijms26052004}},
  volume       = {{26}},
  year         = {{2025}},
}