PON-P3: Accurate Prediction of Pathogenicity of Amino Acid Substitutions

Kabir, Muhammad; Ahmed, Saeed; Zhang, Haoyang; Rodríguez-Rodríguez, Ignacio; Najibi, Seyed Morteza; Vihinen, Mauno

(2025) In International Journal of Molecular Sciences 26(5).

Abstract: Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Medical Genetics and Genomics and the Association for Molecular Pathology guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors, which utilize large language models or structural predictions. PON-P3 was better than methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories: benign, pathogenic, and variants of uncertain significance (VUSs). When binary test data were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations, with patient data, those methods overpredict both pathogenic and benign cases. We predicted with PON-P3 all possible amino acid substitutions in all human proteins encoded from MANE transcripts. The method was also used to predict all unambiguous VUSs (i.e., without conflicts) in ClinVar. A total of 12.9% were predicted to be pathogenic, and 49.9% to be benign.
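
The abstract gives the pathogenic and benign shares of the re-predicted ClinVar VUSs but not the remainder. As a minimal sketch, assuming the three PON-P3 categories are exhaustive (an assumption, not something the abstract states), the share that stayed VUS can be worked out as follows; the function name is hypothetical.

```python
def remaining_share(pathogenic_pct: float, benign_pct: float) -> float:
    """Percentage assigned to neither class, rounded to one decimal place.

    Assumes every variant falls into exactly one of three categories
    (pathogenic, benign, VUS); that is an assumption of this sketch.
    """
    return round(100.0 - pathogenic_pct - benign_pct, 1)


# Percentages quoted in the abstract for unambiguous ClinVar VUSs.
unresolved = remaining_share(12.9, 49.9)
print(f"{unresolved}% predicted as neither pathogenic nor benign")
```

Under that assumption, roughly 37.2% of the unambiguous ClinVar VUSs would remain of uncertain significance after prediction.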

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/3998a62c-2af8-4bd8-891f-008d1de51ec8

author

Kabir, Muhammad; Ahmed, Saeed; Zhang, Haoyang; Rodríguez-Rodríguez, Ignacio; Najibi, Seyed Morteza and Vihinen, Mauno

publishing date

2025

type

Contribution to journal

publication status

published

subject

Medical Genetics and Genomics (including Gene Therapy)

in

International Journal of Molecular Sciences

volume

26

issue

5

article number

2004

publisher

MDPI AG

external identifiers

pmid:40076632
scopus:86000790431

ISSN

1422-0067

DOI

10.3390/ijms26052004

language

English

LU publication?

yes

id

3998a62c-2af8-4bd8-891f-008d1de51ec8

date added to LUP

2025-03-02 13:34:29

date last changed

2025-10-14 09:17:33

@article{3998a62c-2af8-4bd8-891f-008d1de51ec8,
  abstract     = {{Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Medical Genetics and Genomics and the Association for Molecular Pathology guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors, which utilize large language models or structural predictions. PON-P3 was better than methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories: benign, pathogenic, and variants of uncertain significance (VUSs). When binary test data were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations, with patient data, those methods overpredict both pathogenic and benign cases. We predicted with PON-P3 all possible amino acid substitutions in all human proteins encoded from MANE transcripts. The method was also used to predict all unambiguous VUSs (i.e., without conflicts) in ClinVar. A total of 12.9% were predicted to be pathogenic, and 49.9% to be benign.}},
  author       = {{Kabir, Muhammad and Ahmed, Saeed and Zhang, Haoyang and Rodríguez-Rodríguez, Ignacio and Najibi, Seyed Morteza and Vihinen, Mauno}},
  issn         = {{1422-0067}},
  language     = {{eng}},
  number       = {{5}},
  publisher    = {{MDPI AG}},
  journal      = {{International Journal of Molecular Sciences}},
  title        = {{PON-P3: Accurate Prediction of Pathogenicity of Amino Acid Substitutions}},
  url          = {{http://dx.doi.org/10.3390/ijms26052004}},
  doi          = {{10.3390/ijms26052004}},
  volume       = {{26}},
  year         = {{2025}},
}
