PON-P3: Accurate Prediction of Pathogenicity of Amino Acid Substitutions
(2025) In International Journal of Molecular Sciences 26(5).
- Abstract
- Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Genetics and the Association for Molecular Pathology guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors, which utilize large language models or structural predictions. PON-P3 was better than methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories as benign, pathogenic, and variants of uncertain significance (VUSs). When binary test data were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations, with patient data, those methods overpredict both pathogenic and benign cases. We predicted with PON-P3 all possible amino acid substitutions in all human proteins encoded from MANE transcripts. The method was also used to predict all unambiguous VUSs (i.e., without conflicts) in ClinVar. A total of 12.9% were predicted to be pathogenic, and 49.9% were benign.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/3998a62c-2af8-4bd8-891f-008d1de51ec8
- author
- Kabir, Muhammad LU; Ahmed, Saeed LU; Zhang, Haoyang LU; Rodríguez-Rodríguez, Ignacio LU; Najibi, Seyed Morteza LU and Vihinen, Mauno LU
- publishing date
- 2025
- type
- Contribution to journal
- publication status
- published
- in
- International Journal of Molecular Sciences
- volume
- 26
- issue
- 5
- article number
- 2004
- publisher
- MDPI AG
- external identifiers
- pmid:40076632
- scopus:86000790431
- ISSN
- 1422-0067
- DOI
- 10.3390/ijms26052004
- language
- English
- LU publication?
- yes
- id
- 3998a62c-2af8-4bd8-891f-008d1de51ec8
- date added to LUP
- 2025-03-02 13:34:29
- date last changed
- 2025-06-09 04:01:14
@article{3998a62c-2af8-4bd8-891f-008d1de51ec8,
  abstract = {{Different types of information are combined during variation interpretation. Computational predictors, most often pathogenicity predictors, provide one type of information for this purpose. These tools are based on various kinds of algorithms. Although the American College of Genetics and the Association for Molecular Pathology guidelines classify variants into five categories, practically all pathogenicity predictors provide binary pathogenic/benign predictions. We developed a novel artificial intelligence-based tool, PON-P3, on the basis of a carefully selected training dataset, meticulous feature selection, and optimization. We started with 1526 features describing variations, their sequence and structural context, and parameters for the affected genes and proteins. The final random boosting method was tested and compared with a total of 23 predictors. PON-P3 performed better than recently introduced predictors, which utilize large language models or structural predictions. PON-P3 was better than methods that use evolutionary data alone or in combination with different gene and protein properties. PON-P3 classifies cases into three categories as benign, pathogenic, and variants of uncertain significance (VUSs). When binary test data were used, some metapredictors performed slightly better than PON-P3; however, in real-life situations, with patient data, those methods overpredict both pathogenic and benign cases. We predicted with PON-P3 all possible amino acid substitutions in all human proteins encoded from MANE transcripts. The method was also used to predict all unambiguous VUSs (i.e., without conflicts) in ClinVar. A total of 12.9% were predicted to be pathogenic, and 49.9% were benign.}},
  author = {{Kabir, Muhammad and Ahmed, Saeed and Zhang, Haoyang and Rodríguez-Rodríguez, Ignacio and Najibi, Seyed Morteza and Vihinen, Mauno}},
  issn = {{1422-0067}},
  language = {{eng}},
  number = {{5}},
  publisher = {{MDPI AG}},
  journal = {{International Journal of Molecular Sciences}},
  title = {{PON-P3: Accurate Prediction of Pathogenicity of Amino Acid Substitutions}},
  url = {{http://dx.doi.org/10.3390/ijms26052004}},
  doi = {{10.3390/ijms26052004}},
  volume = {{26}},
  year = {{2025}},
}