Predicting Antibody Developability: Machine Learning Meets Therapeutic Antibodies

Höjding, Josephine; Björkhem, William

Höjding, Josephine (LU) and Björkhem, William (LU) (2025). In Master’s Theses in Mathematical Sciences, FMAM05 20251
Mathematics (Faculty of Engineering)

Abstract: Antibody developability refers to an antibody’s suitability for clinical use, including properties such as solubility, stability, and aggregation. These traits are traditionally assessed through experimental screening, which is time-consuming and resource-intensive. Machine learning offers a promising alternative for early prediction of developability, though many existing models are still at an early stage.

This work compares multiple machine learning strategies for predicting protein solubility, a key developability factor. Five datasets were used: four consisting of non-antibody protein sequences expressed in E. coli with solubility labels, and one independent antibody dataset without labels. Three existing models (NetSolP, SWI, and ProteinSol) were evaluated using standard performance metrics, and new models were developed by leveraging feature extraction from SWI and ProteinSol to explore potential improvements.
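
The abstract does not say which performance metrics were used. As a minimal sketch, assuming binary labels (1 = soluble, 0 = insoluble) and four commonly reported metrics (accuracy, precision, recall, and the Matthews correlation coefficient), an evaluation step could look like the following. The function names and the toy labels are illustrative assumptions, not taken from the thesis.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = soluble, 0 = insoluble)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and MCC; undefined ratios fall back to 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Made-up labels for illustration only (not results from the thesis).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.75, precision 0.75, recall 0.75, mcc 0.5
```

MCC is included here because it stays informative when soluble and insoluble classes are imbalanced; whether the thesis reports it is not stated in this record.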

Developed approaches included logistic regression for direct solubility prediction, models that first classified a sample’s likely dataset of origin before applying a corresponding solubility model, clustering-based methods with cluster-specific classifiers, and multi-layer perceptrons to test the benefits of deeper architectures.
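
The "classify the dataset of origin, then apply the corresponding solubility model" idea can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: a nearest-centroid rule stands in for the origin classifier, a small gradient-descent logistic regression stands in for each per-dataset solubility model, and the two-dimensional toy features replace the SWI and ProteinSol features mentioned in the abstract. All names (`RoutedClassifier`, `fit_logreg`, the datasets "A" and "B") are hypothetical.

```python
import math


def centroid(rows):
    """Mean feature vector of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_logreg(model, x):
    """Return 1 (soluble) if the predicted probability is at least 0.5, else 0."""
    w, b = model
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


class RoutedClassifier:
    """Stage 1 guesses the dataset of origin; stage 2 applies that dataset's model."""

    def fit(self, datasets):
        # datasets maps a dataset name to (feature rows, binary solubility labels).
        self.centroids = {name: centroid(X) for name, (X, _) in datasets.items()}
        self.models = {name: fit_logreg(X, y) for name, (X, y) in datasets.items()}
        return self

    def route(self, x):
        # Assumption: nearest centroid stands in for the origin classifier.
        return min(self.centroids, key=lambda name: sq_dist(x, self.centroids[name]))

    def predict(self, x):
        return predict_logreg(self.models[self.route(x)], x)


# Toy data (hypothetical): the second feature relates to solubility
# in opposite directions in the two datasets.
datasets = {
    "A": ([[0.0, 0.0], [0.2, 0.1], [0.1, 0.9], [0.0, 1.0]], [0, 0, 1, 1]),
    "B": ([[1.0, 0.0], [1.2, 0.1], [1.1, 0.9], [1.0, 1.0]], [1, 1, 0, 0]),
}
clf = RoutedClassifier().fit(datasets)
for sample in ([0.1, 0.8], [1.1, 0.8]):
    print(sample, clf.route(sample), clf.predict(sample))
```

Routing only helps when the datasets differ in how features relate to solubility, as in the toy data above; the abstract reports that none of the developed approaches consistently outperformed the others.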

Overall, the models achieved similar performance, with no single approach consistently outperforming others. Simpler models like logistic regression often performed on par with more complex models such as multi-layer perceptrons. Results varied by dataset, with the lowest performance observed on the largest and most diverse dataset, PDBSol, suggesting that high variability in sequence data may reduce prediction reliability.

Popular Abstract: What if we could fast-track the development of life-saving medicines, cutting down the time and cost required to bring them to patients? Antibodies are proteins found naturally in the body that help fight off disease, and scientists have learned how to turn them into powerful medicines. These therapeutic antibodies have been used to treat cancer, autoimmune conditions, and even COVID-19. However, before any antibody can be developed into a medicine, it has to pass a series of tests to make sure it dissolves well, stays stable, and doesn't clump or break down. These tests are expensive and time-consuming.

In this thesis, we explored whether artificial intelligence (AI) could help predict a key trait of antibodies: solubility, which affects how suitable an antibody is to be used as a drug. We tested existing tools and built new models using five datasets, trying both simple methods like logistic regression and more advanced ones like neural networks.

The surprising result? Simple models performed similarly to the complex ones. No single method stood out across all datasets, and the most challenging results came from the largest and most diverse dataset. This shows that the type and quality of data have a big impact on how well AI models perform.

While the models aren't perfect yet, this work highlights how AI could help scientists sort through large numbers of antibody candidates much faster, which can save time, money, and potentially speed up the development of new treatments.

- Open Access

Please use this URL to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9195501

author: Höjding, Josephine (LU) and Björkhem, William (LU)
supervisor: Mikael Nilsson (LU); Morten Krogh
organization: Mathematics (Faculty of Engineering)
course: FMAM05 20251
year: 2025
type: H2 - Master's Degree (Two Years)
subject:
keywords: machine learning, deep learning, antibody, antibody developability
publication/series: Master’s Theses in Mathematical Sciences
report number: LUTFMA-3582-2025
ISSN: 1404-6342
other publication id: 2025:E33
language: English
id: 9195501
date added to LUP: 2025-06-19 09:54:28
date last changed: 2025-06-19 09:54:28

@misc{9195501,
  abstract     = {{Antibody developability refers to an antibody’s suitability for clinical use, including properties such as solubility, stability, and aggregation. These traits are traditionally assessed through experimental screening, which is time-consuming and resource-intensive. Machine learning offers a promising alternative for early prediction of developability, though many existing models are still at an early stage.

This work compares multiple machine learning strategies for predicting protein solubility, a key developability factor. Five datasets were used: four consisting of non-antibody protein sequences expressed in E. coli with solubility labels, and one independent antibody dataset without labels. Three existing models (NetSolP, SWI, and ProteinSol) were evaluated using standard performance metrics, and new models were developed by leveraging feature extraction from SWI and ProteinSol to explore potential improvements.

Developed approaches included logistic regression for direct solubility prediction, models that first classified a sample’s likely dataset of origin before applying a corresponding solubility model, clustering-based methods with cluster-specific classifiers, and multi-layer perceptrons to test the benefits of deeper architectures.

Overall, the models achieved similar performance, with no single approach consistently outperforming others. Simpler models like logistic regression often performed on par with more complex models such as multi-layer perceptrons. Results varied by dataset, with the lowest performance observed on the largest and most diverse dataset, PDBSol, suggesting that high variability in sequence data may reduce prediction reliability.}},
  author       = {{Höjding, Josephine and Björkhem, William}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Predicting Antibody Developability: Machine Learning Meets Therapeutic Antibodies}},
  year         = {{2025}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
