Lund University Publications

Machine learning evaluation for identification of M-proteins in human serum

Sopasakis, Alexandros (LU); Nilsson, Maria; Askenmo, Mattias; Nyholm, Fredrik; Mattsson Hultén, Lillemor and Rotter Sopasakis, Victoria (2024). In PLoS ONE 19(4), p. 0299600-0299600
Abstract

Serum electrophoresis (SPEP) is a method used to analyze the distribution of the most important proteins in the blood. The major clinical question is the presence of monoclonal fraction(s) of antibodies (M-protein/paraprotein), which is essential for the diagnosis and follow-up of hematological diseases, such as multiple myeloma. Recent studies have shown that machine learning can be used to assess protein electrophoresis by, for example, examining protein glycan patterns to follow up tumor surgery. In this study we compared 26 different decision tree algorithms to identify the presence of M-proteins in human serum by using numerical data from serum protein capillary electrophoresis. For the automated detection and clustering of data, we used an anonymized data set consisting of 67,073 samples. We found five methods with superior ability to detect M-proteins: Extra Trees (ET), Random Forest (RF), Histogram Gradient Boosting Regressor (HGBR), Light Gradient Boosting Method (LGBM), and Extreme Gradient Boosting (XGB). Additionally, we implemented a game theoretic approach to disclose which features in the data set were indicative of the resulting M-protein diagnosis. The results verified the gamma globulin fraction and part of the beta globulin fraction as the most important features of the electrophoresis analysis, thereby further strengthening the reliability of our approach. Finally, we tested the algorithms for classifying the M-protein isotypes, where ET and XGB showed the best performance out of the five algorithms tested. Our results show that serum capillary electrophoresis combined with decision tree algorithms has great potential for rapid and accurate identification of M-proteins. Moreover, these methods would be applicable to a variety of blood analyses, such as hemoglobinopathies, indicating wide-ranging diagnostic use. However, for M-protein isotype classification, combining machine learning solutions for numerical data from capillary electrophoresis with gel electrophoresis image data would be most advantageous.
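
The abstract mentions a game theoretic approach to feature attribution without naming it in this record. One common technique of that kind is the Shapley value, and the sketch below computes exact Shapley values for a toy scoring function. Everything in it is an illustrative assumption rather than the authors' implementation: the feature names, the `toy_score` function, the sample values, and the all-zero baseline are hypothetical and chosen only so that the gamma and beta fractions carry the signal.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names (assumed here, not taken from the paper's data set).
FEATURES = ["albumin", "alpha1", "alpha2", "beta1", "beta2", "gamma"]


def toy_score(x):
    """Made-up M-protein score (an assumption): uses only gamma and beta2."""
    return 2.0 * x["gamma"] + 1.0 * x["beta2"] + 0.5 * x["gamma"] * x["beta2"]


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`.

    A coalition is a set of features taken from `x`; all other
    features are held at their baseline value.
    """
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical sample (relative fractions) and an all-zero baseline.
baseline = {f: 0.0 for f in FEATURES}
sample = {"albumin": 0.55, "alpha1": 0.04, "alpha2": 0.10,
          "beta1": 0.06, "beta2": 0.05, "gamma": 0.20}

phi = shapley_values(toy_score, sample, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {contribution:.4f}")
```

The attributions sum to the difference between the score of the sample and the score of the baseline, which is the property that makes such values convenient for ranking features. Exact enumeration visits every coalition, so it is only practical for a handful of features; with many features an approximation would be needed.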

type
Contribution to journal
publication status
published
keywords
Humans, Reproducibility of Results, Antibodies, Multiple Myeloma/diagnosis, Electrophoresis, Capillary, Algorithms, Immunoglobulin Isotypes, Machine Learning
in
PLoS ONE
volume
19
issue
4
pages
0299600 - 0299600
publisher
Public Library of Science (PLoS)
external identifiers
  • scopus:85189534004
  • pmid:38564628
ISSN
1932-6203
DOI
10.1371/journal.pone.0299600
language
English
LU publication?
yes
additional info
Copyright: © 2024 Sopasakis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
id
95c0718f-422e-49f8-be6b-6a507dfea6eb
date added to LUP
2024-04-11 06:59:11
date last changed
2024-05-24 10:17:11
@article{95c0718f-422e-49f8-be6b-6a507dfea6eb,
  abstract     = {{Serum electrophoresis (SPEP) is a method used to analyze the distribution of the most important proteins in the blood. The major clinical question is the presence of monoclonal fraction(s) of antibodies (M-protein/paraprotein), which is essential for the diagnosis and follow-up of hematological diseases, such as multiple myeloma. Recent studies have shown that machine learning can be used to assess protein electrophoresis by, for example, examining protein glycan patterns to follow up tumor surgery. In this study we compared 26 different decision tree algorithms to identify the presence of M-proteins in human serum by using numerical data from serum protein capillary electrophoresis. For the automated detection and clustering of data, we used an anonymized data set consisting of 67,073 samples. We found five methods with superior ability to detect M-proteins: Extra Trees (ET), Random Forest (RF), Histogram Gradient Boosting Regressor (HGBR), Light Gradient Boosting Method (LGBM), and Extreme Gradient Boosting (XGB). Additionally, we implemented a game theoretic approach to disclose which features in the data set were indicative of the resulting M-protein diagnosis. The results verified the gamma globulin fraction and part of the beta globulin fraction as the most important features of the electrophoresis analysis, thereby further strengthening the reliability of our approach. Finally, we tested the algorithms for classifying the M-protein isotypes, where ET and XGB showed the best performance out of the five algorithms tested. Our results show that serum capillary electrophoresis combined with decision tree algorithms has great potential for rapid and accurate identification of M-proteins. Moreover, these methods would be applicable to a variety of blood analyses, such as hemoglobinopathies, indicating wide-ranging diagnostic use. However, for M-protein isotype classification, combining machine learning solutions for numerical data from capillary electrophoresis with gel electrophoresis image data would be most advantageous.}},
  author       = {{Sopasakis, Alexandros and Nilsson, Maria and Askenmo, Mattias and Nyholm, Fredrik and Mattsson Hultén, Lillemor and Rotter Sopasakis, Victoria}},
  issn         = {{1932-6203}},
  keywords     = {{Humans; Reproducibility of Results; Antibodies; Multiple Myeloma/diagnosis; Electrophoresis, Capillary; Algorithms; Immunoglobulin Isotypes; Machine Learning}},
  language     = {{eng}},
  number       = {{4}},
  pages        = {{0299600--0299600}},
  publisher    = {{Public Library of Science (PLoS)}},
  series       = {{PLoS ONE}},
  title        = {{Machine learning evaluation for identification of M-proteins in human serum}},
  url          = {{http://dx.doi.org/10.1371/journal.pone.0299600}},
  doi          = {{10.1371/journal.pone.0299600}},
  volume       = {{19}},
  year         = {{2024}},
}