Computational methods for the analysis of clinical proteomics data - Deciphering the hidden biology of infectious disease
(2025) In Lund University, Faculty of Medicine Doctoral Dissertation Series
- Abstract
Infectious diseases are among the leading
causes of mortality worldwide. Severe infections can manifest in many ways,
creating a heterogeneous clinical and molecular disease landscape that renders
these diseases difficult to research, diagnose, and treat. To investigate the
molecular mechanisms of infectious disease, we apply mass spectrometry-based
proteomics to analyze blood plasma samples for the dynamic stratification of patients
with infectious disease and sepsis. In this thesis, we focus on the development of
computational methods that facilitate the interrogation of these complex
proteomes towards the goal of translational medicine and personalized care.
The overall goal of this thesis was to
enable the in-depth analysis of large-scale clinical proteomic cohorts. As a
first step, we leveraged computational methods to facilitate discovery
data-independent acquisition (DIA) mass spectrometry (MS) and maximize the
number of identified proteins in plasma samples. Using large-scale machine
learning methods, we optimized the search space with a multi-pass prediction-based
filtration step that allows for robust control of the false discovery rate
(FDR) while maximizing the number of quantified proteins. Next, we
introduced explainable machine learning methods to select the most important
proteins for predicting severe disease. We then substantially expanded these
explainable machine learning methods, formalizing them into easy-to-use
software packages that support reproducible research and in-depth proteomic
analysis. Finally, we combined our novel computational methods to analyze 1,400 clinical
plasma samples from patients with suspected sepsis. Using samples taken at the
time of admission to the hospital, we developed an inherently interpretable
architecture that matches new patients to similar groups of existing patients in
a database, creating digital families. These digital families could accurately
stratify patients with suspected sepsis, predict disease trajectories, predict mortality,
and identify hidden cohorts within the data.
In combination, the results contained within
this thesis provide a strong basis for further studies and movement towards
personalized health care for infectious diseases.
- author
- Scott, Aaron LU
- opponent
- Professor Käll, Lukas, Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden
- publishing date
- 2025
- type
- Thesis
- publication status
- published
- keywords
- proteomics, computational proteomics, bioinformatics, machine learning, explainable machine learning, software engineering, algorithm development, infection, sepsis, mass spectrometry
- in
- Lund University, Faculty of Medicine Doctoral Dissertation Series
- issue
- 2025:3
- pages
- 85 pages
- publisher
- Lund University, Faculty of Medicine
- defense location
- Belfragesalen, BMC D15, Klinikgatan 32 in Lund. Join by Zoom: https://lu-se.zoom.us/j/67161629214?pwd=mCNVkz882Fz4SaSvZ7kw6UeMBiB7mY.1
- defense date
- 2025-01-10 09:00:00
- ISSN
- 1652-8220
- ISBN
- 978-91-8021-656-2
- language
- English
- LU publication?
- yes
- id
- 0dc1272b-643e-42a9-a807-25bebf3c5e92
- date added to LUP
- 2024-12-17 09:34:25
- date last changed
- 2025-04-04 14:01:52
@phdthesis{0dc1272b-643e-42a9-a807-25bebf3c5e92,
  abstract  = {{Infectious diseases are one of the leading causes of mortality in the world. Severe infections can manifest in many ways, creating a heterogeneous clinical and molecular disease landscape that renders these diseases difficult to research, diagnose, and treat. To investigate the molecular mechanisms of infectious disease, we apply mass spectrometry-based proteomics to analyze blood plasma samples for the dynamic stratification of infectious disease and sepsis patients. In this thesis, we focus on the development of computational methods that facilitate the interrogation of these complex proteomes towards the goal of translational medicine and personalized care. The overall goal of this thesis was to enable the in-depth analysis of large-scale clinical proteomic cohorts. As a first step, we leveraged computational methods to facilitate discovery data-independent acquisition (DIA) mass spectrometry (MS) and maximize the number of identified proteins in plasma samples. Using large-scale machine learning methods, we optimize the search space using a multi-pass prediction-based filtration step that allows for robust control of the false discovery rate (FDR) while optimizing the number of quantified proteins. From here, we introduce explainable machine learning methods to select the most important proteins involved in predicting severe disease. We substantially expand these explainable machine learning methods, formalizing them into easy-to-use software packages that support reproducible research and in-depth proteomic analysis. Finally, we combine our novel computational methods to analyze 1400 clinical plasma samples from patients suspected of sepsis. Using samples taken at the time-of-admission to the hospital, we developed an inherently interpretable architecture to match new patients to similar groups of existing patients from a database to create digital families. These digital families could accurately stratify patients suspected of sepsis, predict disease trajectories, predict mortality, and identify hidden cohorts within the data. In combination, the results contained within this thesis provide a strong basis for further studies and movement towards personalized health care for infectious diseases.}},
  author    = {{Scott, Aaron}},
  isbn      = {{978-91-8021-656-2}},
  issn      = {{1652-8220}},
  keywords  = {{proteomics; computational proteomics; bioinformatics; machine learning; explainable machine learning; software engineering; algorithm development; infection; sepsis; mass spectrometry}},
  language  = {{eng}},
  number    = {{2025:3}},
  publisher = {{Lund University, Faculty of Medicine}},
  school    = {{Lund University}},
  series    = {{Lund University, Faculty of Medicine Doctoral Dissertation Series}},
  title     = {{Computational methods for the analysis of clinical proteomics data - Deciphering the hidden biology of infectious disease}},
  url       = {{https://lup.lub.lu.se/search/files/202499952/Aaron_M_Scott_-_WEBB.pdf}},
  year      = {{2025}},
}