Computational methods for the analysis of clinical proteomics data - Deciphering the hidden biology of infectious disease

Scott, Aaron

Scott, Aaron ^LU (2025) In Lund University, Faculty of Medicine Doctoral Dissertation Series

Abstract

Infectious diseases are one of the leading
causes of mortality in the world. Severe infections can manifest in many ways,
creating a heterogeneous clinical and molecular disease landscape that renders
these diseases difficult to research, diagnose, and treat. To investigate the
molecular mechanisms of infectious disease, we apply mass spectrometry-based
proteomics to analyze blood plasma samples for the dynamic stratification of infectious
disease and sepsis patients. In this thesis, we focus on the development of
computational methods that facilitate the interrogation of these complex
proteomes towards the goal of translational medicine and personalized care.

The overall goal of this thesis was to
enable the in-depth analysis of large-scale clinical proteomic cohorts. As a
first step, we leveraged computational methods to facilitate discovery
data-independent acquisition (DIA) mass spectrometry (MS) and maximize the
number of identified proteins in plasma samples. Using large-scale machine
learning methods, we optimized the search space with a multi-pass, prediction-based
filtration step that allows robust control of the false discovery rate
(FDR) while maximizing the number of quantified proteins. We then
introduced explainable machine learning methods to select the proteins most
important for predicting severe disease. We substantially expanded these
methods and formalized them into easy-to-use
software packages that support reproducible research and in-depth proteomic
analysis. Finally, we combined our computational methods to analyze 1400 clinical
plasma samples from patients with suspected sepsis. Using samples taken at the
time of admission to the hospital, we developed an inherently interpretable
architecture that matches new patients to similar groups of existing patients in
a database, creating digital families. These digital families could accurately
stratify patients with suspected sepsis, predict disease trajectories and mortality,
and identify hidden cohorts within the data.

In combination, the results contained within
this thesis provide a strong basis for further studies and movement towards
personalized health care for infectious diseases.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/0dc1272b-643e-42a9-a807-25bebf3c5e92

author

Scott, Aaron ^LU

supervisor

Lars Malmström ^LU
Christofer Karlsson ^LU

opponent

Professor Käll, Lukas, Science for Life Laboratory, KTH – Royal Institute of Technology, Stockholm, Sweden

organization

Infection Medicine (BMC)

publishing date

2025

type

Thesis

publication status

published

subject

keywords

proteomics, computational proteomics, bioinformatics, machine learning, explainable machine learning, software engineering, algorithm development, infection, sepsis, mass spectrometry

in

Lund University, Faculty of Medicine Doctoral Dissertation Series

issue

2025:3

pages

85 pages

publisher

Lund University, Faculty of Medicine

defense location

Belfragesalen, BMC D15, Klinikgatan 32, Lund. Join by Zoom: https://lu-se.zoom.us/j/67161629214?pwd=mCNVkz882Fz4SaSvZ7kw6UeMBiB7mY.1

defense date

2025-01-10 09:00:00

ISSN

1652-8220

ISBN

978-91-8021-656-2

language

English

LU publication?

yes

id

0dc1272b-643e-42a9-a807-25bebf3c5e92

date added to LUP

2024-12-17 09:34:25

date last changed

2025-10-21 13:06:33

@phdthesis{0dc1272b-643e-42a9-a807-25bebf3c5e92,
  abstract     = {{Infectious diseases are one of the leading
causes of mortality in the world. Severe infections can manifest in many ways,
creating a heterogeneous clinical and molecular disease landscape that renders
these diseases difficult to research, diagnose, and treat. To investigate the
molecular mechanisms of infectious disease, we apply mass spectrometry-based
proteomics to analyze blood plasma samples for the dynamic stratification of infectious
disease and sepsis patients. In this thesis, we focus on the development of
computational methods that facilitate the interrogation of these complex
proteomes towards the goal of translational medicine and personalized care.

The overall goal of this thesis was to
enable the in-depth analysis of large-scale clinical proteomic cohorts. As a
first step, we leveraged computational methods to facilitate discovery
data-independent acquisition (DIA) mass spectrometry (MS) and maximize the
number of identified proteins in plasma samples. Using large-scale machine
learning methods, we optimize the search space using a multi-pass prediction-based
filtration step that allows for robust control of the false discovery rate
(FDR) while optimizing the number of quantified proteins. From here, we
introduce explainable machine learning methods to select the most important
proteins involved in predicting severe disease. We substantially expand these
explainable machine learning methods, formalizing them into easy-to-use
software packages that support reproducible research and in-depth proteomic
analysis. Finally, we combine our novel computational methods to analyze 1400 clinical
plasma samples from patients suspected of sepsis. Using samples taken at the
time-of-admission to the hospital, we developed an inherently interpretable
architecture to match new patients to similar groups of existing patients from
a database to create digital families. These digital families could accurately
stratify patients suspected of sepsis, predict disease trajectories, predict mortality,
and identify hidden cohorts within the data.

In combination, the results contained within
this thesis provide a strong basis for further studies and movement towards
personalized health care for infectious diseases.}},
  author       = {{Scott, Aaron}},
  isbn         = {{978-91-8021-656-2}},
  issn         = {{1652-8220}},
  keywords     = {{proteomics; computational proteomics; bioinformatics; machine learning; explainable machine learning; software engineering; algorithm development; infection; sepsis; mass spectrometry}},
  language     = {{eng}},
  number       = {{2025:3}},
  publisher    = {{Lund University, Faculty of Medicine}},
  school       = {{Lund University}},
  series       = {{Lund University, Faculty of Medicine Doctoral Dissertation Series}},
  title        = {{Computational methods for the analysis of clinical proteomics data - Deciphering the hidden biology of infectious disease}},
  url          = {{https://lup.lub.lu.se/search/files/202499952/Aaron_M_Scott_-_WEBB.pdf}},
  year         = {{2025}},
}
