
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Exploring the Signature Matrix Creation Process for Cell Type Deconvolution Using Proteomics Data

Jakobsson, Hanna LU (2025) KIMM05 20251
Department of Immunotechnology
Abstract
Cell type deconvolution - the computational estimation of cellular composition in bulk tissue samples - is a valuable tool in cancer research for understanding immune infiltration and tissue heterogeneity. Most deconvolution algorithms infer the cell type composition of a bulk sample by modeling the observed expression data as a linear combination of reference expression profiles, one for each cell type present in the mixture. This collection of reference profiles, also called the signature matrix, is constructed from expression profiles of pure cell types. While current methods predominantly rely on transcriptomic data, proteomics-based deconvolution remains underexplored. This study explores how proteome-derived signature matrices can be generated and refined, and how methodological choices affect signature composition and deconvolution performance. Using a custom signature matrix generation function, imputation strategies and feature selection methods, such as differential expression analysis (DEA), kappa optimization and various filtering approaches, were evaluated. Several strategies yielded highly accurate deconvolution results, with Pearson correlations between estimated and true proportions exceeding 0.95. In particular, filtering markers based on cell type specificity achieved notably good results. However, the results also reveal a complex interplay between signature composition and deconvolution algorithm, suggesting that the most effective approach may depend on the characteristics of both the reference data and the target dataset. Although no universal strategy for constructing signature matrices was established, this study offers guidance and suggests directions for future research, while acknowledging limitations in current methodologies. This thesis represents a step toward reliable and broadly applicable proteome-derived signature matrices and provides a foundation for further methodological development.
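
The linear model in the abstract can be written compactly. The notation below is ours, not the thesis's:

- m is the bulk protein profile over p proteins.
- S is the p × k signature matrix, with one column per cell type.
- f holds the k cell type proportions.
- x and y are the true and estimated proportions being compared.

Non-negative least squares is shown as one common estimator, but deconvolution algorithms differ here. CIBERSORT, for example, uses ν-support vector regression. We also assume that "kappa optimization" refers to choosing the marker set that minimizes the condition number κ of S, as CIBERSORT does when sizing its signature. The abstract itself does not define the term.

```latex
\begin{aligned}
\mathbf{m} &\approx \mathbf{S}\,\mathbf{f}, \qquad \mathbf{S} \in \mathbb{R}^{p \times k},\ \mathbf{f} \in \mathbb{R}^{k} \\
\hat{\mathbf{f}} &= \operatorname*{arg\,min}_{\mathbf{f} \ge \mathbf{0}} \lVert \mathbf{S}\mathbf{f} - \mathbf{m} \rVert_2^2,
  \qquad \hat{f}_j \leftarrow \hat{f}_j \Big/ \textstyle\sum_{l=1}^{k} \hat{f}_l \\
\kappa(\mathbf{S}) &= \frac{\sigma_{\max}(\mathbf{S})}{\sigma_{\min}(\mathbf{S})} \\
r &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
\end{aligned}
```

A large κ means that small perturbations of m can produce large changes in the estimated proportions. This is why a well-conditioned signature matrix is generally preferred.
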
Popular Abstract
Using Protein Data to Decode the Immune Landscape of Tumors

Tumors are not just made of cancer cells but also contain immune cells that have prognostic implications for the disease. With emerging computational methods, these immune cell proportions can be estimated directly from protein data - without the need to assess each individual cell. This work explores how these estimates can be improved, with the end goal of enabling more precise and personalized cancer treatments in the future.

Cancer is one of the most common diseases worldwide, and its complexity makes it especially challenging to treat. A tumor is like a lively community: alongside cancer cells live a wide variety of immune cells, each with its own role. In some cases the immune cells fight the cancer, but in other cases they unintentionally promote its growth. The type and abundance of immune cells in a tumor provide important information about how the cancer may progress and how well a patient might respond to treatment. The ability to assess immune cell composition in tumors could therefore pave the way for truly personalized cancer treatments.

In recent years, new computational methods have been developed to estimate which immune cells are present in tumor samples, based on biological data such as protein levels. Each cell type contains its own distinct set of proteins, and with modern analytical techniques we can measure the levels of many proteins simultaneously. The process of computationally estimating the proportions of different cell types is known as cell type deconvolution. To understand the concept, imagine a symphony orchestra: from the audience you hear the whole orchestra, but it is difficult to distinguish individual instruments. Deconvolution is like separating the music into its components, allowing us to tell that the orchestra contains, for example, 20 violins, 10 trumpets, and 5 flutes. To do this, the algorithm needs a reference: a signature matrix. This functions as a catalog of what each instrument sounds like on its own, so that they can be recognized even when played in an orchestra. In biology, the instruments’ sounds correspond to genes or proteins, and the unique combination of these for each cell type acts as a molecular “fingerprint”.
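
In the linear model behind the orchestra analogy, a bulk sample is a weighted sum of the signature matrix columns. The Python sketch below is minimal and self-contained, and it assumes a toy signature matrix with three hypothetical cell types and five hypothetical proteins. The numbers are invented for illustration and do not come from the thesis.

The sketch estimates proportions with non-negative least squares and then rescales them to sum to one. The NNLS problem is solved by cyclic coordinate descent, which is one simple solver among many. The helpers `nnls_coordinate_descent` and `deconvolve` are hypothetical and are not the thesis's custom function.

```python
"""Toy cell type deconvolution by non-negative least squares (illustrative only)."""


def nnls_coordinate_descent(S, m, sweeps=500):
    """Minimise ||S f - m||^2 subject to f >= 0 with cyclic coordinate descent.

    S: signature matrix as a list of rows (proteins x cell types).
    m: bulk profile, one value per protein.
    """
    n_prot, n_cell = len(S), len(S[0])
    f = [0.0] * n_cell
    r = list(m)  # residual m - S f; equals m while f is all zeros
    col_sq = [sum(S[i][j] ** 2 for i in range(n_prot)) for j in range(n_cell)]
    for _ in range(sweeps):
        for j in range(n_cell):
            if col_sq[j] == 0.0:
                continue  # an all-zero signature column cannot be estimated
            # Exact minimiser of the quadratic along coordinate j, clipped at zero.
            step = sum(S[i][j] * r[i] for i in range(n_prot)) / col_sq[j]
            new = max(0.0, f[j] + step)
            delta = new - f[j]
            if delta != 0.0:
                for i in range(n_prot):
                    r[i] -= S[i][j] * delta  # keep the residual in sync with f
                f[j] = new
    return f


def deconvolve(S, m):
    """Estimate non-negative proportions and rescale them to sum to one."""
    f = nnls_coordinate_descent(S, m)
    total = sum(f)
    return [x / total for x in f] if total > 0 else f


# Hypothetical signature: rows are proteins, columns are T cells, B cells, monocytes.
SIGNATURE = [
    [10.0, 1.0, 0.5],
    [0.5, 8.0, 1.0],
    [1.0, 0.5, 12.0],
    [5.0, 5.0, 1.0],
    [2.0, 1.0, 6.0],
]
TRUE_FRACTIONS = [0.5, 0.3, 0.2]

# Simulate a noise-free bulk sample as the weighted sum of signature columns.
bulk = [
    sum(row[j] * TRUE_FRACTIONS[j] for j in range(len(TRUE_FRACTIONS)))
    for row in SIGNATURE
]

if __name__ == "__main__":
    print([round(x, 3) for x in deconvolve(SIGNATURE, bulk)])
```

With noise-free input and a well-conditioned signature, the estimates should closely match `TRUE_FRACTIONS`. With real proteomics data, measurement noise, missing values and cell types absent from the signature all make recovery imperfect.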

This thesis explores how the process of building these matrices can be improved for protein data - or, returning to the orchestra metaphor, how to choose the clearest and most characteristic sounds for each instrument, so that they can be recognized even in the most complex compositions. The aim was to understand what makes a good signature matrix and how it can be optimized to more accurately estimate the immune cell content in cancer tumors. To investigate this, different signature matrices were created using various combinations of proteins and applied to data with known immune cell proportions. The algorithm’s estimates were then compared to the true values, and correlation scores were calculated to assess accuracy.
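
The evaluation step, comparing estimates with known proportions, can be sketched as below. The thesis reports Pearson correlations between estimated and true proportions. This excerpt does not say whether they were computed per cell type, per sample, or over all values pooled. The sketch assumes pooling all (sample, cell type) pairs, and the proportions shown are made-up placeholders.

```python
"""Score deconvolution estimates against known proportions with Pearson's r."""
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def pooled_correlation(true_props, est_props):
    """Flatten samples x cell types and correlate every true/estimated pair."""
    t = [v for sample in true_props for v in sample]
    e = [v for sample in est_props for v in sample]
    return pearson(t, e)


# Hypothetical proportions for three samples and three cell types.
TRUE = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]
ESTIMATED = [[0.48, 0.33, 0.19], [0.25, 0.55, 0.20], [0.12, 0.08, 0.80]]

if __name__ == "__main__":
    print(round(pooled_correlation(TRUE, ESTIMATED), 3))
```

Per-cell-type correlations can be obtained by calling `pearson` on one column at a time instead. They are often more informative for rare cell types, whose small proportions contribute little to a pooled score.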

Results suggest that there is no single optimal way to build a signature matrix. However, some methods proved particularly successful, such as selecting proteins that are highly specific to each immune cell type, which resulted in almost perfect estimates. Although challenges remain, this work offers valuable insights into how protein-based signature matrices can be refined and developed further. In the long run, such improvements could help us capture a more accurate picture of the immune system’s role in cancer and bring us closer to more precise, personalized treatments.
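
One way to filter markers by cell type specificity is sketched below. For each protein, the score is the share of its summed reference expression that falls in its highest-expressing cell type. This score is an assumption for illustration, since the thesis's exact criterion is not given in this abstract. The protein names P1 to P7, the expression values, the threshold and the per-cell-type cap (`top_n`) are all hypothetical.

```python
"""Hypothetical marker selection by cell type specificity (illustrative only)."""


def select_specific_markers(profiles, cell_types, threshold=0.7, top_n=2):
    """Keep proteins whose reference expression is concentrated in one cell type.

    profiles: dict mapping protein -> non-negative mean expression per cell type,
              listed in the same order as cell_types.
    Returns a dict mapping each cell type to its selected proteins, most specific first.
    """
    hits = {ct: [] for ct in cell_types}
    for protein, values in profiles.items():
        total = sum(values)
        if total <= 0:
            continue  # not detected in any reference profile
        best = max(range(len(values)), key=lambda j: values[j])
        score = values[best] / total  # 1.0 means expressed in a single cell type
        if score >= threshold:
            hits[cell_types[best]].append((score, protein))
    return {
        ct: [p for _, p in sorted(found, key=lambda h: h[0], reverse=True)[:top_n]]
        for ct, found in hits.items()
    }


CELL_TYPES = ["T cell", "B cell", "Monocyte"]
PROFILES = {
    "P1": [9.0, 0.5, 0.5],
    "P2": [6.0, 3.0, 1.0],
    "P3": [0.2, 7.0, 0.8],
    "P4": [3.0, 3.0, 4.0],
    "P5": [0.0, 1.0, 9.0],
    "P6": [0.0, 0.0, 0.0],
    "P7": [1.0, 0.0, 4.0],
}

if __name__ == "__main__":
    print(select_specific_markers(PROFILES, CELL_TYPES))
```

Raising `threshold` keeps fewer, more exclusive markers. This tends to make the signature columns less similar to one another, but it leaves fewer proteins from which to estimate each cell type.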
author: Jakobsson, Hanna
course: KIMM05 20251
year: 2025
type: H2 - Master's Degree (Two Years)
keywords: cell type deconvolution, proteomics, immune infiltration, signature matrix
language: English
id: 9212947
date added to LUP: 2025-09-23 12:53:08
date last changed: 2025-09-23 12:53:08
@misc{9212947,
  abstract     = {{Cell type deconvolution - the computational estimation of cellular composition in bulk tissue samples - is a valuable tool in cancer research for understanding immune infiltration and tissue heterogeneity. Most deconvolution algorithms infer the composition of cell types in bulk samples by modeling the observed expression data as a linear combination of reference expression profiles from each cell type present in a mixture. The reference expression profile, also called the signature matrix, is constructed from expression profiles from pure cell types. While current methods predominantly rely on transcriptomic data, proteomics-based deconvolution remains underexplored. This study explores how proteome-derived signature matrices can be generated and refined, and how methodological choices affect signature composition and performance. Using a custom signature matrix generation function, imputation strategies and feature selection methods such as differential expression analysis (DEA), kappa optimization and various filtering approaches were evaluated. Several strategies yielded highly accurate deconvolution results with Pearson correlations between estimated and true proportions exceeding 0.95. In particular, filtering markers based on cell type specificity achieved notably good results. However, the observations also uncover a complex interplay between signature composition and deconvolution algorithm, suggesting that the most effective approach may depend on the characteristics of both the reference data and the target dataset. While no universal strategy for constructing signature matrices was established, this study offers guidance and suggests directions for future research, while also acknowledging limitations in current methodologies. This thesis represents a step toward achieving reliable and broadly applicable proteome-derived signature matrices and provides a foundation for further methodological development.}},
  author       = {{Jakobsson, Hanna}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Exploring the Signature Matrix Creation Process for Cell Type Deconvolution Using Proteomics Data}},
  year         = {{2025}},
}