Lund University Publications

LUND UNIVERSITY LIBRARIES

Computational Single-Cell Genomics Methods for Cell State Estimation and their Application to Hematopoietic Systems

Dhapola, Parashar LU (2022) In Lund University, Faculty of Medicine Doctoral Dissertation Series
Abstract
The focus of this doctoral dissertation was to develop novel and scalable computational genomics methods for analysing single-cell genomic modalities. Scarf, a highly memory-efficient single-cell data analysis toolkit, was developed to enable the analysis of very large-scale datasets on personal computers and the execution of parallel workflows on server-scale computational systems. Advances in data chunking algorithms were leveraged, and several optimisations of current workflows were established to dramatically reduce memory usage across multiple analysis steps. Systematic benchmarking across atlas-scale single-cell RNA-Seq and ATAC-Seq datasets was performed to validate Scarf's robust memory efficiency, even under the most demanding parameter sets. Within Scarf's framework, a graph-based hierarchical clustering algorithm was introduced that reveals complex nested cellular hierarchies at single-cell resolution. Scarf also contains a data subsampling algorithm that retains rare cells and preserves cell state change trajectories. This topology-assisted cell downsampling algorithm (TopACeDo) was benchmarked on atlas-scale datasets and found to be superior to existing solutions on multiple metrics.
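The chunk-wise processing mentioned above can be illustrated with a minimal sketch. It is not Scarf's implementation or API: the function names `iter_chunks` and `chunked_mean_var` are hypothetical, and the in-memory list stands in for a matrix that would normally be read from disk one block at a time. The sketch computes per-feature mean and variance while holding only a single chunk of cells in memory, using Welford's running update.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def iter_chunks(rows: Matrix, chunk_size: int) -> Iterator[Matrix]:
    """Yield consecutive blocks of cells (rows); stands in for reading chunks from disk."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def chunked_mean_var(chunks: Iterable[Matrix], n_features: int) -> Tuple[List[float], List[float]]:
    """Per-feature mean and population variance, one chunk in memory at a time."""
    n = 0
    mean = [0.0] * n_features
    m2 = [0.0] * n_features  # running sum of squared deviations from the mean
    for chunk in chunks:
        for row in chunk:
            n += 1
            for j, x in enumerate(row):
                delta = x - mean[j]
                mean[j] += delta / n
                m2[j] += delta * (x - mean[j])  # Welford update
    if n == 0:
        raise ValueError("no cells supplied")
    return mean, [s / n for s in m2]


# Hypothetical toy data: 4 cells x 2 features, processed 3 cells at a time.
cells = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mean, var = chunked_mean_var(iter_chunks(cells, chunk_size=3), n_features=2)
```

Peak memory in this pattern depends on the chunk size rather than on the total number of cells, which is the general property that makes analysis of very large datasets feasible on modest hardware.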
Single-cell genomics has opened new avenues for investigating the heterogeneity of complex cellular populations. Human CD34+ hematopoietic cells are known to harbour stem cells and lineage progenitors responsible for populating the entire blood tissue. A combination of surface proteome and transcriptome data revealed that cord blood-derived CD34+ hematopoietic cells have a higher proportion of multipotent progenitors than the bone marrow-derived population. Further, the CD38-CD35+ immunophenotype was identified as marking the most primitive stem cells in the bone marrow-derived population. Chromatin accessibility profiles were generated for CD34+, CD34+CD38- and CD34+CD38-CD35+ cells using scATAC-Seq. A comparison of these profiles showed that the CD35+ immunophenotype marks a stable epigenomic cell state corresponding to primitive hematopoietic stem cell identity. A unique enhancer signature was derived for these primitive cells and was found to be enriched in NFAT and STAT family transcription factor motifs.
The mouse hematopoietic stem and progenitor cell (HSPC) population is, immunophenotypically, one of the most well-characterised cell systems. However, the underlying chromatin basis of multiple intermediate cell states in the early differentiation steps is unknown. By applying scATAC-Seq to multiple sorted HSPC populations, we uncovered regulatory elements associated with differentiation stages, lineage specification and loss of multipotentiality. Furthermore, focussing on the FLT3 intermediate population (FLT3int), an understudied population within the HSPC pool, we found that FLT3intCD9+ cells retained multilineage potential while FLT3intCD9- cells did not.
Single-cell genomics has the potential to reveal the cell of origin in cases of neoplastic cell transformation. Existing computational approaches treat the transcriptomic shifts in the neoplasm as batch effects in order to reveal the healthy equivalent cells that could be the cells of origin. However, this can lead to misleading results when technical batch effects confound the transcriptomic shifts. To solve this, we developed a computational method called Nabo that uses a projection-based approach so that cell populations suspected of containing neoplastic cells can be interpreted within the context of the heterogeneity of a healthy reference cell population. Nabo's dynamic feature rejection algorithm considers only the relevant features for comparison. In the murine HSPC hierarchy, primitive HSCs are known to be insensitive to MLL-ENL-induced leukemic transformation, whereas GMLPs are sensitive to it. Comparison of single-cell RNA-Seq profiles of wild-type GMLPs and MLL-ENL-induced GMLPs using Nabo showed that the leukemic cells reflected the transcriptomic state of the most primitive GMLPs. Projection of both populations onto cKit+ HSPCs demonstrated Nabo's ability to identify the cells of origin in a leukemic context and indicated how MLL-ENL induction leads to differentiation arrest in GMLP cells.
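The general idea of projecting a target population onto a reference population can be sketched as follows. This is an illustration under stated assumptions, not Nabo's algorithm or API: the function `project_onto_reference` is hypothetical, cells are plain coordinate tuples in some shared feature space, and a brute-force Euclidean nearest-neighbour search replaces whatever graph construction and feature selection the actual method uses.

```python
import math
from typing import List, Sequence

Cell = Sequence[float]


def project_onto_reference(reference: Sequence[Cell], target: Sequence[Cell], k: int) -> List[int]:
    """Count, for each reference cell, how many target cells list it among their k nearest reference cells."""
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference cells")
    scores = [0] * len(reference)
    for cell in target:
        # Rank reference cells by Euclidean distance to this target cell.
        ranked = sorted((math.dist(cell, ref), i) for i, ref in enumerate(reference))
        for _, i in ranked[:k]:
            scores[i] += 1
    return scores


# Hypothetical toy data: three reference cells and three target cells in 2D.
reference_cells = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
target_cells = [(0.1, 0.0), (0.9, 0.1), (0.2, 0.1)]
mapping_scores = project_onto_reference(reference_cells, target_cells, k=1)
```

In a sketch like this, the reference is never altered by the target cells; reference cells that accumulate high scores are simply the healthy states that the target cells most resemble.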
author
supervisor
opponent
  • professor Khodosevich, Konstantin, University of Copenhagen
organization
publishing date
type
Thesis
publication status
published
subject
keywords
Hematopoietic Stem Cell, Single-Cell, Genomics, Epigenomics, transcriptomics, bioinformatics
in
Lund University, Faculty of Medicine Doctoral Dissertation Series
issue
2022:167
pages
91 pages
publisher
Lund University, Faculty of Medicine
defense location
Belfragesalen, BMC D15, Klinikgatan 32, Lund. Join by Zoom: https://lu-se.zoom.us/j/63682605021?pwd=dkN1WjdLekorM0oxU0FTME90Sm94QT09
defense date
2022-12-01 09:00:00
ISSN
1652-8220
ISBN
978-91-8021-329-5
language
English
LU publication?
yes
id
b063785a-e289-4dd2-bac1-215f1f809b15
date added to LUP
2022-10-10 14:26:28
date last changed
2022-11-09 09:48:16
@phdthesis{b063785a-e289-4dd2-bac1-215f1f809b15,
  abstract     = {{The focus of this doctoral dissertation was to develop novel and scalable computational genomics methods for analysing single-cell genomic modalities. Scarf, a highly memory-efficient single-cell data analysis toolkit, was developed to enable the analysis of very large-scale datasets on personal computers and the execution of parallel workflows on server-scale computational systems. Advances in data chunking algorithms were leveraged, and several optimisations of current workflows were established to dramatically reduce memory usage across multiple analysis steps. Systematic benchmarking across atlas-scale single-cell RNA-Seq and ATAC-Seq datasets was performed to validate Scarf's robust memory efficiency, even under the most demanding parameter sets. Within Scarf's framework, a graph-based hierarchical clustering algorithm was introduced that reveals complex nested cellular hierarchies at single-cell resolution. Scarf also contains a data subsampling algorithm that retains rare cells and preserves cell state change trajectories. This topology-assisted cell downsampling algorithm (TopACeDo) was benchmarked on atlas-scale datasets and found to be superior to existing solutions on multiple metrics.
Single-cell genomics has opened new avenues for investigating the heterogeneity of complex cellular populations. Human CD34+ hematopoietic cells are known to harbour stem cells and lineage progenitors responsible for populating the entire blood tissue. A combination of surface proteome and transcriptome data revealed that cord blood-derived CD34+ hematopoietic cells have a higher proportion of multipotent progenitors than the bone marrow-derived population. Further, the CD38-CD35+ immunophenotype was identified as marking the most primitive stem cells in the bone marrow-derived population. Chromatin accessibility profiles were generated for CD34+, CD34+CD38- and CD34+CD38-CD35+ cells using scATAC-Seq. A comparison of these profiles showed that the CD35+ immunophenotype marks a stable epigenomic cell state corresponding to primitive hematopoietic stem cell identity. A unique enhancer signature was derived for these primitive cells and was found to be enriched in NFAT and STAT family transcription factor motifs.
The mouse hematopoietic stem and progenitor cell (HSPC) population is, immunophenotypically, one of the most well-characterised cell systems. However, the underlying chromatin basis of multiple intermediate cell states in the early differentiation steps is unknown. By applying scATAC-Seq to multiple sorted HSPC populations, we uncovered regulatory elements associated with differentiation stages, lineage specification and loss of multipotentiality. Furthermore, focussing on the FLT3 intermediate population (FLT3int), an understudied population within the HSPC pool, we found that FLT3intCD9+ cells retained multilineage potential while FLT3intCD9- cells did not.
Single-cell genomics has the potential to reveal the cell of origin in cases of neoplastic cell transformation. Existing computational approaches treat the transcriptomic shifts in the neoplasm as batch effects in order to reveal the healthy equivalent cells that could be the cells of origin. However, this can lead to misleading results when technical batch effects confound the transcriptomic shifts. To solve this, we developed a computational method called Nabo that uses a projection-based approach so that cell populations suspected of containing neoplastic cells can be interpreted within the context of the heterogeneity of a healthy reference cell population. Nabo's dynamic feature rejection algorithm considers only the relevant features for comparison. In the murine HSPC hierarchy, primitive HSCs are known to be insensitive to MLL-ENL-induced leukemic transformation, whereas GMLPs are sensitive to it. Comparison of single-cell RNA-Seq profiles of wild-type GMLPs and MLL-ENL-induced GMLPs using Nabo showed that the leukemic cells reflected the transcriptomic state of the most primitive GMLPs. Projection of both populations onto cKit+ HSPCs demonstrated Nabo's ability to identify the cells of origin in a leukemic context and indicated how MLL-ENL induction leads to differentiation arrest in GMLP cells.}},
  author       = {{Dhapola, Parashar}},
  isbn         = {{978-91-8021-329-5}},
  issn         = {{1652-8220}},
  keywords     = {{Hematopoietic Stem Cell; Single-Cell; Genomics; Epigenomics; transcriptomics; bioinformatics}},
  language     = {{eng}},
  number       = {{2022:167}},
  publisher    = {{Lund University, Faculty of Medicine}},
  school       = {{Lund University}},
  series       = {{Lund University, Faculty of Medicine Doctoral Dissertation Series}},
  title        = {{Computational Single-Cell Genomics Methods for Cell State Estimation and their Application to Hematopoietic Systems}},
  year         = {{2022}},
}