
Lund University Publications

LUND UNIVERSITY LIBRARIES

Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data

Dhapola, Parashar (LU); Rodhe, Johan (LU); Olofzon, Rasmus (LU); Bonald, Thomas; Erlandsson, Eva (LU); Soneji, Shamit (LU) and Karlsson, Göran (LU) (2022). In Nature Communications 13.
Abstract

As the scale of single-cell genomics experiments grows into the millions of cells, the computational resources required to process these data are beyond the reach of many researchers. Herein we present Scarf, a modularly designed Python package that seamlessly interoperates with other single-cell toolkits and enables memory-efficient analysis of millions of cells on a laptop or on low-cost devices such as single-board computers. We demonstrate Scarf's memory and compute-time efficiency by applying it to the largest existing single-cell RNA-Seq and ATAC-Seq datasets. Scarf wraps memory-efficient implementations of a graph-based t-distributed stochastic neighbour embedding (t-SNE) and a hierarchical clustering algorithm. Moreover, Scarf performs accurate reference-anchored mapping of datasets while maintaining memory efficiency. By implementing a subsampling algorithm, Scarf can also generate a representative subsample of cells from a given dataset in which rare cell populations and lineage differentiation trajectories are conserved. Together, Scarf provides a framework in which any researcher can perform advanced processing, subsampling, reanalysis, and integration of atlas-scale datasets on standard laptop computers. Scarf is available on GitHub: https://github.com/parashardhapola/scarf .

author
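Dhapola, Parashar; Rodhe, Johan; Olofzon, Rasmus; Bonald, Thomas; Erlandsson, Eva; Soneji, Shamit and Karlsson, Göran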
publishing date
2022
type
Contribution to journal
publication status
published
keywords
Algorithms, Cluster Analysis, Genomics, Single-Cell Analysis, Software, Whole Exome Sequencing
in
Nature Communications
volume
13
article number
4616
publisher
Nature Publishing Group
external identifiers
  • scopus:85135552378
  • pmid:35941103
ISSN
2041-1723
DOI
10.1038/s41467-022-32097-3
language
English
LU publication?
yes
additional info
© 2022. The Author(s).
id
b15fcda9-7fd8-4a22-9d0d-99c50180e936
date added to LUP
2022-08-15 09:59:15
date last changed
2024-03-21 12:45:22
@article{b15fcda9-7fd8-4a22-9d0d-99c50180e936,
  abstract     = {{<p>As the scale of single-cell genomics experiments grows into the millions, the computational requirements to process this data are beyond the reach of many. Herein we present Scarf, a modularly designed Python package that seamlessly interoperates with other single-cell toolkits and allows for memory-efficient single-cell analysis of millions of cells on a laptop or low-cost devices like single-board computers. We demonstrate Scarf's memory and compute-time efficiency by applying it to the largest existing single-cell RNA-Seq and ATAC-Seq datasets. Scarf wraps memory-efficient implementations of a graph-based t-stochastic neighbour embedding and hierarchical clustering algorithm. Moreover, Scarf performs accurate reference-anchored mapping of datasets while maintaining memory efficiency. By implementing a subsampling algorithm, Scarf additionally has the capacity to generate representative sampling of cells from a given dataset wherein rare cell populations and lineage differentiation trajectories are conserved. Together, Scarf provides a framework wherein any researcher can perform advanced processing, subsampling, reanalysis, and integration of atlas-scale datasets on standard laptop computers. Scarf is available on Github: https://github.com/parashardhapola/scarf .</p>}},
  author       = {{Dhapola, Parashar and Rodhe, Johan and Olofzon, Rasmus and Bonald, Thomas and Erlandsson, Eva and Soneji, Shamit and Karlsson, Göran}},
  issn         = {{2041-1723}},
  keywords     = {{Algorithms; Cluster Analysis; Genomics; Single-Cell Analysis; Software; Whole Exome Sequencing}},
  language     = {{eng}},
  publisher    = {{Nature Publishing Group}},
  journal      = {{Nature Communications}},
  title        = {{Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data}},
  url          = {{http://dx.doi.org/10.1038/s41467-022-32097-3}},
  doi          = {{10.1038/s41467-022-32097-3}},
  volume       = {{13}},
  year         = {{2022}},
}