Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data
(2022) In Nature Communications 13.
- Abstract
As the scale of single-cell genomics experiments grows into the millions, the computational requirements to process this data are beyond the reach of many. Herein we present Scarf, a modularly designed Python package that seamlessly interoperates with other single-cell toolkits and allows for memory-efficient single-cell analysis of millions of cells on a laptop or low-cost devices like single-board computers. We demonstrate Scarf's memory and compute-time efficiency by applying it to the largest existing single-cell RNA-Seq and ATAC-Seq datasets. Scarf wraps memory-efficient implementations of a graph-based t-stochastic neighbour embedding and hierarchical clustering algorithm. Moreover, Scarf performs accurate reference-anchored mapping of datasets while maintaining memory efficiency. By implementing a subsampling algorithm, Scarf additionally has the capacity to generate representative sampling of cells from a given dataset wherein rare cell populations and lineage differentiation trajectories are conserved. Together, Scarf provides a framework wherein any researcher can perform advanced processing, subsampling, reanalysis, and integration of atlas-scale datasets on standard laptop computers. Scarf is available on Github: https://github.com/parashardhapola/scarf .
- author
- Dhapola, Parashar (LU); Rodhe, Johan (LU); Olofzon, Rasmus (LU); Bonald, Thomas; Erlandsson, Eva (LU); Soneji, Shamit (LU) and Karlsson, Göran (LU)
- organization
- Stem Cells and Leukemia (research group)
- StemTherapy: National Initiative on Stem Cells for Regenerative Therapy
- eSSENCE: The e-Science Collaboration
- Stem Cell Center
- Developmental lymphopoiesis and leukemia (research group)
- Developmental Hematopoiesis (research group)
- Division of Molecular Hematology (DMH)
- publishing date
- 2022
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- Algorithms, Cluster Analysis, Genomics, Single-Cell Analysis, Software, Whole Exome Sequencing
- in
- Nature Communications
- volume
- 13
- article number
- 4616
- publisher
- Nature Publishing Group
- external identifiers
- scopus:85135552378
- pmid:35941103
- ISSN
- 2041-1723
- DOI
- 10.1038/s41467-022-32097-3
- language
- English
- LU publication?
- yes
- additional info
- © 2022. The Author(s).
- id
- b15fcda9-7fd8-4a22-9d0d-99c50180e936
- date added to LUP
- 2022-08-15 09:59:15
- date last changed
- 2024-03-21 12:45:22
@article{b15fcda9-7fd8-4a22-9d0d-99c50180e936,
  abstract     = {{As the scale of single-cell genomics experiments grows into the millions, the computational requirements to process this data are beyond the reach of many. Herein we present Scarf, a modularly designed Python package that seamlessly interoperates with other single-cell toolkits and allows for memory-efficient single-cell analysis of millions of cells on a laptop or low-cost devices like single-board computers. We demonstrate Scarf's memory and compute-time efficiency by applying it to the largest existing single-cell RNA-Seq and ATAC-Seq datasets. Scarf wraps memory-efficient implementations of a graph-based t-stochastic neighbour embedding and hierarchical clustering algorithm. Moreover, Scarf performs accurate reference-anchored mapping of datasets while maintaining memory efficiency. By implementing a subsampling algorithm, Scarf additionally has the capacity to generate representative sampling of cells from a given dataset wherein rare cell populations and lineage differentiation trajectories are conserved. Together, Scarf provides a framework wherein any researcher can perform advanced processing, subsampling, reanalysis, and integration of atlas-scale datasets on standard laptop computers. Scarf is available on Github: https://github.com/parashardhapola/scarf .}},
  author       = {{Dhapola, Parashar and Rodhe, Johan and Olofzon, Rasmus and Bonald, Thomas and Erlandsson, Eva and Soneji, Shamit and Karlsson, Göran}},
  issn         = {{2041-1723}},
  keywords     = {{Algorithms; Cluster Analysis; Genomics; Single-Cell Analysis; Software; Whole Exome Sequencing}},
  language     = {{eng}},
  publisher    = {{Nature Publishing Group}},
  series       = {{Nature Communications}},
  title        = {{Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data}},
  url          = {{http://dx.doi.org/10.1038/s41467-022-32097-3}},
  doi          = {{10.1038/s41467-022-32097-3}},
  volume       = {{13}},
  year         = {{2022}},
}