XeroGraph: enhancing data integrity in the presence of missing values with statistical and predictive analysis

Mousafi Alasal, Laila; Hammarlund, Emma U; Pienta, Kenneth J; Rönnstrand, Lars; Kazi, Julhash U

(2025) In Bioinformatics Advances 5(1). p.1-6

Abstract

MOTIVATION: Missing data present a pervasive challenge in data analysis, potentially biasing outcomes and undermining conclusions if not addressed properly. Missing data are commonly classified into Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). While MCAR poses a minimal risk of data distortion, both MAR and MNAR can seriously affect the results of subsequent analyses. Therefore, it is important to know the type of missing data and appropriately handle them.

RESULTS: To facilitate efficient handling of missing data, we introduce a Python package named XeroGraph that is designed to evaluate data quality, categorize the nature of missingness, and guide imputation decisions. By comparing how various imputation methods influence underlying distributions, XeroGraph provides a systematic framework that supports more accurate and transparent analyses. Through its comprehensive preliminary assessments and user-friendly interface, this package facilitates the selection of optimal strategies tailored to the specific missing data mechanisms present in a dataset. In doing so, XeroGraph may significantly improve the validity and reproducibility of research findings, making it a valuable tool for professionals in data-intensive fields.

AVAILABILITY AND IMPLEMENTATION: XeroGraph is compatible with all operating systems and requires Python version 3.9 or higher. It can be freely downloaded from PyPI (https://pypi.org/project/XeroGraph). The source code is accessible on GitHub (https://github.com/kazilab/XeroGraph), and comprehensive documentation is available at Read the Docs (https://xerograph.readthedocs.io). This software is distributed under the Apache License 2.0.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/ab0a445f-1f08-4379-9d41-115baa2ef93a

author

Mousafi Alasal, Laila; Hammarlund, Emma U; Pienta, Kenneth J; Rönnstrand, Lars; Kazi, Julhash U

publishing date

2025-02-21

type

Contribution to journal

publication status

published

subject

Bioinformatics and Computational Biology

in

Bioinformatics Advances

volume

5

issue

1

article number

vbaf035

pages

1 - 6

publisher

Oxford University Press

external identifiers

pmid:40061871
scopus:86000484031

ISSN

2635-0041

DOI

10.1093/bioadv/vbaf035

language

English

LU publication?

yes

additional info

id

ab0a445f-1f08-4379-9d41-115baa2ef93a

date added to LUP

2025-05-04 13:14:07

date last changed

2026-02-24 06:13:01

@article{ab0a445f-1f08-4379-9d41-115baa2ef93a,
  abstract     = {{MOTIVATION: Missing data present a pervasive challenge in data analysis, potentially biasing outcomes and undermining conclusions if not addressed properly. Missing data are commonly classified into Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). While MCAR poses a minimal risk of data distortion, both MAR and MNAR can seriously affect the results of subsequent analyses. Therefore, it is important to know the type of missing data and appropriately handle them. RESULTS: To facilitate efficient handling of missing data, we introduce a Python package named XeroGraph that is designed to evaluate data quality, categorize the nature of missingness, and guide imputation decisions. By comparing how various imputation methods influence underlying distributions, XeroGraph provides a systematic framework that supports more accurate and transparent analyses. Through its comprehensive preliminary assessments and user-friendly interface, this package facilitates the selection of optimal strategies tailored to the specific missing data mechanisms present in a dataset. In doing so, XeroGraph may significantly improve the validity and reproducibility of research findings, making it a valuable tool for professionals in data-intensive fields. AVAILABILITY AND IMPLEMENTATION: XeroGraph is compatible with all operating systems and requires Python version 3.9 or higher. It can be freely downloaded from PyPI (https://pypi.org/project/XeroGraph). The source code is accessible on GitHub (https://github.com/kazilab/XeroGraph), and comprehensive documentation is available at Read the Docs (https://xerograph.readthedocs.io). This software is distributed under the Apache License 2.0.}},
  author       = {{Mousafi Alasal, Laila and Hammarlund, Emma U and Pienta, Kenneth J and Rönnstrand, Lars and Kazi, Julhash U}},
  issn         = {{2635-0041}},
  language     = {{eng}},
  month        = {{02}},
  number       = {{1}},
  pages        = {{1--6}},
  publisher    = {{Oxford University Press}},
  journal      = {{Bioinformatics Advances}},
  title        = {{XeroGraph: enhancing data integrity in the presence of missing values with statistical and predictive analysis}},
  url          = {{http://dx.doi.org/10.1093/bioadv/vbaf035}},
  doi          = {{10.1093/bioadv/vbaf035}},
  volume       = {{5}},
  year         = {{2025}},
}
