Time-series anonymization of tabular health data using generative adversarial network

Hashemi, Atiye Sadat; Etminani, Kobra; Soliman, Amira; Hamed, Omar; Lundström, Jens

Hashemi, Atiye Sadat; Etminani, Kobra; Soliman, Amira; Hamed, Omar and Lundström, Jens (2023), p. 1-8

Abstract: Data anonymization is a fundamental tool in domains such as healthcare: it alters personal data so that individuals can no longer be identified, directly or indirectly, which enables broader sharing of data. For example, data perturbation techniques add noise to the original data, protecting the confidentiality of individual records while maintaining high-quality data for analytical purposes. In this paper, we propose a perturbation technique for anonymizing longitudinal tabular data such as electronic health records (EHRs). Our model first learns a latent space of the original data to better capture temporal trends, then employs a generative adversarial network to train a perturbation generator. During model training, a time-supervised loss function for handling sequence-dependent noise is used together with adversarial unsupervised, anonymization, and reconstruction loss functions. To evaluate our model quantitatively, we use multiple metrics for the fidelity, utility, and identifiability of the generated data; in addition, the model is evaluated qualitatively by visualizing generated and original data. The results confirm that our model preserves the privacy of the original data and generates a perturbed version with high fidelity and utility compared to some state-of-the-art techniques.
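As a rough illustration of the general idea of data perturbation mentioned in the abstract (adding noise to records while keeping them analytically useful), the following minimal sketch adds zero-mean Gaussian noise to toy time series. It is not the paper's model: the learned latent space, the generative adversarial network, and the loss functions are not reproduced here. The function names `perturb_series` and `mean_abs_change`, the `noise_scale` parameter, and the toy values are all assumptions made for illustration.

```python
import random
import statistics


def perturb_series(records, noise_scale=0.1, seed=0):
    """Add zero-mean Gaussian noise to every value of every record.

    records: list of lists of floats (one list per individual,
    one value per time step). Hypothetical helper, not from the paper.
    """
    rng = random.Random(seed)  # seeded so the perturbation is repeatable
    return [[v + rng.gauss(0.0, noise_scale) for v in series]
            for series in records]


def mean_abs_change(original, perturbed):
    """Average absolute difference per value: a crude stand-in for a
    fidelity measure (an assumption, not a metric defined in the paper)."""
    diffs = [abs(a - b)
             for o, p in zip(original, perturbed)
             for a, b in zip(o, p)]
    return statistics.fmean(diffs)


# Toy longitudinal measurements for three individuals (invented values).
original = [
    [120.0, 122.0, 125.0, 123.0],
    [135.0, 133.0, 130.0, 128.0],
    [110.0, 111.0, 115.0, 118.0],
]

perturbed = perturb_series(original, noise_scale=2.0, seed=42)
print(round(mean_abs_change(original, perturbed), 3))
```

A larger `noise_scale` moves the perturbed records further from the originals, which in this naive setting trades fidelity for confidentiality; independent noise per time step also ignores the temporal dependence that the abstract's time-supervised loss is said to address.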

Please use this URL to cite or link to this publication: https://lup.lub.lu.se/record/c997627a-6084-4498-ab97-5e1a97b5b023

author

Hashemi, Atiye Sadat; Etminani, Kobra; Soliman, Amira; Hamed, Omar and Lundström, Jens

publishing date

2023

type

Chapter in Book/Report/Conference proceeding

publication status

published

host publication

2023 International Joint Conference on Neural Networks (IJCNN)

pages

1 - 8

publisher

IEEE Press

external identifiers

scopus:85169602590

DOI

10.1109/IJCNN54540.2023.10191367

language

English

LU publication?

no

id

c997627a-6084-4498-ab97-5e1a97b5b023

date added to LUP

2025-01-31 14:09:54

date last changed

2025-10-14 11:25:19

@inproceedings{c997627a-6084-4498-ab97-5e1a97b5b023,
  abstract     = {{Data anonymization is a fundamental tool in domains such as healthcare: it alters personal data so that individuals can no longer be identified, directly or indirectly, which enables broader sharing of data. For example, data perturbation techniques add noise to the original data, protecting the confidentiality of individual records while maintaining high-quality data for analytical purposes. In this paper, we propose a perturbation technique for anonymizing longitudinal tabular data such as electronic health records (EHRs). Our model first learns a latent space of the original data to better capture temporal trends, then employs a generative adversarial network to train a perturbation generator. During model training, a time-supervised loss function for handling sequence-dependent noise is used together with adversarial unsupervised, anonymization, and reconstruction loss functions. To evaluate our model quantitatively, we use multiple metrics for the fidelity, utility, and identifiability of the generated data; in addition, the model is evaluated qualitatively by visualizing generated and original data. The results confirm that our model preserves the privacy of the original data and generates a perturbed version with high fidelity and utility compared to some state-of-the-art techniques.}},
  author       = {{Hashemi, Atiye Sadat and Etminani, Kobra and Soliman, Amira and Hamed, Omar and Lundström, Jens}},
  booktitle    = {{2023 International Joint Conference on Neural Networks (IJCNN)}},
  language     = {{eng}},
  pages        = {{1--8}},
  publisher    = {{IEEE Press}},
  title        = {{Time-series anonymization of tabular health data using generative adversarial network}},
  url          = {{http://dx.doi.org/10.1109/IJCNN54540.2023.10191367}},
  doi          = {{10.1109/IJCNN54540.2023.10191367}},
  year         = {{2023}},
}

Lund University Publications
