Lund University Publications

Time-series anonymization of tabular health data using generative adversarial network

Hashemi, Atiye Sadat LU ; Etminani, Kobra ; Soliman, Amira ; Hamed, Omar and Lundström, Jens (2023) p.1-8
Abstract
Data anonymization has been used as a fundamental tool in various domains, e.g. healthcare, to alter personal data such that individuals can no longer be identified directly or indirectly, thereby enabling broader sharing of data. For example, data perturbation techniques add noise to original data, preserving the confidentiality of individual records while maintaining high-quality data for analytical purposes. In this paper, we propose a perturbation technique for anonymizing longitudinal tabular data such as electronic health records (EHRs). Our model starts by learning a latent space of the original data to better capture temporal trends, then employs a generative adversarial network to train a perturbation generator. During model training, a time-supervised loss function for handling sequence-dependent noise is used together with the adversarial unsupervised, anonymization, and reconstruction loss functions. To evaluate our model quantitatively, we use multiple evaluation metrics for the fidelity, utility, and identifiability of generated data; in addition, the model is evaluated qualitatively by visualizing generated and original data. The results confirm that our model preserves the privacy of the original data and generates a perturbed version with high fidelity and utility compared to some state-of-the-art techniques.
author
Hashemi, Atiye Sadat; Etminani, Kobra; Soliman, Amira; Hamed, Omar and Lundström, Jens
publishing date
2023
type
Chapter in Book/Report/Conference proceeding
publication status
published
host publication
2023 International Joint Conference on Neural Networks (IJCNN)
pages
1 - 8
publisher
IEEE Press
external identifiers
  • scopus:85169602590
DOI
10.1109/IJCNN54540.2023.10191367
language
English
LU publication?
no
id
c997627a-6084-4498-ab97-5e1a97b5b023
date added to LUP
2025-01-31 14:09:54
date last changed
2025-02-04 04:01:21
@inproceedings{c997627a-6084-4498-ab97-5e1a97b5b023,
  abstract     = {{Data anonymization has been used as a fundamental tool in various domains, e.g. healthcare, to alter personal data such that individuals can no longer be identified directly or indirectly, thereby enabling broader sharing of data. For example, data perturbation techniques add noise to original data, preserving the confidentiality of individual records while maintaining high-quality data for analytical purposes. In this paper, we propose a perturbation technique for anonymizing longitudinal tabular data such as electronic health records (EHRs). Our model starts by learning a latent space of the original data to better capture temporal trends, then employs a generative adversarial network to train a perturbation generator. During model training, a time-supervised loss function for handling sequence-dependent noise is used together with the adversarial unsupervised, anonymization, and reconstruction loss functions. To evaluate our model quantitatively, we use multiple evaluation metrics for the fidelity, utility, and identifiability of generated data; in addition, the model is evaluated qualitatively by visualizing generated and original data. The results confirm that our model preserves the privacy of the original data and generates a perturbed version with high fidelity and utility compared to some state-of-the-art techniques.}},
  author       = {{Hashemi, Atiye Sadat and Etminani, Kobra and Soliman, Amira and Hamed, Omar and Lundström, Jens}},
  booktitle    = {{2023 International Joint Conference on Neural Networks (IJCNN)}},
  language     = {{eng}},
  pages        = {{1--8}},
  publisher    = {{IEEE Press}},
  title        = {{Time-series anonymization of tabular health data using generative adversarial network}},
  url          = {{http://dx.doi.org/10.1109/IJCNN54540.2023.10191367}},
  doi          = {{10.1109/IJCNN54540.2023.10191367}},
  year         = {{2023}},
}