Time-series anonymization of tabular health data using generative adversarial network
(2023) p.1-8
- Abstract
- Data anonymization has been used as a fundamental tool in various domains, e.g. healthcare, to alter personal data such that individuals can no longer be identified directly or indirectly in a way to enable broader sharing of data. For example, data perturbation techniques add noise to original data allowing individual record confidentiality while maintaining high-quality data for analytical purposes. In this paper, we propose a perturbation technique for anonymizing longitudinal tabular data such as electronic health records (EHRs). Our model starts by learning a latent space of original data to better capture temporal trends, then employs a generative adversarial network together to train a perturbation generator. During model training, a time-supervised loss function for handling sequence-dependent noise, together with the adversarial unsupervised, anonymization, and reconstruction loss functions are utilized. To evaluate our model quantitatively, we use multiple evaluation metrics for the fidelity, utility, and identifiability of generated data, in addition, the model is evaluated qualitatively by visualizing generated and original data. The results confirm that our model preserves the privacy of the original data and generates a perturbed version with high fidelity and utility compared to some state-of-the-art techniques.
Please use this url to cite or link to this publication:
https://lup.lub.lu.se/record/c997627a-6084-4498-ab97-5e1a97b5b023
- author
- Hashemi, Atiye Sadat LU ; Etminani, Kobra ; Soliman, Amira ; Hamed, Omar and Lundström, Jens
- publishing date
- 2023
- type
- Chapter in Book/Report/Conference proceeding
- publication status
- published
- host publication
- 2023 International Joint Conference on Neural Networks (IJCNN)
- pages
- 1 - 8
- publisher
- IEEE Press
- external identifiers
- scopus:85169602590
- DOI
- 10.1109/IJCNN54540.2023.10191367
- language
- English
- LU publication?
- no
- id
- c997627a-6084-4498-ab97-5e1a97b5b023
- date added to LUP
- 2025-01-31 14:09:54
- date last changed
- 2025-02-04 04:01:21
@inproceedings{c997627a-6084-4498-ab97-5e1a97b5b023,
  abstract     = {{Data anonymization has been used as a fundamental tool in various domains, e.g. healthcare, to alter personal data such that individuals can no longer be identified directly or indirectly in a way to enable broader sharing of data. For example, data perturbation techniques add noise to original data allowing individual record confidentiality while maintaining high-quality data for analytical purposes. In this paper, we propose a perturbation technique for anonymizing longitudinal tabular data such as electronic health records (EHRs). Our model starts by learning a latent space of original data to better capture temporal trends, then employs a generative adversarial network together to train a perturbation generator. During model training, a time-supervised loss function for handling sequence-dependent noise, together with the adversarial unsupervised, anonymization, and reconstruction loss functions are utilized. To evaluate our model quantitatively, we use multiple evaluation metrics for the fidelity, utility, and identifiability of generated data, in addition, the model is evaluated qualitatively by visualizing generated and original data. The results confirm that our model preserves the privacy of the original data and generates a perturbed version with high fidelity and utility compared to some state-of-the-art techniques.}},
  author       = {{Hashemi, Atiye Sadat and Etminani, Kobra and Soliman, Amira and Hamed, Omar and Lundström, Jens}},
  booktitle    = {{2023 International Joint Conference on Neural Networks (IJCNN)}},
  language     = {{eng}},
  pages        = {{1--8}},
  publisher    = {{IEEE Press}},
  title        = {{Time-series anonymization of tabular health data using generative adversarial network}},
  url          = {{http://dx.doi.org/10.1109/IJCNN54540.2023.10191367}},
  doi          = {{10.1109/IJCNN54540.2023.10191367}},
  year         = {{2023}},
}