Lund University Publications

Synthetic Data : From Data Scarcity to Data Pollution

Wiehn, Tanja (2024). In Surveillance & Society 22(4), pp. 472-476.
Abstract
The increasing development and adaptation of synthetic data raises critical concerns about the perpetuation of datafication logics. In examining some of synthetic data’s core promises, this dialogue paper aims to uncover the potential harm of further de-politicizing synthetic data. With synthetic data, technological opportunities are introduced that promise to resolve a growing demand for data needed to train AI models. Furthermore, models trained on synthetic data are praised as more precise and effective while being cheaper than collected data (Zewe 2022). With this dialogue paper, I aim to nuance the ways in which synthetic data complicate a critique directed at AI-driven technologies. I build my argument on two elements fundamental to the debate on the promises and perils of synthetic data. The first is the notion of data scarcity, often leveraged to argue for the implementation and further development of synthetic data to train bespoke models. Second, I discuss the concerns of data pollution and contamination with synthetic data. Through these entry points, I argue that synthetic data re-ignites issues previously raised by scholars in the field of critical data and surveillance studies. Therefore, the aim of this dialogue paper is to call for a critical understanding of synthetic data as living information, much like collected data, and to account for synthetic data and the conditions of its generation in the context of simulated environments.
author
Wiehn, Tanja
publishing date
2024
type
Contribution to journal
publication status
published
keywords
synthetic data, data scarcity, data pollution, Artificial Intelligence, datafication, data contamination
in
Surveillance & Society
volume
22
issue
4
pages
5 pages
publisher
Surveillance Studies Network
external identifiers
  • scopus:85216232275
ISSN
1477-7487
DOI
10.24908/ss.v22i4.18327
language
English
LU publication?
no
id
76b07ca3-fcde-4171-a86d-bb5c2c0ed3a6
date added to LUP
2025-03-19 13:04:28
date last changed
2025-04-04 13:59:54
@article{76b07ca3-fcde-4171-a86d-bb5c2c0ed3a6,
  abstract     = {{The increasing development and adaptation of synthetic data raises critical concerns about the perpetuation of datafication logics. In examining some of synthetic data’s core promises, this dialogue paper aims to uncover the potential harm of further de-politicizing synthetic data. With synthetic data, technological opportunities are introduced that promise to resolve a growing demand for data needed to train AI models. Furthermore, models trained on synthetic data are praised as more precise and effective while being cheaper than collected data (Zewe 2022). With this dialogue paper, I aim to nuance the ways in which synthetic data complicate a critique directed at AI-driven technologies. I build my argument on two elements fundamental to the debate on the promises and perils of synthetic data. The first is the notion of data scarcity, often leveraged to argue for the implementation and further development of synthetic data to train bespoke models. Second, I discuss the concerns of data pollution and contamination with synthetic data. Through these entry points, I argue that synthetic data re-ignites issues previously raised by scholars in the field of critical data and surveillance studies. Therefore, the aim of this dialogue paper is to call for a critical understanding of synthetic data as living information, much like collected data, and to account for synthetic data and the conditions of its generation in the context of simulated environments.}},
  author       = {{Wiehn, Tanja}},
  issn         = {{1477-7487}},
  keywords     = {{synthetic data; data scarcity; data pollution; Artificial Intelligence; datafication; data contamination}},
  language     = {{eng}},
  number       = {{4}},
  pages        = {{472--476}},
  publisher    = {{Surveillance Studies Network}},
  series       = {{Surveillance \& Society}},
  title        = {{Synthetic Data : From Data Scarcity to Data Pollution}},
  url          = {{http://dx.doi.org/10.24908/ss.v22i4.18327}},
  doi          = {{10.24908/ss.v22i4.18327}},
  volume       = {{22}},
  year         = {{2024}},
}