Synthetic Data : From Data Scarcity to Data Pollution
(2024) In Surveillance & Society 22(4). p.472-476
- Abstract
- The increasing development and adaptation of synthetic data raises critical concerns about the perpetuation of datafication logics. In examining some of synthetic data’s core promises, this dialogue paper aims to uncover the potential harm of further de-politicizing synthetic data. With synthetic data, technological opportunities are introduced that promise to resolve a growing demand for data needed to train AI models. Furthermore, models trained on synthetic data are praised as more precise and effective while being cheaper than collected data (Zewe 2022). With this dialogue paper, I aim to nuance the ways in which synthetic data complicate a critique directed at AI-driven technologies. I build my argument on two elements fundamental to the debate on the promises and perils of synthetic data. The first is the notion of data scarcity, often leveraged to argue for the implementation and further development of synthetic data to train bespoke models. Second, I discuss the concerns of data pollution and contamination with synthetic data. Through these entry points, I argue that synthetic data re-ignites issues previously raised by scholars in the field of critical data and surveillance studies. Therefore, the aim of this dialogue paper is to call for a critical understanding of synthetic data as living information, much like collected data, and to account for synthetic data and the conditions of its generation in the context of simulated environments.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/76b07ca3-fcde-4171-a86d-bb5c2c0ed3a6
- author
- Wiehn, Tanja
LU
- publishing date
- 2024-12
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- synthetic data, data scarcity, data pollution, Artificial Intelligence, datafication, data contamination
- in
- Surveillance & Society
- volume
- 22
- issue
- 4
- pages
- 5 pages
- publisher
- Surveillance Studies Network
- external identifiers
- scopus:85216232275
- ISSN
- 1477-7487
- DOI
- 10.24908/ss.v22i4.18327
- language
- English
- LU publication?
- no
- id
- 76b07ca3-fcde-4171-a86d-bb5c2c0ed3a6
- date added to LUP
- 2025-03-19 13:04:28
- date last changed
- 2025-04-04 13:59:54
@article{76b07ca3-fcde-4171-a86d-bb5c2c0ed3a6,
  abstract     = {{The increasing development and adaptation of synthetic data raises critical concerns about the perpetuation of datafication logics. In examining some of synthetic data’s core promises, this dialogue paper aims to uncover the potential harm of further de-politicizing synthetic data. With synthetic data, technological opportunities are introduced that promise to resolve a growing demand for data needed to train AI models. Furthermore, models trained on synthetic data are praised as more precise and effective while being cheaper than collected data (Zewe 2022). With this dialogue paper, I aim to nuance the ways in which synthetic data complicate a critique directed at AI-driven technologies. I build my argument on two elements fundamental to the debate on the promises and perils of synthetic data. The first is the notion of data scarcity, often leveraged to argue for the implementation and further development of synthetic data to train bespoke models. Second, I discuss the concerns of data pollution and contamination with synthetic data. Through these entry points, I argue that synthetic data re-ignites issues previously raised by scholars in the field of critical data and surveillance studies. Therefore, the aim of this dialogue paper is to call for a critical understanding of synthetic data as living information, much like collected data, and to account for synthetic data and the conditions of its generation in the context of simulated environments.}},
  author       = {{Wiehn, Tanja}},
  issn         = {{1477-7487}},
  keywords     = {{synthetic data; data scarcity; data pollution; Artificial Intelligence; datafication; data contamination}},
  language     = {{eng}},
  number       = {{4}},
  pages        = {{472--476}},
  publisher    = {{Surveillance Studies Network}},
  series       = {{Surveillance \& Society}},
  title        = {{Synthetic Data : From Data Scarcity to Data Pollution}},
  url          = {{http://dx.doi.org/10.24908/ss.v22i4.18327}},
  doi          = {{10.24908/ss.v22i4.18327}},
  volume       = {{22}},
  year         = {{2024}},
}