Agreements ‘in the wild’ : Standards and alignment in machine learning benchmark dataset construction

Engdahl, Isak


(2024) In Big Data and Society 11(2).

Abstract: This article presents an ethnographic case study of a corporate-academic group constructing a benchmark dataset of daily activities for a variety of machine learning and computer vision tasks. Using a socio-technical perspective, the article conceptualizes the dataset as a knowledge object that is stabilized by both practical standards (for daily activities, datafication, annotation and benchmarks) and alignment work – that is, efforts including forging agreements to make these standards effective in practice. By attending to alignment work, the article highlights the informal, communicative and supportive efforts that underlie the success of standards and the smoothing of tensions between actors and factors. Emphasizing these efforts constitutes a contribution in several ways. This article's ethnographic mode of analysis challenges and supplements quantitative metrics on datasets. It advances the field of dataset analysis by offering a detailed empirical examination of the development of a new benchmark dataset as a collective accomplishment. By showing the importance of alignment efforts and their close ties to standards and their limitations, it adds to our understanding of how machine learning datasets are built. And, most importantly, it calls into question a key characterization of the dataset: that it captures unscripted activities occurring naturally ‘in the wild’, as alignment work bleeds into moments of data capture.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/e2f0cde8-c19d-4dad-a1cc-d6d0d80b4094

author

Engdahl, Isak (LU)

organization

Sociology

publishing date

2024-04-01

type

Contribution to journal

publication status

published

subject

Information Systems, Social aspects (including Human Aspects of ICT)

keywords

alignment work, benchmark, dataset analysis, Ethnography of machine learning, in-the-wild, standards

in

Big Data and Society

volume

11

issue

2

publisher

SAGE Publications

external identifiers

scopus:85189638657

ISSN

2053-9517

DOI

10.1177/20539517241242457

language

English

LU publication?

yes

id

e2f0cde8-c19d-4dad-a1cc-d6d0d80b4094

date added to LUP

2024-04-25 15:52:44

date last changed

2025-05-10 03:48:52

@article{e2f0cde8-c19d-4dad-a1cc-d6d0d80b4094,
  abstract     = {{This article presents an ethnographic case study of a corporate-academic group constructing a benchmark dataset of daily activities for a variety of machine learning and computer vision tasks. Using a socio-technical perspective, the article conceptualizes the dataset as a knowledge object that is stabilized by both practical standards (for daily activities, datafication, annotation and benchmarks) and alignment work – that is, efforts including forging agreements to make these standards effective in practice. By attending to alignment work, the article highlights the informal, communicative and supportive efforts that underlie the success of standards and the smoothing of tensions between actors and factors. Emphasizing these efforts constitutes a contribution in several ways. This article's ethnographic mode of analysis challenges and supplements quantitative metrics on datasets. It advances the field of dataset analysis by offering a detailed empirical examination of the development of a new benchmark dataset as a collective accomplishment. By showing the importance of alignment efforts and their close ties to standards and their limitations, it adds to our understanding of how machine learning datasets are built. And, most importantly, it calls into question a key characterization of the dataset: that it captures unscripted activities occurring naturally ‘in the wild’, as alignment work bleeds into moments of data capture.}},
  author       = {{Engdahl, Isak}},
  issn         = {{2053-9517}},
  keywords     = {{alignment work; benchmark; dataset analysis; Ethnography of machine learning; in-the-wild; standards}},
  language     = {{eng}},
  month        = {{04}},
  number       = {{2}},
  publisher    = {{SAGE Publications}},
  journal      = {{Big Data and Society}},
  title        = {{Agreements ‘in the wild’ : Standards and alignment in machine learning benchmark dataset construction}},
  url          = {{http://dx.doi.org/10.1177/20539517241242457}},
  doi          = {{10.1177/20539517241242457}},
  volume       = {{11}},
  year         = {{2024}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES
