Acceptable accuracy for medical AI : a survey of physicians and the general population in Sweden

Arvidsson, Rasmus; Widén, Jonathan; Al-Naasan, Lina; Gunnarsson, Ronny Kent K; Nymberg, Peter; Blease, Charlotte R; Moberg, Anna; Sundvall, Pär-Daniel; Wikberg, Carl; Sundemo, David

Arvidsson, Rasmus; Widén, Jonathan; Al-Naasan, Lina; Gunnarsson, Ronny Kent K; Nymberg, Peter (LU); Blease, Charlotte R; Moberg, Anna; Sundvall, Pär-Daniel; Wikberg, Carl and Sundemo, David (2026) In BMJ Health & Care Informatics 33(1).

Abstract

OBJECTIVES: To identify the lowest sensitivity and specificity that physicians and the general population consider acceptable for medical artificial intelligence (AI), relative to current human performance.

METHODS: In a nationwide, cross-sectional survey in Sweden, 2025, random samples of 500 physicians and 500 adults from the general population were mailed a questionnaire presenting three vignettes (chest pain triage, sore throat triage, ECG myocardial infarction detection) with the corresponding human performance. Participants reported the maximum number of cases an AI should be allowed to miss or over-refer.

RESULTS: Response rates were 45% among physicians and 31% in the general population. Both groups demanded higher AI accuracy than the human benchmark for all cases. In the chest pain triage vignette, the nurse correctly referred 84 of 100 true emergencies; physicians required the AI to correctly refer 11 additional patients (95% sensitivity) and the general population demanded referral of 16 additional patients (100% sensitivity) (p<0.001 for both groups). Among 100 patients not requiring referral, the nurse would mistakenly refer 66. Both groups required the AI to reduce unnecessary referrals by 16 (50% specificity) (p<0.001). A similar pattern was observed in the other vignettes.
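The chest pain figures above can be checked with a short sketch. The function and variable names are illustrative assumptions and do not come from the paper; the counts are those reported in the abstract. The nurse's 34% specificity is derived here from the 66 mistaken referrals and is not stated in the abstract.

```python
# Minimal sketch: recompute the chest pain triage thresholds from the
# counts given in the abstract. All names here are hypothetical.

def sensitivity(true_positives: int, total_positives: int) -> float:
    """Share of true emergencies that were correctly referred."""
    return true_positives / total_positives


def specificity(false_positives: int, total_negatives: int) -> float:
    """Share of non-emergencies that were correctly not referred."""
    return (total_negatives - false_positives) / total_negatives


CASES = 100                     # patients per group in the vignette
NURSE_CORRECT_REFERRALS = 84    # true emergencies the nurse referred
NURSE_MISTAKEN_REFERRALS = 66   # non-emergencies the nurse referred

# Human benchmark
nurse_sensitivity = sensitivity(NURSE_CORRECT_REFERRALS, CASES)
nurse_specificity = specificity(NURSE_MISTAKEN_REFERRALS, CASES)

# Required of the AI: additional correct referrals per group
physician_sensitivity = sensitivity(NURSE_CORRECT_REFERRALS + 11, CASES)
public_sensitivity = sensitivity(NURSE_CORRECT_REFERRALS + 16, CASES)

# Required of the AI: 16 fewer unnecessary referrals (both groups)
required_specificity = specificity(NURSE_MISTAKEN_REFERRALS - 16, CASES)

if __name__ == "__main__":
    print(f"nurse:      sens {nurse_sensitivity:.0%}, spec {nurse_specificity:.0%}")
    print(f"physicians: sens {physician_sensitivity:.0%}, spec {required_specificity:.0%}")
    print(f"public:     sens {public_sensitivity:.0%}, spec {required_specificity:.0%}")
```

Read this way, the respondents asked the AI to close most or all of the nurse's 16-point sensitivity gap and also to raise specificity by 16 points.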

DISCUSSION: The accuracy thresholds required by the respondents exceed the performance of many existing systems, although emerging AI research shows promise in narrowing the gap.

CONCLUSION: Physicians and the general population require medical AI systems to outperform human clinicians. When implementing AI in healthcare settings, early engagement with both groups may be necessary to align expectations with real-world system performance.

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/2b5420a0-5260-4353-abe1-451b2dd3ccc4

author

Arvidsson, Rasmus; Widén, Jonathan; Al-Naasan, Lina; Gunnarsson, Ronny Kent K; Nymberg, Peter (LU); Blease, Charlotte R; Moberg, Anna; Sundvall, Pär-Daniel; Wikberg, Carl and Sundemo, David

organization

Family medicine, cardiovascular medicine and genetics (research group)

publishing date

2026-04-02

type

Contribution to journal

publication status

published

subject

Health Care Service and Management, Health Policy and Services and Health Economy

keywords

Humans, Sweden, Cross-Sectional Studies, Artificial Intelligence/standards, Male, Adult, Female, Physicians/psychology, Middle Aged, Surveys and Questionnaires, Sensitivity and Specificity, Triage, Aged

in

BMJ Health & Care Informatics

volume

33

issue

1

article number

e101899

publisher

BMJ Publishing Group

external identifiers

pmid:41927104
scopus:105034953053

ISSN

2632-1009

DOI

10.1136/bmjhci-2025-101899

language

English

LU publication?

yes

additional info

id

2b5420a0-5260-4353-abe1-451b2dd3ccc4

date added to LUP

2026-04-18 16:29:09

date last changed

2026-04-20 07:00:39

@article{2b5420a0-5260-4353-abe1-451b2dd3ccc4,
  abstract     = {{OBJECTIVES: To identify the lowest sensitivity and specificity that physicians and the general population consider acceptable for medical artificial intelligence (AI), relative to current human performance. METHODS: In a nationwide, cross-sectional survey in Sweden, 2025, random samples of 500 physicians and 500 adults from the general population were mailed a questionnaire presenting three vignettes (chest pain triage, sore throat triage, ECG myocardial infarction detection) with the corresponding human performance. Participants reported the maximum number of cases an AI should be allowed to miss or over-refer. RESULTS: Response rates were 45% among physicians and 31% in the general population. Both groups demanded higher AI accuracy than the human benchmark for all cases. In the chest pain triage vignette, the nurse correctly referred 84 of 100 true emergencies; physicians required the AI to correctly refer 11 additional patients (95% sensitivity) and the general population demanded referral of 16 additional patients (100% sensitivity) (p<0.001 for both groups). Among 100 patients not requiring referral, the nurse would mistakenly refer 66. Both groups required the AI to reduce unnecessary referrals by 16 (50% specificity) (p<0.001). A similar pattern was observed in the other vignettes. DISCUSSION: The accuracy thresholds required by the respondents exceed the performance of many existing systems, although emerging AI research shows promise in narrowing the gap. CONCLUSION: Physicians and the general population require medical AI systems to outperform human clinicians. When implementing AI in healthcare settings, early engagement with both groups may be necessary to align expectations with real-world system performance.}},
  author       = {{Arvidsson, Rasmus and Widén, Jonathan and Al-Naasan, Lina and Gunnarsson, Ronny Kent K and Nymberg, Peter and Blease, Charlotte R and Moberg, Anna and Sundvall, Pär-Daniel and Wikberg, Carl and Sundemo, David}},
  issn         = {{2632-1009}},
  keywords     = {{Humans; Sweden; Cross-Sectional Studies; Artificial Intelligence/standards; Male; Adult; Female; Physicians/psychology; Middle Aged; Surveys and Questionnaires; Sensitivity and Specificity; Triage; Aged}},
  language     = {{eng}},
  month        = {{04}},
  number       = {{1}},
  publisher    = {{BMJ Publishing Group}},
  journal      = {{BMJ Health \& Care Informatics}},
  title        = {{Acceptable accuracy for medical AI : a survey of physicians and the general population in Sweden}},
  url          = {{http://dx.doi.org/10.1136/bmjhci-2025-101899}},
  doi          = {{10.1136/bmjhci-2025-101899}},
  volume       = {{33}},
  year         = {{2026}},
}

Lund University Publications
