Lund University Publications

Acceptable accuracy for medical AI : a survey of physicians and the general population in Sweden

Arvidsson, Rasmus ; Widén, Jonathan ; Al-Naasan, Lina ; Gunnarsson, Ronny Kent K ; Nymberg, Peter ; Blease, Charlotte R ; Moberg, Anna ; Sundvall, Pär-Daniel ; Wikberg, Carl and Sundemo, David (2026) In BMJ Health & Care Informatics 33(1).
Abstract

OBJECTIVES: To identify the lowest sensitivity and specificity that physicians and the general population consider acceptable for medical artificial intelligence (AI), relative to current human performance.

METHODS: In a nationwide, cross-sectional survey in Sweden, 2025, random samples of 500 physicians and 500 adults from the general population were mailed a questionnaire presenting three vignettes (chest pain triage, sore throat triage, ECG myocardial infarction detection) with the corresponding human performance. Participants reported the maximum number of cases an AI should be allowed to miss or over-refer.

RESULTS: Response rates were 45% among physicians and 31% in the general population. Both groups demanded higher AI accuracy than the human benchmark for all cases. In the chest pain triage vignette, the nurse correctly referred 84 of 100 true emergencies; physicians required the AI to correctly refer 11 additional patients (95% sensitivity) and the general population demanded referral of 16 additional patients (100% sensitivity) (p<0.001 for both groups). Among 100 patients not requiring referral, the nurse would mistakenly refer 66. Both groups required the AI to reduce unnecessary referrals by 16 (50% specificity) (p<0.001). A similar pattern was observed in the other vignettes.

DISCUSSION: The accuracy thresholds required by the respondents exceed the performance of many existing systems, although emerging AI research shows promise in narrowing the gap.

CONCLUSION: Physicians and the general population require medical AI systems to outperform human clinicians. When implementing AI in healthcare settings, early engagement with both groups may be necessary to align expectations with real-world system performance.
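The chest pain triage figures in the RESULTS paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the study: the function and variable names are assumptions, and it uses only the counts stated in the abstract (100 true emergencies, 100 patients not requiring referral).

```python
def sensitivity(correctly_referred: int, total_emergencies: int) -> float:
    """Share of true emergencies that were referred."""
    return correctly_referred / total_emergencies


def specificity(unnecessarily_referred: int, total_non_emergencies: int) -> float:
    """Share of patients not requiring referral who were correctly not referred."""
    return (total_non_emergencies - unnecessarily_referred) / total_non_emergencies


# Counts from the chest pain triage vignette as reported in the abstract.
CASES_PER_GROUP = 100          # 100 true emergencies, 100 non-emergencies
NURSE_CORRECT_REFERRALS = 84   # emergencies the nurse correctly referred
NURSE_UNNECESSARY = 66         # non-emergencies the nurse mistakenly referred

# Human benchmark implied by those counts.
nurse_sensitivity = sensitivity(NURSE_CORRECT_REFERRALS, CASES_PER_GROUP)
nurse_specificity = specificity(NURSE_UNNECESSARY, CASES_PER_GROUP)

# Thresholds demanded of the AI: additional correct referrals, fewer unnecessary ones.
physician_sensitivity = sensitivity(NURSE_CORRECT_REFERRALS + 11, CASES_PER_GROUP)
public_sensitivity = sensitivity(NURSE_CORRECT_REFERRALS + 16, CASES_PER_GROUP)
required_specificity = specificity(NURSE_UNNECESSARY - 16, CASES_PER_GROUP)

print(f"Nurse: sensitivity {nurse_sensitivity:.0%}, specificity {nurse_specificity:.0%}")
print(f"AI sensitivity required by physicians: {physician_sensitivity:.0%}")
print(f"AI sensitivity required by the general population: {public_sensitivity:.0%}")
print(f"AI specificity required by both groups: {required_specificity:.0%}")
```

Read this way, the nurse benchmark corresponds to 84% sensitivity and 34% specificity, against required AI thresholds of 95% or 100% sensitivity and 50% specificity.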

author
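Arvidsson, Rasmus ; Widén, Jonathan ; Al-Naasan, Lina ; Gunnarsson, Ronny Kent K ; Nymberg, Peter ; Blease, Charlotte R ; Moberg, Anna ; Sundvall, Pär-Daniel ; Wikberg, Carl and Sundemo, David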
organization
publishing date
type
Contribution to journal
publication status
published
subject
keywords
Humans, Sweden, Cross-Sectional Studies, Artificial Intelligence/standards, Male, Adult, Female, Physicians/psychology, Middle Aged, Surveys and Questionnaires, Sensitivity and Specificity, Triage, Aged
in
BMJ Health & Care Informatics
volume
33
issue
1
article number
e101899
publisher
BMJ Publishing Group
external identifiers
  • pmid:41927104
  • scopus:105034953053
ISSN
2632-1009
DOI
10.1136/bmjhci-2025-101899
language
English
LU publication?
yes
additional info
© Author(s) (or their employer(s)) 2026. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ Group.
id
2b5420a0-5260-4353-abe1-451b2dd3ccc4
date added to LUP
2026-04-18 16:29:09
date last changed
2026-04-20 07:00:39
@article{2b5420a0-5260-4353-abe1-451b2dd3ccc4,
  abstract     = {{OBJECTIVES: To identify the lowest sensitivity and specificity that physicians and the general population consider acceptable for medical artificial intelligence (AI), relative to current human performance. METHODS: In a nationwide, cross-sectional survey in Sweden, 2025, random samples of 500 physicians and 500 adults from the general population were mailed a questionnaire presenting three vignettes (chest pain triage, sore throat triage, ECG myocardial infarction detection) with the corresponding human performance. Participants reported the maximum number of cases an AI should be allowed to miss or over-refer. RESULTS: Response rates were 45% among physicians and 31% in the general population. Both groups demanded higher AI accuracy than the human benchmark for all cases. In the chest pain triage vignette, the nurse correctly referred 84 of 100 true emergencies; physicians required the AI to correctly refer 11 additional patients (95% sensitivity) and the general population demanded referral of 16 additional patients (100% sensitivity) (p<0.001 for both groups). Among 100 patients not requiring referral, the nurse would mistakenly refer 66. Both groups required the AI to reduce unnecessary referrals by 16 (50% specificity) (p<0.001). A similar pattern was observed in the other vignettes. DISCUSSION: The accuracy thresholds required by the respondents exceed the performance of many existing systems, although emerging AI research shows promise in narrowing the gap. CONCLUSION: Physicians and the general population require medical AI systems to outperform human clinicians. When implementing AI in healthcare settings, early engagement with both groups may be necessary to align expectations with real-world system performance.}},
  author       = {{Arvidsson, Rasmus and Widén, Jonathan and Al-Naasan, Lina and Gunnarsson, Ronny Kent K and Nymberg, Peter and Blease, Charlotte R and Moberg, Anna and Sundvall, Pär-Daniel and Wikberg, Carl and Sundemo, David}},
  issn         = {{2632-1009}},
  keywords     = {{Humans; Sweden; Cross-Sectional Studies; Artificial Intelligence/standards; Male; Adult; Female; Physicians/psychology; Middle Aged; Surveys and Questionnaires; Sensitivity and Specificity; Triage; Aged}},
  language     = {{eng}},
  month        = {{04}},
  number       = {{1}},
  publisher    = {{BMJ Publishing Group}},
  journal      = {{BMJ Health \& Care Informatics}},
  title        = {{Acceptable accuracy for medical AI : a survey of physicians and the general population in Sweden}},
  url          = {{http://dx.doi.org/10.1136/bmjhci-2025-101899}},
  doi          = {{10.1136/bmjhci-2025-101899}},
  volume       = {{33}},
  year         = {{2026}},
}