Acceptable accuracy for medical AI : a survey of physicians and the general population in Sweden
(2026) In BMJ Health & Care Informatics 33(1).
- Abstract
OBJECTIVES: To identify the lowest sensitivity and specificity that physicians and the general population consider acceptable for medical artificial intelligence (AI), relative to current human performance.
METHODS: In a nationwide, cross-sectional survey in Sweden, 2025, random samples of 500 physicians and 500 adults from the general population were mailed a questionnaire presenting three vignettes (chest pain triage, sore throat triage, ECG myocardial infarction detection) with the corresponding human performance. Participants reported the maximum number of cases an AI should be allowed to miss or over-refer.
RESULTS: Response rates were 45% among physicians and 31% in the general population. Both groups demanded higher AI accuracy than the human benchmark for all cases. In the chest pain triage vignette, the nurse correctly referred 84 of 100 true emergencies; physicians required the AI to correctly refer 11 additional patients (95% sensitivity) and the general population demanded referral of 16 additional patients (100% sensitivity) (p<0.001 for both groups). Among 100 patients not requiring referral, the nurse would mistakenly refer 66. Both groups required the AI to reduce unnecessary referrals by 16 (50% specificity) (p<0.001). A similar pattern was observed in the other vignettes.
DISCUSSION: The accuracy thresholds required by the respondents exceed the performance of many existing systems, although emerging AI research shows promise in narrowing the gap.
CONCLUSION: Physicians and the general population require medical AI systems to outperform human clinicians. When implementing AI in healthcare settings, early engagement with both groups may be necessary to align expectations with real-world system performance.
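The chest pain triage figures in RESULTS can be checked with a short calculation. The following is a minimal sketch, not part of the published record: the function and constant names are illustrative assumptions, and the nurse's 34% specificity is derived here from the 66 unnecessary referrals rather than stated in the abstract.

```python
def rate(correct, total):
    """Fraction of cases handled correctly (used for sensitivity and specificity)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Counts taken from the chest pain triage vignette in the abstract.
EMERGENCIES = 100                  # patients who truly needed referral
NON_EMERGENCIES = 100              # patients who did not need referral
NURSE_CORRECT_REFERRALS = 84       # emergencies the nurse referred
NURSE_UNNECESSARY_REFERRALS = 66   # non-emergencies the nurse referred


def chest_pain_thresholds():
    """Return human and required-AI rates for the chest pain vignette."""
    return {
        "nurse_sensitivity": rate(NURSE_CORRECT_REFERRALS, EMERGENCIES),
        # Physicians required 11 additional correct referrals.
        "physician_required_sensitivity": rate(NURSE_CORRECT_REFERRALS + 11, EMERGENCIES),
        # The general population required 16 additional correct referrals.
        "public_required_sensitivity": rate(NURSE_CORRECT_REFERRALS + 16, EMERGENCIES),
        # Derived, not stated in the abstract: 100 - 66 correctly not referred.
        "nurse_specificity": rate(NON_EMERGENCIES - NURSE_UNNECESSARY_REFERRALS, NON_EMERGENCIES),
        # Both groups required 16 fewer unnecessary referrals.
        "required_specificity": rate(
            NON_EMERGENCIES - (NURSE_UNNECESSARY_REFERRALS - 16), NON_EMERGENCIES
        ),
    }


if __name__ == "__main__":
    for name, value in chest_pain_thresholds().items():
        print(f"{name}: {value:.0%}")
```

The required rates reproduce the 95%, 100% and 50% values quoted in the abstract.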
- author
- Arvidsson, Rasmus; Widén, Jonathan; Al-Naasan, Lina; Gunnarsson, Ronny Kent K; Nymberg, Peter (LU); Blease, Charlotte R; Moberg, Anna; Sundvall, Pär-Daniel; Wikberg, Carl and Sundemo, David
- publishing date
- 2026-04-02
- type
- Contribution to journal
- publication status
- published
- keywords
- Humans, Sweden, Cross-Sectional Studies, Artificial Intelligence/standards, Male, Adult, Female, Physicians/psychology, Middle Aged, Surveys and Questionnaires, Sensitivity and Specificity, Triage, Aged
- in
- BMJ Health & Care Informatics
- volume
- 33
- issue
- 1
- article number
- e101899
- publisher
- BMJ Publishing Group
- external identifiers
- pmid:41927104
- scopus:105034953053
- ISSN
- 2632-1009
- DOI
- 10.1136/bmjhci-2025-101899
- language
- English
- LU publication?
- yes
- additional info
- © Author(s) (or their employer(s)) 2026. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ Group.
- id
- 2b5420a0-5260-4353-abe1-451b2dd3ccc4
- date added to LUP
- 2026-04-18 16:29:09
- date last changed
- 2026-04-20 07:00:39
@article{2b5420a0-5260-4353-abe1-451b2dd3ccc4,
abstract = {{OBJECTIVES: To identify the lowest sensitivity and specificity that physicians and the general population consider acceptable for medical artificial intelligence (AI), relative to current human performance. METHODS: In a nationwide, cross-sectional survey in Sweden, 2025, random samples of 500 physicians and 500 adults from the general population were mailed a questionnaire presenting three vignettes (chest pain triage, sore throat triage, ECG myocardial infarction detection) with the corresponding human performance. Participants reported the maximum number of cases an AI should be allowed to miss or over-refer. RESULTS: Response rates were 45\% among physicians and 31\% in the general population. Both groups demanded higher AI accuracy than the human benchmark for all cases. In the chest pain triage vignette, the nurse correctly referred 84 of 100 true emergencies; physicians required the AI to correctly refer 11 additional patients (95\% sensitivity) and the general population demanded referral of 16 additional patients (100\% sensitivity) (p<0.001 for both groups). Among 100 patients not requiring referral, the nurse would mistakenly refer 66. Both groups required the AI to reduce unnecessary referrals by 16 (50\% specificity) (p<0.001). A similar pattern was observed in the other vignettes. DISCUSSION: The accuracy thresholds required by the respondents exceed the performance of many existing systems, although emerging AI research shows promise in narrowing the gap. CONCLUSION: Physicians and the general population require medical AI systems to outperform human clinicians. When implementing AI in healthcare settings, early engagement with both groups may be necessary to align expectations with real-world system performance.}},
author = {{Arvidsson, Rasmus and Widén, Jonathan and Al-Naasan, Lina and Gunnarsson, Ronny Kent K and Nymberg, Peter and Blease, Charlotte R and Moberg, Anna and Sundvall, Pär-Daniel and Wikberg, Carl and Sundemo, David}},
issn = {{2632-1009}},
keywords = {{Humans; Sweden; Cross-Sectional Studies; Artificial Intelligence/standards; Male; Adult; Female; Physicians/psychology; Middle Aged; Surveys and Questionnaires; Sensitivity and Specificity; Triage; Aged}},
language = {{eng}},
month = {{04}},
number = {{1}},
publisher = {{BMJ Publishing Group}},
journal = {{BMJ Health \& Care Informatics}},
title = {{Acceptable accuracy for medical AI : a survey of physicians and the general population in Sweden}},
url = {{http://dx.doi.org/10.1136/bmjhci-2025-101899}},
doi = {{10.1136/bmjhci-2025-101899}},
volume = {{33}},
year = {{2026}},
}