Lund University Publications

LUND UNIVERSITY LIBRARIES

Human- Versus Machine Learning-Based Triage Using Digitalized Patient Histories in Primary Care: Comparative Study

Entezarjou, Artin; Bonamy, Anna-Karin Edstedt; Benjaminsson, Simon; Herman, Pawel and Midlöv, Patrik (2020) In JMIR Medical Informatics 8(9).
Abstract

BACKGROUND: Smartphones have made it possible for patients to digitally report symptoms before physical primary care visits. Using machine learning (ML), these data offer an opportunity to support decisions about the appropriate level of care (triage).

OBJECTIVE: The purpose of this study was to explore the interrater reliability between human physicians and an automated ML-based triage method.

METHODS: After testing several models, a naïve Bayes triage model was created using data from digital medical histories, capable of classifying digital medical history reports as either in need of urgent physical examination or not in need of urgent physical examination. The model was tested on 300 digital medical history reports and classification was compared with the majority vote of an expert panel of 5 primary care physicians (PCPs). Reliability between raters was measured using both Cohen κ (adjusted for chance agreement) and percentage agreement (not adjusted for chance agreement).

RESULTS: Interrater reliability as measured by Cohen κ was 0.17 when comparing the majority vote of the reference group with the model. Agreement was 74% (138/186) for cases judged not in need of urgent physical examination and 42% (38/90) for cases judged to be in need of urgent physical examination. No specific features linked to the model's triage decision could be identified. Between physicians within the panel, Cohen κ was 0.2. Intrarater reliability when 1 physician retriaged 50 reports resulted in Cohen κ of 0.55.
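The reported agreement counts are enough to reconstruct the model-versus-panel 2×2 table and verify the chance-corrected κ. A minimal sketch follows; note that the off-diagonal counts are inferred from the reported totals (48 = 186 − 138 and 52 = 90 − 38), and that these counts cover 276 of the 300 tested reports, so the remaining 24 are assumed not to enter this comparison:

```python
# 2x2 agreement table reconstructed from the reported counts:
# panel "not urgent": 186 cases, model agreed on 138
# panel "urgent":      90 cases, model agreed on 38
table = [[138, 48],   # panel not urgent: model not urgent / model urgent
         [52, 38]]    # panel urgent:     model not urgent / model urgent

n = sum(sum(row) for row in table)              # 276 rated cases
p_o = (table[0][0] + table[1][1]) / n           # observed agreement

# expected chance agreement from the marginal totals
row_tot = [sum(row) for row in table]
col_tot = [table[0][j] + table[1][j] for j in range(2)]
p_e = sum(row_tot[i] * col_tot[i] for i in range(2)) / n**2

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.17, matching the reported interrater reliability
```

With these counts, observed agreement is 176/276 ≈ 0.64 while chance agreement is ≈ 0.57, which is why a seemingly reasonable raw agreement collapses to κ = 0.17 after chance correction.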

CONCLUSIONS: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care.

type: Contribution to journal
publication status: published
in: JMIR Medical Informatics
volume: 8
issue: 9
article number: e18930
publisher: JMIR Publications Inc.
external identifiers:
  • scopus:85097476675
  • pmid:32880578
ISSN: 2291-9694
DOI: 10.2196/18930
language: English
LU publication?: yes
additional info: ©Artin Entezarjou, Anna-Karin Edstedt Bonamy, Simon Benjaminsson, Pawel Herman, Patrik Midlöv. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 03.09.2020.
id: 5b055c28-fabb-4f42-a1d6-8b6b57aadee7
date added to LUP: 2020-09-06 15:02:11
date last changed: 2024-04-03 13:34:04
@article{5b055c28-fabb-4f42-a1d6-8b6b57aadee7,
  abstract     = {{<p>BACKGROUND: Smartphones have made it possible for patients to digitally report symptoms before physical primary care visits. Using machine learning (ML), these data offer an opportunity to support decisions about the appropriate level of care (triage).</p><p>OBJECTIVE: The purpose of this study was to explore the interrater reliability between human physicians and an automated ML-based triage method.</p><p>METHODS: After testing several models, a naïve Bayes triage model was created using data from digital medical histories, capable of classifying digital medical history reports as either in need of urgent physical examination or not in need of urgent physical examination. The model was tested on 300 digital medical history reports and classification was compared with the majority vote of an expert panel of 5 primary care physicians (PCPs). Reliability between raters was measured using both Cohen κ (adjusted for chance agreement) and percentage agreement (not adjusted for chance agreement).</p><p>RESULTS: Interrater reliability as measured by Cohen κ was 0.17 when comparing the majority vote of the reference group with the model. Agreement was 74% (138/186) for cases judged not in need of urgent physical examination and 42% (38/90) for cases judged to be in need of urgent physical examination. No specific features linked to the model's triage decision could be identified. Between physicians within the panel, Cohen κ was 0.2. Intrarater reliability when 1 physician retriaged 50 reports resulted in Cohen κ of 0.55.</p><p>CONCLUSIONS: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care.</p>}},
  author       = {{Entezarjou, Artin and Bonamy, Anna-Karin Edstedt and Benjaminsson, Simon and Herman, Pawel and Midlöv, Patrik}},
  issn         = {{2291-9694}},
  language     = {{eng}},
  month        = {{09}},
  number       = {{9}},
  publisher    = {{JMIR Publications Inc.}},
  series       = {{JMIR Medical Informatics}},
  title        = {{Human- Versus Machine Learning-Based Triage Using Digitalized Patient Histories in Primary Care : Comparative Study}},
  url          = {{http://dx.doi.org/10.2196/18930}},
  doi          = {{10.2196/18930}},
  volume       = {{8}},
  year         = {{2020}},
}