Claims-based algorithms for common chronic conditions were efficiently constructed using machine learning methods

Hara, Konan; Kobayashi, Yasuki; Tomio, Jun; Ito, Yuki; Svensson, Thomas; Ikesu, Ryo; Chung, Ung-Il; Svensson, Akiko Kishi



Hara, Konan; Kobayashi, Yasuki; Tomio, Jun; Ito, Yuki; Svensson, Thomas; Ikesu, Ryo; Chung, Ung-Il; and Svensson, Akiko Kishi (2021). In PLoS ONE 16(9), pp. 1-19.

Abstract: Identification of medical conditions using claims data is generally conducted with algorithms based on subject-matter knowledge. However, these claims-based algorithms (CBAs) depend heavily on the researchers' level of knowledge and are not necessarily optimized for the target conditions. We investigated whether machine learning methods can supplement researchers' knowledge of target conditions in building CBAs. We conducted a retrospective cohort study using a claims database combined with annual health check-up results from employees' health insurance programs in Japan for fiscal years 2016-17 (study population for hypertension, N = 631,289; diabetes, N = 152,368; dyslipidemia, N = 614,434). We constructed CBAs with logistic regression, k-nearest neighbor, support vector machine, penalized logistic regression, tree-based model, and neural network for identifying patients with three common chronic conditions: hypertension, diabetes, and dyslipidemia. We then compared their association measures using a completely held-out test set (25% of the study population). Among the test cohorts of 157,822, 38,092, and 153,608 enrollees for hypertension, diabetes, and dyslipidemia, 25.4%, 8.4%, and 38.7%, respectively, had a diagnosis of the corresponding condition.

The areas under the receiver operating characteristic curve (AUCs) of the logistic regression with/without subject-matter knowledge about the target condition were .923/.921 for hypertension, .957/.938 for diabetes, and .739/.747 for dyslipidemia. The logistic lasso, logistic elastic-net, and tree-based methods yielded AUCs comparable to those of the logistic regression with subject-matter knowledge: .923-.931 for hypertension, .958-.966 for diabetes, and .747-.773 for dyslipidemia. We found that machine learning methods can attain AUCs comparable to the conventional knowledge-based method in building CBAs.
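The evaluation design described in the abstract (fit models on part of the cohort, then compare AUCs on a 25% hold-out) can be illustrated with a minimal, standard-library-only Python sketch. It assumes a simple random split with a fixed seed; this record does not state how the authors split the data or which software they used, and the names `holdout_split` and `roc_auc` are hypothetical. Taking the floor of 25% of each study population reproduces the reported test-cohort sizes (631,289 to 157,822; 152,368 to 38,092; 614,434 to 153,608).

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(n: int, test_fraction: float = 0.25,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: randomly partition indices 0..n-1 into (train, test).

    Assumption: a plain random split. The test indices are meant to be used
    only for the final AUC comparison, never for fitting or tuning.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_fraction)  # floor, e.g. 631,289 -> 157,822
    return idx[n_test:], idx[:n_test]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: ROC AUC via the Mann-Whitney U statistic.

    Tied scores receive their average (mid) rank, so a tie between a
    positive and a negative counts as half a correctly ordered pair.
    """
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of entries sharing the same score.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75. Because the AUC depends only on how the scores rank diagnosed enrollees above undiagnosed ones, outputs on different scales (predicted probabilities from plain or lasso logistic models, SVM decision values, tree-ensemble scores) can be compared on the same held-out indices; the model fitting itself is not shown.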

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/5288a533-8121-4a2f-98a2-93bfef918a2a

author

Hara, Konan; Kobayashi, Yasuki; Tomio, Jun; Ito, Yuki; Svensson, Thomas; Ikesu, Ryo; Chung, Ung-Il; and Svensson, Akiko Kishi

organization

Cardiovascular Research - Hypertension (research group)

publishing date

2021

type

Contribution to journal

publication status

published

in

PLoS ONE

volume

16

issue

9

article number

e0254394

pages

1 - 19

publisher

Public Library of Science (PLoS)

external identifiers

scopus:85116212749
pmid:34570785

ISSN

1932-6203

DOI

10.1371/journal.pone.0254394

language

English

LU publication?

yes

id

5288a533-8121-4a2f-98a2-93bfef918a2a

date added to LUP

2021-09-30 05:36:47

date last changed

2025-04-04 15:19:46

@article{5288a533-8121-4a2f-98a2-93bfef918a2a,
  abstract     = {{Identification of medical conditions using claims data is generally conducted with algorithms based on subject-matter knowledge. However, these claims-based algorithms (CBAs) are highly dependent on the knowledge level and not necessarily optimized for target conditions. We investigated whether machine learning methods can supplement researchers' knowledge of target conditions in building CBAs. Retrospective cohort study using a claims database combined with annual health check-up results of employees' health insurance programs for fiscal year 2016-17 in Japan (study population for hypertension, N = 631,289; diabetes, N = 152,368; dyslipidemia, N = 614,434). We constructed CBAs with logistic regression, k-nearest neighbor, support vector machine, penalized logistic regression, tree-based model, and neural network for identifying patients with three common chronic conditions: hypertension, diabetes, and dyslipidemia. We then compared their association measures using a completely hold-out test set (25% of the study population). Among the test cohorts of 157,822, 38,092, and 153,608 enrollees for hypertension, diabetes, and dyslipidemia, 25.4%, 8.4%, and 38.7% of them had a diagnosis of the corresponding condition. The areas under the receiver operating characteristic curve (AUCs) of the logistic regression with/without subject-matter knowledge about the target condition were .923/.921 for hypertension, .957/.938 for diabetes, and .739/.747 for dyslipidemia. The logistic lasso, logistic elastic-net, and tree-based methods yielded AUCs comparable to those of the logistic regression with subject-matter knowledge: .923-.931 for hypertension; .958-.966 for diabetes; .747-.773 for dyslipidemia. We found that machine learning methods can attain AUCs comparable to the conventional knowledge-based method in building CBAs.}},
  author       = {{Hara, Konan and Kobayashi, Yasuki and Tomio, Jun and Ito, Yuki and Svensson, Thomas and Ikesu, Ryo and Chung, Ung-Il and Svensson, Akiko Kishi}},
  issn         = {{1932-6203}},
  language     = {{eng}},
  number       = {{9}},
  pages        = {{1--19}},
  publisher    = {{Public Library of Science (PLoS)}},
  journal      = {{PLoS ONE}},
  title        = {{Claims-based algorithms for common chronic conditions were efficiently constructed using machine learning methods}},
  url          = {{http://dx.doi.org/10.1371/journal.pone.0254394}},
  doi          = {{10.1371/journal.pone.0254394}},
  volume       = {{16}},
  year         = {{2021}},
}

Lund University Publications

LUND UNIVERSITY LIBRARIES
