
Lund University Publications


Evaluating healthcare quality and inequities using generative AI : a simulation study of potentially inappropriate medication among older adults analyzed via the framework analysis of individual heterogeneity and discriminatory accuracy (AIHDA)

Öberg, Johan; Perez-Vicente, Raquel; Lindström, Martin; Midlöv, Patrik and Merlo, Juan (2025) In Discover Artificial Intelligence 5(1).
Abstract

Background: Regular monitoring of healthcare quality and equity is crucial for informing decision-makers and clinicians. This study explores the application of generative AI, specifically large language models (LLMs), to facilitate standardized monitoring of healthcare quality using the established framework Analysis of Individual Heterogeneity and Discriminatory Accuracy (AIHDA). The study investigates whether a customized GPT can effectively apply the AIHDA framework to assess healthcare quality in a simulated dataset. Population and methods: Using simulated data modelled on real-world healthcare information, we evaluated the quality indicator of potentially inappropriate medication (PIM). A customized GPT built on ChatGPT 4o was prompted using the TREF principle (Task, Requirement, Expectation, Format) to perform the analysis. The results were compared with a traditional analysis performed in Stata to evaluate accuracy and reliability. Results: The GPT successfully conducted the AIHDA analysis, producing results identical to those of the Stata analysis. It provided useful visualizations and structured reports, as well as real-time interactive dialogue with the end user. However, occasional variations in the results occurred across iterations of the analysis, highlighting potential reliability issues. The analysis requires close supervision, as the GPT presents both errors and correct results with the same confidence. Conclusions: Generative AI and LLMs show promise in supporting standardized monitoring of healthcare quality and equity using the AIHDA framework. They enable accessible analysis but require oversight to address limitations such as occasional inaccuracies. Future, more reliable LLMs and local deployment on secure servers may further enhance their utility for routine healthcare monitoring.

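The AIHDA framework referenced in the abstract combines stratum-specific absolute risks with a measure of discriminatory accuracy, typically the area under the ROC curve (AUC). The short Python sketch below illustrates that general idea on fabricated data; the column names, strata, and simulation parameters are illustrative assumptions only and do not reproduce the study's dataset, its Stata analysis, or its customized GPT workflow.

# A minimal AIHDA-style sketch on fabricated data (illustrative assumptions only).
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(seed=1)
n = 20_000
df = pd.DataFrame({
    "age_group": rng.choice(["65-74", "75-84", "85+"], size=n),
    "sex": rng.choice(["women", "men"], size=n),
    "income": rng.choice(["low", "middle", "high"], size=n),
})
# Hypothetical simulated outcome: potentially inappropriate medication (PIM), 1 = yes.
risk = 0.25 + 0.05 * (df["age_group"] == "85+") + 0.03 * (df["income"] == "low")
df["pim"] = rng.binomial(1, risk)

# Step 1: absolute risk (PIM prevalence) within each intersectional stratum.
strata = ["age_group", "sex", "income"]
summary = (
    df.groupby(strata)["pim"]
      .agg(n="size", prevalence="mean")
      .reset_index()
)
print(summary.sort_values("prevalence", ascending=False).head())

# Step 2: discriminatory accuracy of the strata, measured as the AUC obtained
# when each individual is assigned the PIM prevalence of their own stratum.
df["stratum_risk"] = df.groupby(strata)["pim"].transform("mean")
auc = roc_auc_score(df["pim"], df["stratum_risk"])
print(f"AUC of the intersectional strata: {auc:.3f}")

In AIHDA terms, a low AUC indicates that the intersectional strata discriminate poorly between individuals with and without the outcome, even when average risks differ between groups.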
author
Öberg, Johan; Perez-Vicente, Raquel; Lindström, Martin; Midlöv, Patrik and Merlo, Juan
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
keywords
Epidemiological methods, Health care quality assessment, Health services evaluation, Social epidemiology
in
Discover Artificial Intelligence
volume
5
issue
1
article number
175
publisher
Springer Nature
external identifiers
  • scopus:105011402264
DOI
10.1007/s44163-025-00444-0
language
English
LU publication?
yes
additional info
Publisher Copyright: © The Author(s) 2025.
id
0986f3a0-6483-42b5-beba-ad7b290b8b39
date added to LUP
2025-08-03 13:45:09
date last changed
2025-08-04 10:35:51
@article{0986f3a0-6483-42b5-beba-ad7b290b8b39,
  abstract     = {{<p>Background: Regular monitoring of healthcare quality and equity is crucial for informing decision-makers and clinicians. This study explores the application of generative AI, more specifically large language models (LLMs), to facilitate standardized monitoring of healthcare quality using the established framework Analysis of Individual Heterogeneity and Discriminatory Accuracy (AIHDA). The study investigates whether a customized GPT can effectively apply the AIHDA-framework to assess healthcare quality in a simulated dataset. Population and methods: Using simulated data modelled on real-world healthcare information, we evaluated the quality indicator of potentially inappropriate medication (PIM). A customized GPT built on ChatGPT 4o was prompted via the principle TREF (Task, Requirement, Expectation, Format) to perform the analysis. Results were compared to a traditional analysis performed with Stata to evaluate accuracy and reliability. Results: The GPT successfully conducted the AIHDA analysis, producing results equal to those of the Stata analysis. The GPT provides useful visualizations and structured reports as well as interactive dialog with the end-user in real-time. However, occasional variations in the results occurred in some iterations of the analysis, highlighting potential issues with reliability. The analysis requires close supervision, as the GPT presents both errors and correct results with confidence. Conclusions: Generative AI and LLMs show promise in supporting standardized monitoring of healthcare quality and equity using the AIHDA-framework. It enables accessible analysis but requires oversight to address limitations such as occasional inaccuracies. Future and more reliable models of LLMs and local deployment on secure servers may further enhance the utility for routine healthcare monitoring.</p>}},
  author       = {{Öberg, Johan and Perez-Vicente, Raquel and Lindström, Martin and Midlöv, Patrik and Merlo, Juan}},
  keywords     = {{Epidemiological methods; Health care quality assessment; Health services evaluation; Social epidemiology}},
  language     = {{eng}},
  number       = {{1}},
  publisher    = {{Springer Nature}},
  series       = {{Discover Artificial Intelligence}},
  title        = {{Evaluating healthcare quality and inequities using generative AI : a simulation study of potentially inappropriate medication among older adults analyzed via the framework analysis of individual heterogeneity and discriminatory accuracy (AIHDA)}},
  url          = {{http://dx.doi.org/10.1007/s44163-025-00444-0}},
  doi          = {{10.1007/s44163-025-00444-0}},
  volume       = {{5}},
  year         = {{2025}},
}