
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Assessing Meaningful Change in Mental Health Using Large Language Models - an Anchor-Based Approach

Ekstrand Odd, Kevin LU (2024) PSYP01 20231
Department of Psychology
Abstract
What determines whether a change is meaningful, and how can it be assessed? Distribution-based methods of assessing meaningful change focus on statistical significance, while anchor-based methods rely on other measures of the same construct as the original scale (e.g., the PHQ-9), such as expert assessments or other rating scales. This study aims to introduce and validate a novel way of anchoring meaningful change: meaningful change assessments based on natural language processing. Five hundred individuals were asked to complete a self-report depression scale (PHQ-9) and to describe their mental health in response to an open-ended question at two time points. Two experts read these descriptions and rated both the undirected meaningful change and the valence of the change over time, each for 300 participants (98 rated by both). The two ratings were multiplied to yield a meaningful change score. Using the large language model RoBERTa, models were trained to assess meaningful change from the descriptive sentences, and ridge regression was used to analyze the correlation between language-assessed and expert-assessed meaningful change scores. Expert-assessed meaningful change showed high interrater reliability (r = .78, ICC(A,1) = .66, p < .001, N = 98) and high external validity with change in PHQ-9 score (r = .52, p < .001, N = 98). Ridge regression showed that language-assessed meaningful change based on the descriptive sentences from both time points was moderately correlated with expert-assessed meaningful change (r = .69, p < .001, N = 500). The language assessments also showed moderate external validity with change in PHQ-9 score (r = .37, p < .001, N = 500). Surprisingly, meaningful change could also be assessed using language from the second time point alone (r = .61, p < .001, N = 500), suggesting that a description of the current health state may be enough to assess meaningful change in mental health.
These results support the use of language-assessed meaningful change in mental health. Ways of improving these assessments are also proposed.
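The contrast between distribution-based and anchor-based methods can be made concrete with two common textbook criteria; neither is claimed to be what this thesis computed. The sketch assumes the Jacobson-Truax reliable change index (RCI) as the distribution-based criterion, which asks whether an individual's change exceeds measurement error at roughly the 5% level (|RCI| > 1.96), and, as the anchor-based criterion, the mean PHQ-9 change among participants whom an external anchor (for instance an expert or language-based rating) classifies as improved. The function names, baseline SD, reliability and all scores are hypothetical.

```python
import numpy as np


def reliable_change_index(t1, t2, sd_baseline, reliability):
    """Jacobson-Truax RCI: observed change divided by the SE of a difference score."""
    se = sd_baseline * np.sqrt(1 - reliability)  # standard error of measurement
    s_diff = np.sqrt(2) * se                     # SE of the difference of two scores
    return (np.asarray(t2, dtype=float) - np.asarray(t1, dtype=float)) / s_diff


def anchor_based_threshold(change, anchor_improved):
    """Mean scale change among participants the anchor classifies as improved."""
    change = np.asarray(change, dtype=float)
    return change[np.asarray(anchor_improved, dtype=bool)].mean()


# Hypothetical PHQ-9 scores: higher = more symptoms, so negative change = improvement.
t1 = np.array([14, 10, 8, 6, 12])
t2 = np.array([8, 6, 8, 8, 11])

# Distribution-based: is each individual's change larger than measurement error?
rci = reliable_change_index(t1, t2, sd_baseline=5.0, reliability=0.84)
reliably_improved = rci < -1.96

# Anchor-based: average change among those the (hypothetical) anchor rates as improved.
anchor_improved = [True, True, False, False, False]
threshold = anchor_based_threshold(t2 - t1, anchor_improved)

print(rci.round(2).tolist(), reliably_improved.tolist(), threshold)
```

The two criteria answer different questions: the RCI flags changes unlikely to be noise, while the anchor-based value describes how large a change is typically judged meaningful by an independent source.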
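The expert scoring combines two ratings, an undirected magnitude of change and its valence, by multiplication, and agreement between the two experts is reported as Pearson's r and ICC(A,1). The sketch below illustrates both steps under assumptions the abstract does not state: valence is coded as -1 (worse), 0 (no change) or +1 (better), magnitude is a non-negative rating, and ICC(A,1) follows the two-way, absolute-agreement, single-rater definition of McGraw and Wong (1996). The helpers `meaningful_change_score` and `icc_a1` are hypothetical names.

```python
import numpy as np


def meaningful_change_score(magnitude, valence):
    """Signed meaningful-change score: undirected magnitude times valence.

    Hypothetical coding: magnitude >= 0, valence in {-1, 0, +1} (+1 = improvement).
    """
    return np.asarray(magnitude, dtype=float) * np.asarray(valence, dtype=float)


def icc_a1(ratings):
    """ICC(A,1): two-way model, absolute agreement, single rater.

    ratings has shape (n_subjects, k_raters), e.g. the participants rated by both experts.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for subjects (rows), raters (columns) and residual error.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


print(meaningful_change_score([2, 3, 0], [1, -1, 1]).tolist())  # [2.0, -3.0, 0.0]

# Toy data (not from the study): rater 2 is consistently one point higher.
rater1 = np.array([1, 2, 3, 4, 5])
rater2 = rater1 + 1
r = np.corrcoef(rater1, rater2)[0, 1]
icc = icc_a1(np.column_stack([rater1, rater2]))
print(f"r = {r:.2f}, ICC(A,1) = {icc:.2f}")  # r = 1.00, ICC(A,1) = 0.83
```

A constant offset between raters leaves Pearson's r at 1 but lowers the absolute-agreement ICC, which is one reason an ICC(A,1) can sit below r, as with the reported r = .78 and ICC(A,1) = .66.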
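The language-based step maps participants' descriptions to RoBERTa representations and uses ridge regression to predict the expert-assessed score, with accuracy reported as the correlation between predicted and expert scores. The abstract does not say how embeddings were extracted or how predictions were validated, so this sketch assumes precomputed sentence embeddings (random numbers stand in for them) and 10-fold cross-validation, reporting Pearson's r between out-of-fold predictions and the target. It uses closed-form ridge regression in NumPy; `ridge_fit` and `cross_validated_r` are hypothetical helpers, and the fixed penalty `alpha` would normally be tuned inside each training fold.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the coefficients.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def cross_validated_r(X, y, alpha=1.0, n_folds=10, seed=0):
    """Pearson r between out-of-fold ridge predictions and the observed scores."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    preds = np.empty(len(y))
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        w, b = ridge_fit(X[train_idx], y[train_idx], alpha)
        preds[test_idx] = X[test_idx] @ w + b
    return np.corrcoef(preds, y)[0, 1], preds


# Synthetic stand-ins (not study data): 500 participants, 64-dim embeddings per time point.
rng = np.random.default_rng(42)
emb_t1 = rng.normal(size=(500, 64))
emb_t2 = rng.normal(size=(500, 64))
true_w = rng.normal(size=128)
expert_score = np.hstack([emb_t1, emb_t2]) @ true_w + rng.normal(scale=2.0, size=500)

# Both time points: concatenate the two embeddings. Second time point only: emb_t2.
r_both, preds_both = cross_validated_r(np.hstack([emb_t1, emb_t2]), expert_score)
r_t2, _ = cross_validated_r(emb_t2, expert_score)
print(f"both time points r = {r_both:.2f}, second time point only r = {r_t2:.2f}")
```

Swapping the concatenated matrix for `emb_t2` alone mirrors the comparison between language from both time points and language from the second time point only; the synthetic numbers say nothing about the study's results.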
author: Ekstrand Odd, Kevin LU
organization: Department of Psychology
course: PSYP01 20231
year: 2024
type: H2 - Master's Degree (Two Years)
keywords: Artificial Intelligence, AI, Meaningful Change, Meaningful Change Assessment, Anchor-Based Approach, Mental Health, Depression, Large Language Models, LLM, Natural Language Processing, NLP, Language-Based Assessment, Descriptive Sentences, Open-Ended Questions
language: English
id: 9175306
date added to LUP: 2024-09-24 15:58:41
date last changed: 2024-09-24 15:58:41
@misc{9175306,
  abstract     = {{What determines whether a change is meaningful, and how can it be assessed? Distribution-based methods of assessing meaningful change focus on statistical significance, while anchor-based methods rely on other measures of the same construct as the original scale (e.g., the PHQ-9), such as expert assessments or other rating scales. This study aims to introduce and validate a novel way of anchoring meaningful change: meaningful change assessments based on natural language processing. Five hundred individuals were asked to complete a self-report depression scale (PHQ-9) and to describe their mental health in response to an open-ended question at two time points. Two experts read these descriptions and rated both the undirected meaningful change and the valence of the change over time, each for 300 participants (98 rated by both). The two ratings were multiplied to yield a meaningful change score. Using the large language model RoBERTa, models were trained to assess meaningful change from the descriptive sentences, and ridge regression was used to analyze the correlation between language-assessed and expert-assessed meaningful change scores. Expert-assessed meaningful change showed high interrater reliability (r = .78, ICC(A,1) = .66, p < .001, N = 98) and high external validity with change in PHQ-9 score (r = .52, p < .001, N = 98). Ridge regression showed that language-assessed meaningful change based on the descriptive sentences from both time points was moderately correlated with expert-assessed meaningful change (r = .69, p < .001, N = 500). The language assessments also showed moderate external validity with change in PHQ-9 score (r = .37, p < .001, N = 500). Surprisingly, meaningful change could also be assessed using language from the second time point alone (r = .61, p < .001, N = 500), suggesting that a description of the current health state may be enough to assess meaningful change in mental health.
These results support the use of language-assessed meaningful change in mental health. Ways of improving these assessments are also proposed.}},
  author       = {{Ekstrand Odd, Kevin}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Assessing Meaningful Change in Mental Health Using Large Language Models - an Anchor-Based Approach}},
  year         = {{2024}},
}