LUP Student Papers

LUND UNIVERSITY LIBRARIES

Initial Development and Validation of Language-Based Assessments for Meaningful Change

Söderström, Ulrika (2024) PSYK11 20232
Department of Psychology
Abstract
Meaningful change has been discussed in multiple studies, with the recurring question of how it can be conceptualized and assessed in order to identify what determines meaningful change and where it occurs. Previous studies have conducted statistical analyses based on traditional rating scales (e.g., the PHQ-9) to assess meaningful change, but no previous study appears to have attempted to assess meaningful change through language-based assessments. This study examined whether, and to what extent, language-based assessments can be used to assess meaningful change. It used scores from human-rated meaningful change assessments of natural language responses (NLR) together with self-reported scores from the Patient Health Questionnaire-9 (PHQ-9). Analyses were conducted in RStudio with the text package and included the large language model RoBERTa for word embeddings, correlation testing to examine reliability and validity, and ridge regression to train the model. The analyses showed inter-rater reliability for the human-rated assessments (r = .64, p < .001, N = 100), a correlation between human-rated assessments and PHQ-9 difference scores (r = .36, p < .001, N = 298), a correlation for the strongest trained model (r = .39, p < .001, N = 298), and a correlation between language-based assessments and PHQ-9 difference scores (r = .29, p < .001, N = 298). These findings suggest that language-based assessments can be further developed to assess meaningful change, preferably by including human-rated assessment.
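The pipeline described above (embed free-text responses, train a ridge regression to predict a criterion, then correlate out-of-sample predictions with PHQ-9 difference scores) can be sketched as follows. This is an illustrative sketch only, not the study's actual code: the study used R's text package with RoBERTa embeddings, whereas this toy version uses scikit-learn with random stand-in embeddings and simulated difference scores, and a reduced embedding dimension for speed (RoBERTa-base produces 768-dimensional embeddings).

```python
# Hedged sketch of the described pipeline: text embeddings -> ridge
# regression -> correlation of out-of-sample predictions with the
# criterion. All data here are simulated stand-ins, not study data.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, dim = 298, 64                      # 298 participants; dim reduced from RoBERTa's 768

X = rng.normal(size=(n, dim))         # stand-in for word-embedding matrix
beta = rng.normal(size=dim)
# Simulated PHQ-9 difference scores: linear signal plus noise
y = X @ beta / np.sqrt(dim) + rng.normal(scale=1.0, size=n)

# Ridge regression with built-in penalty selection; cross_val_predict
# keeps the evaluation out of sample, as in train/test model validation
model = RidgeCV(alphas=np.logspace(-3, 3, 13))
pred = cross_val_predict(model, X, y, cv=10)

# Validity check: Pearson correlation between predictions and criterion
r = float(np.corrcoef(pred, y)[0, 1])
print(round(r, 2))
```

The correlation between cross-validated predictions and the criterion plays the same role as the reported model correlations (e.g., r = .39 for the strongest trained model), though the simulated value here carries no substantive meaning.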
author: Söderström, Ulrika
course: PSYK11 20232
year: 2024
type: M2 - Bachelor Degree
keywords: meaningful change, depression, Large Language Models, AI
language: English
id: 9144826
date added to LUP: 2024-01-25 13:27:21
date last changed: 2024-01-25 13:27:21
@misc{9144826,
  abstract     = {{Meaningful change has been discussed in multiple studies, with the recurring question of how it can be conceptualized and assessed in order to identify what determines meaningful change and where it occurs. Previous studies have conducted statistical analyses based on traditional rating scales (e.g., the PHQ-9) to assess meaningful change, but no previous study appears to have attempted to assess meaningful change through language-based assessments. This study examined whether, and to what extent, language-based assessments can be used to assess meaningful change. It used scores from human-rated meaningful change assessments of natural language responses (NLR) together with self-reported scores from the Patient Health Questionnaire-9 (PHQ-9). Analyses were conducted in RStudio with the text package and included the large language model RoBERTa for word embeddings, correlation testing to examine reliability and validity, and ridge regression to train the model. The analyses showed inter-rater reliability for the human-rated assessments (r = .64, p < .001, N = 100), a correlation between human-rated assessments and PHQ-9 difference scores (r = .36, p < .001, N = 298), a correlation for the strongest trained model (r = .39, p < .001, N = 298), and a correlation between language-based assessments and PHQ-9 difference scores (r = .29, p < .001, N = 298). These findings suggest that language-based assessments can be further developed to assess meaningful change, preferably by including human-rated assessment.}},
  author       = {{Söderström, Ulrika}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Initial Development and Validation of Language-Based Assessments for Meaningful Change}},
  year         = {{2024}},
}