
Lund University Publications


Developing and Validating Language-Based Assessments for Mental Health : Measuring and Describing Depression, Anxiety, Affect, and Suicidality and Self-Harm Risk, from Individuals’ Own Descriptions

Gu, Zhuojun LU (2025)
Abstract
This thesis develops and evaluates language-based assessments that use artificial intelligence (AI) to transform open-ended language into quantitative indicators and descriptions of mental health-related constructs. While closed-ended scales have long dominated psychological assessment, they are limited by fixed response formats and may not fully capture the complexity of individuals’ experiences. By contrast, language offers a flexible and expressive medium for describing thoughts, emotions, and behaviors. Across four papers, this thesis examines whether language-based assessments can provide valid and reliable tools for assessing psychological constructs such as depression, anxiety, and affect, as well as mental health-related risk assessments, including suicidality and self-harm.
Paper I compares four language response formats, ranging from selecting predefined words to producing full-text responses. We evaluated the response formats in terms of their validity (covering concurrent, incremental, face, discriminant, and external aspects) and their reliability, including test-retest reliability and performance in a prospective sample. Using the Sequential Evaluation with Model Pre-registration (SEMP) approach, machine learning models were trained on a development dataset (N = 963) and pre-registered before being tested on a separate prospective sample (N = 145). These pre-registered models demonstrated moderate to strong validity and reliability, achieving predictive accuracies of r = .60–.79 in the new sample. The consistent performance across formats suggests that a format may be selected based on specific research or practical requirements.
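To make the SEMP sequence concrete, here is a minimal sketch of its train-freeze-test logic, assuming ridge regression over text embeddings as a stand-in for the thesis's actual models; all data, dimensions, and names below are synthetic illustrations, not the study's pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Stand-ins for sentence embeddings and a closed-ended criterion score.
X_dev, y_dev = rng.normal(size=(963, 768)), rng.normal(size=963)  # development
X_pro, y_pro = rng.normal(size=(145, 768)), rng.normal(size=145)  # prospective

# Step 1: fit on the development data, then freeze ("pre-register") the model.
model = Ridge(alpha=1.0).fit(X_dev, y_dev)

# Step 2: evaluate the frozen model, unchanged, on the later prospective sample.
r, p = pearsonr(model.predict(X_pro), y_pro)
print(f"prospective-sample validity: r = {r:.2f}")
```

The essential point of the sequence is that the model is frozen before the prospective sample is assessed, so the reported r values cannot benefit from post hoc tuning.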
Paper II evaluates AI-based language models for assessing the risk of suicide and self-harm from individuals’ open-ended narratives about suicidality, self-harm, depression, anxiety, and overall mental health. Employing the SEMP framework, models were trained (N = 641) and pre-registered, then validated in a held-out test set (N = 150) against expert ratings generated using the Longitudinal Expert Data (LED) approach. In the held-out set, the language-based assessments showed alignment with expert ratings for suicidality (r = .70) and self-harm (r = .68), and significantly outperformed models that relied on demographic data.
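The comparison against a demographics-only baseline can be sketched in the same style; again, the features, dimensions, and split below are hypothetical stand-ins, not the paper's actual variables.

```python
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

X_text = rng.normal(size=(791, 768))  # embeddings of open-ended narratives
X_demo = rng.normal(size=(791, 5))    # a handful of demographic variables
y_exp = rng.normal(size=791)          # LED-style expert risk ratings

train, test = slice(0, 641), slice(641, 791)  # N = 641 train, N = 150 held out
for name, X in [("language", X_text), ("demographics", X_demo)]:
    model = Ridge().fit(X[train], y_exp[train])
    r, _ = pearsonr(model.predict(X[test]), y_exp[test])
    print(f"{name} model: r = {r:.2f} vs. expert ratings")
```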
Paper III evaluates the causal validity of language-based assessments in an experimental setting; few studies have tested whether such assessments can detect causally induced changes. In this randomized mixed-design experiment (N = 892), participants underwent mood induction in physical settings (N = 153) or via online videos (N = 739) across three conditions (church, mall, park). They reported their affect before and after the induction using both open-ended responses and closed-ended Positive and Negative Affect Schedule (PANAS) ratings. We compared how well PANAS and the language-based assessments classified the conditions. The language-based assessments outperformed PANAS in predictive accuracy in the training (AUC = .74 vs. .63), online holdout (AUC = .76 vs. .70), and offline holdout (AUC = .67 vs. .53) samples. In addition, the language-based assessments provided qualitative insights by visualizing word-level patterns across conditions.
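As a rough illustration of this kind of AUC comparison, the hypothetical sketch below scores a small closed-ended feature set against a high-dimensional language-embedding feature set on a three-class condition-classification task, using macro one-vs-rest AUC; the logistic-regression classifier and all data are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 892
y = rng.integers(0, 3, size=n)      # three conditions: church / mall / park

X_panas = rng.normal(size=(n, 20))  # closed-ended item ratings
X_text = rng.normal(size=(n, 768))  # sentence-embedding features

for name, X in [("PANAS", X_panas), ("language", X_text)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Macro one-vs-rest AUC: one way to summarize multi-class
    # discrimination of the three conditions with a single number.
    auc = roc_auc_score(y_te, clf.predict_proba(X_te), multi_class="ovr")
    print(f"{name}: AUC = {auc:.2f}")
```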
Paper IV introduces the L-BAM Library, an open repository of pre-validated language-based assessment models, and outlines a framework for sharing and applying these tools in transparent and reproducible ways. The paper emphasizes responsible open-science practices and encourages independent validation of language-based assessments (LBAs) in new populations and contexts.
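The L-BAM Library's actual interface is not detailed in this abstract; purely as a generic illustration of what sharing a pre-validated model reproducibly can look like, one might bundle a frozen estimator with versioned provenance metadata (all names and values below are hypothetical).

```python
import numpy as np
import joblib
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
model = Ridge().fit(rng.normal(size=(100, 8)), rng.normal(size=100))

# Bundle the frozen model with provenance so others can re-validate it.
bundle = {
    "model": model,
    "construct": "depression severity",  # what the scores are meant to index
    "training_n": 100,
    "version": "0.1.0",
}
joblib.dump(bundle, "lba_model_v0.1.0.joblib")

# A new user reloads the exact frozen model and applies it to new features.
loaded = joblib.load("lba_model_v0.1.0.joblib")
scores = loaded["model"].predict(rng.normal(size=(5, 8)))
```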
In sum, this thesis demonstrates that language-based assessments can serve as valid, reliable, and informative research tools for measuring and describing mental health-related constructs. By leveraging the expressiveness of open-ended language, these methods address limitations of traditional scales and offer new possibilities for capturing complex psychological phenomena. Through systematic validation across diverse samples and contexts, including expert-rated risk assessment, experimental manipulations, and real-world implementation, this work advances the methodological integration of language-based methods into the broader landscape of psychological assessment and highlights the importance of transparent, cumulative practices for their continued development.
Abstract (Swedish)
This thesis develops and evaluates language-based assessments that use artificial intelligence (AI) to transform open-ended language into quantitative indicators and descriptions of mental health-related constructs. While closed-ended scales have long dominated psychological assessment, they are limited by fixed response formats and may not fully capture the complexity of individuals’ experiences. By contrast, language offers a flexible and expressive medium for describing thoughts, emotions, and behaviors. Across four papers, this thesis examines whether language-based assessments can provide valid and reliable tools for assessing psychological constructs such as depression, anxiety, and affect, as well as mental health-related risk assessments, including suicidality and self-harm.

Paper I compares four language response formats, ranging from selecting predefined words to producing full-text responses. We evaluated the response formats in terms of their validity (covering concurrent, incremental, face, discriminant, and external aspects) and their reliability, including test-retest reliability and performance in a prospective sample. Using the Sequential Evaluation with Model Pre-registration (SEMP) approach, machine learning models were trained on a development dataset (N = 963) and pre-registered before being tested on a separate prospective sample (N = 145). These pre-registered models demonstrated moderate to strong validity and reliability, achieving predictive accuracies of r = .60–.79 in the new sample. The consistent performance across formats suggests that a format may be selected based on specific research or practical requirements.

Paper II evaluates AI-based language models for assessing the risk of suicide and self-harm from individuals’ open-ended narratives about suicidality, self-harm, depression, anxiety, and overall mental health. Employing the SEMP framework, models were trained (N = 641) and pre-registered, then validated in a held-out test set (N = 150) against expert ratings generated using the Longitudinal Expert Data (LED) approach. In the held-out set, the language-based assessments showed alignment with expert ratings for suicidality (r = .70) and self-harm (r = .68), and significantly outperformed models that relied on demographic data.

Paper III evaluates the causal validity of language-based assessments in an experimental setting; few studies have tested whether such assessments can detect causally induced changes. In this randomized mixed-design experiment (N = 892), participants underwent mood induction in physical settings (N = 153) or via online videos (N = 739) across three conditions (church, mall, park). They reported their affect before and after the induction using both open-ended responses and closed-ended Positive and Negative Affect Schedule (PANAS) ratings. We compared how well PANAS and the language-based assessments classified the conditions. The language-based assessments outperformed PANAS in predictive accuracy in the training (AUC = .74 vs. .63), online holdout (AUC = .76 vs. .70), and offline holdout (AUC = .67 vs. .53) samples. In addition, the language-based assessments provided qualitative insights by visualizing word-level patterns across conditions.

Paper IV introduces the L-BAM Library, an open repository of pre-validated language-based assessment models, and outlines a framework for sharing and applying these tools in transparent and reproducible ways. The paper emphasizes responsible open-science practices and encourages independent validation of language-based assessments in new populations and contexts.

In sum, this thesis demonstrates that language-based assessments can serve as valid, reliable, and informative research tools for measuring and describing mental health-related constructs. By leveraging the expressiveness of open-ended language, these methods address limitations of traditional scales and offer new possibilities for capturing complex psychological phenomena. Through systematic validation across diverse samples and contexts, including expert-rated risk assessment, experimental manipulations, and real-world implementation, this work advances the methodological integration of language-based methods into the broader landscape of psychological assessment and highlights the importance of transparent, cumulative practices for their continued development.
Please use this URL to cite or link to this publication:
author
supervisor
opponent
  • Associate Professor Stachl, Clemens, University of St. Gallen
organization
alternative title
Utveckla och validera språkbaserade bedömningar för psykisk hälsa : Att mäta och beskriva risk för depression, ångest, affekt och suicidalitet, samt risk för självskadebeteende, utifrån individers egna beskrivningar
publishing date
type
Thesis
publication status
published
subject
keywords
Language-based assessment, Depression, Anxiety, Suicidality, Self-harm, Affect, AI, Open science
pages
104 pages
publisher
MediaTryck Lund
defense location
Gamla köket, Sh128, Allhelgona Kyrkogata 8, Lund
defense date
2025-11-07 13:00:00
ISBN
978-91-8104-706-6
978-91-8104-705-9
project
Improving CLA from a user-centered approach
language
English
LU publication?
yes
id
61ef4354-0948-4887-bc17-bcaf632b55fc
date added to LUP
2025-10-06 10:57:13
date last changed
2025-10-06 14:25:15
@phdthesis{61ef4354-0948-4887-bc17-bcaf632b55fc,
  abstract     = {{This thesis develops and evaluates language-based assessments that use artificial intelligence (AI) to transform open-ended language into quantitative indicators and descriptions of mental health-related constructs. While closed-ended scales have long dominated psychological assessment, they are limited by fixed response formats and may not fully capture the complexity of individuals’ experiences. By contrast, language offers a flexible and expressive medium for describing thoughts, emotions, and behaviors. Across four papers, this thesis examines whether language-based assessments can provide valid and reliable tools for assessing psychological constructs such as depression, anxiety, and affect, as well as mental health-related risk assessments, including suicidality and self-harm.<br/>Paper I compares four language response formats, ranging from selecting predefined words to producing full-text responses. We evaluated the response formats in terms of their validity (covering concurrent, incremental, face, discriminant, and external aspects) and their reliability, including test-retest reliability and performance in a prospective sample. Using the Sequential Evaluation with Model Pre-registration (SEMP) approach, machine learning models were trained on a development dataset (N = 963) and pre-registered before being tested on a separate prospective sample (N = 145). These pre-registered models demonstrated moderate to strong validity and reliability, achieving predictive accuracies of r = .60–.79 in the new sample. The consistent performance across formats suggests that a format may be selected based on specific research or practical requirements.<br/>Paper II evaluates AI-based language models for assessing the risk of suicide and self-harm from individuals’ open-ended narratives about suicidality, self-harm, depression, anxiety, and overall mental health. Employing the SEMP framework, models were trained (N = 641) and pre-registered, then validated in a held-out test set (N = 150) against expert ratings generated using the Longitudinal Expert Data (LED) approach. In the held-out set, the language-based assessments showed alignment with expert ratings for suicidality (r = .70) and self-harm (r = .68), and significantly outperformed models that relied on demographic data.<br/>Paper III evaluates the causal validity of language-based assessments in an experimental setting; few studies have tested whether such assessments can detect causally induced changes. In this randomized mixed-design experiment (N = 892), participants underwent mood induction in physical settings (N = 153) or via online videos (N = 739) across three conditions (church, mall, park). They reported their affect before and after the induction using both open-ended responses and closed-ended Positive and Negative Affect Schedule (PANAS) ratings. We compared how well PANAS and the language-based assessments classified the conditions. The language-based assessments outperformed PANAS in predictive accuracy in the training (AUC = .74 vs. .63), online holdout (AUC = .76 vs. .70), and offline holdout (AUC = .67 vs. .53) samples. In addition, the language-based assessments provided qualitative insights by visualizing word-level patterns across conditions.<br/>Paper IV introduces the L-BAM Library, an open repository of pre-validated language-based assessment models, and outlines a framework for sharing and applying these tools in transparent and reproducible ways. The paper emphasizes responsible open-science practices and encourages independent validation of language-based assessments in new populations and contexts.<br/>In sum, this thesis demonstrates that language-based assessments can serve as valid, reliable, and informative research tools for measuring and describing mental health-related constructs. By leveraging the expressiveness of open-ended language, these methods address limitations of traditional scales and offer new possibilities for capturing complex psychological phenomena. Through systematic validation across diverse samples and contexts, including expert-rated risk assessment, experimental manipulations, and real-world implementation, this work advances the methodological integration of language-based methods into the broader landscape of psychological assessment and highlights the importance of transparent, cumulative practices for their continued development.}},
  author       = {{Gu, Zhuojun}},
  isbn         = {{978-91-8104-706-6}},
  keywords     = {{Language-based assessment; Depression; Anxiety; Suicidality; Self-harm; Affect; AI; Open science}},
  language     = {{eng}},
  publisher    = {{MediaTryck Lund}},
  school       = {{Lund University}},
  title        = {{Developing and Validating Language-Based Assessments for Mental Health : Measuring and Describing Depression, Anxiety, Affect, and Suicidality and Self-Harm Risk, from Individuals’ Own Descriptions}},
  url          = {{https://lup.lub.lu.se/search/files/229138416/GuPhDThesis2025.pdf}},
  year         = {{2025}},
}