
Lund University Publications


AI meets psychology: an exploratory study of large language models’ competence in psychotherapy contexts

Sian Tan, Kean; Cervin, Matti; Leman, Patrick; Nielsen, Kristopher; Vasantha Kumar, Prashanth and Medvedev, Oleg N. (2025) In Journal of Psychology and AI 1(1). p.1-17
Abstract
The increasing prevalence of mental health problems coupled with limited access to professional support has prompted exploration of technological solutions. Large Language Models (LLMs) represent a potential tool to address these challenges, yet their capabilities in psychotherapeutic contexts remain unclear. This study examined the competencies of current LLMs in psychotherapy-related tasks including alignment with evidence-informed clinical standards in case formulation, treatment planning, and implementation. Using an exploratory mixed-methods design, we presented three clinical cases (depression, anxiety, stress) and 12 therapy-related prompts to seven LLMs: ChatGPT-4o, ChatGPT-4, Claude 3.5 Sonnet, Claude 3 Opus, Meta Llama 3.1, Google Gemini 1.5 Pro, and Microsoft Copilot. Responses were evaluated by five experienced clinical psychologists using quantitative ratings and qualitative feedback. No single model consistently produced high-quality responses across all tasks, though different models showed distinct strengths. Models performed better in structured tasks such as determining session length and discussing goal-setting but struggled with integrative clinical reasoning and treatment implementation. Higher-rated responses demonstrated clinical humility, maintained therapeutic boundaries, and recognised therapy as collaborative. Current LLMs are more promising as supportive tools for clinicians than as therapeutic applications. This paper highlights key areas for development needed to enhance clinical reasoning abilities for effective mental health use.
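
For readers who want a concrete picture of the evaluation design described in the abstract, the sketch below shows one way such a comparison could be wired up: the same case/prompt pairs are presented to several models behind a common callable interface, and each response is stored alongside rater scores for simple aggregation. Everything here (the query functions, RatedResponse record, and the summary statistic) is a hypothetical illustration assumed for this sketch, not the authors' actual pipeline or analysis.

from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

@dataclass
class RatedResponse:
    model: str
    case: str          # e.g. "depression", "anxiety", "stress"
    prompt: str        # one of the therapy-related prompts
    response: str
    ratings: List[int] = field(default_factory=list)  # one score per rater

def collect_responses(
    models: Dict[str, Callable[[str], str]],  # model name -> query function
    cases: List[str],
    prompts: List[str],
) -> List[RatedResponse]:
    """Present every case/prompt pair to every model and record the output."""
    records = []
    for name, query in models.items():
        for case in cases:
            for prompt in prompts:
                full_prompt = f"Clinical case: {case}\nTask: {prompt}"
                records.append(
                    RatedResponse(name, case, prompt, query(full_prompt))
                )
    return records

def mean_rating_by_model(records: List[RatedResponse]) -> Dict[str, float]:
    """Average all rater scores per model (a toy summary, not the paper's analysis)."""
    by_model: Dict[str, List[int]] = {}
    for r in records:
        by_model.setdefault(r.model, []).extend(r.ratings)
    return {m: mean(scores) for m, scores in by_model.items() if scores}

if __name__ == "__main__":
    # Stub query functions stand in for real API calls to the seven LLMs.
    models = {
        "ChatGPT-4o": lambda p: "stubbed response",
        "Claude 3.5 Sonnet": lambda p: "stubbed response",
    }
    records = collect_responses(models, ["depression"], ["Suggest a session structure."])
    for r in records:
        r.ratings = [4, 3, 5, 4, 4]  # e.g. scores from five raters
    print(mean_rating_by_model(records))

A callable-per-model registry keeps vendor-specific API details out of the collection loop, which is one plausible way to keep a seven-model comparison like this one uniform across providers.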
author
Sian Tan, Kean; Cervin, Matti; Leman, Patrick; Nielsen, Kristopher; Vasantha Kumar, Prashanth and Medvedev, Oleg N.
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
in
Journal of Psychology and AI
volume
1
issue
1
pages
1 - 17
DOI
10.1080/29974100.2025.2545258
language
English
LU publication?
yes
id
f95288f1-4d96-43d7-b875-b57fcb3a3618
date added to LUP
2025-09-04 21:14:55
date last changed
2025-09-05 07:35:16
@article{f95288f1-4d96-43d7-b875-b57fcb3a3618,
  abstract     = {{The increasing prevalence of mental health problems coupled with limited access to professional support has prompted exploration of technological solutions. Large Language Models (LLMs) represent a potential tool to address these challenges, yet their capabilities in psychotherapeutic contexts remain unclear. This study examined the competencies of current LLMs in psychotherapy-related tasks including alignment with evidence-informed clinical standards in case formulation, treatment planning, and implementation. Using an exploratory mixed-methods design, we presented three clinical cases (depression, anxiety, stress) and 12 therapy-related prompts to seven LLMs: ChatGPT-4o, ChatGPT-4, Claude 3.5 Sonnet, Claude 3 Opus, Meta Llama 3.1, Google Gemini 1.5 Pro, and Microsoft Copilot. Responses were evaluated by five experienced clinical psychologists using quantitative ratings and qualitative feedback. No single model consistently produced high-quality responses across all tasks, though different models showed distinct strengths. Models performed better in structured tasks such as determining session length and discussing goal-setting but struggled with integrative clinical reasoning and treatment implementation. Higher-rated responses demonstrated clinical humility, maintained therapeutic boundaries, and recognised therapy as collaborative. Current LLMs are more promising as supportive tools for clinicians than as therapeutic applications. This paper highlights key areas for development needed to enhance clinical reasoning abilities for effective mental health use.}},
  author       = {{Sian Tan, Kean and Cervin, Matti and Leman, Patrick and Nielsen, Kristopher and Vasantha Kumar, Prashanth and Medvedev, Oleg N.}},
  language     = {{eng}},
  month        = {{09}},
  number       = {{1}},
  pages        = {{1--17}},
  series       = {{Journal of Psychology and AI}},
  title        = {{AI meets psychology: an exploratory study of large language models’ competence in psychotherapy contexts}},
  url          = {{http://dx.doi.org/10.1080/29974100.2025.2545258}},
  doi          = {{10.1080/29974100.2025.2545258}},
  volume       = {{1}},
  year         = {{2025}},
}