
Lund University Publications


AI meets psychology: an exploratory study of large language models’ competence in psychotherapy contexts

Sian Tan, Kean; Cervin, Matti; Leman, Patrick; Nielsen, Kristopher; Vasantha Kumar, Prashanth and Medvedev, Oleg N. (2025) In Journal of Psychology and AI 1(1). p.1-17
Abstract
The increasing prevalence of mental health problems coupled with limited access to professional support has prompted exploration of technological solutions. Large Language Models (LLMs) represent a potential tool to address these challenges, yet their capabilities in psychotherapeutic contexts remain unclear. This study examined the competencies of current LLMs in psychotherapy-related tasks including alignment with evidence-informed clinical standards in case formulation, treatment planning, and implementation. Using an exploratory mixed-methods design, we presented three clinical cases (depression, anxiety, stress) and 12 therapy-related prompts to seven LLMs: ChatGPT-4o, ChatGPT-4, Claude 3.5 Sonnet, Claude 3 Opus, Meta Llama 3.1, Google Gemini 1.5 Pro, and Microsoft Copilot. Responses were evaluated by five experienced clinical psychologists using quantitative ratings and qualitative feedback. No single model consistently produced high-quality responses across all tasks, though different models showed distinct strengths. Models performed better in structured tasks such as determining session length and discussing goal-setting but struggled with integrative clinical reasoning and treatment implementation. Higher-rated responses demonstrated clinical humility, maintained therapeutic boundaries, and recognised therapy as collaborative. Current LLMs are more promising as supportive tools for clinicians than as therapeutic applications. This paper highlights key areas for development needed to enhance clinical reasoning abilities for effective mental health use.
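
For readers who want a concrete picture of the evaluation design described in the abstract, the sketch below shows one way such a comparison could be wired up: the same case/prompt pairs are presented to several models behind a common callable interface, and each response is stored alongside rater scores for simple aggregation. Everything here (the query functions, RatedResponse record, and the summary statistic) is a hypothetical illustration assumed for this sketch, not the authors' actual pipeline or analysis.

from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

@dataclass
class RatedResponse:
    model: str
    case: str          # e.g. "depression", "anxiety", "stress"
    prompt: str        # one of the therapy-related prompts
    response: str
    ratings: List[int] = field(default_factory=list)  # one score per rater

def collect_responses(
    models: Dict[str, Callable[[str], str]],  # model name -> query function
    cases: List[str],
    prompts: List[str],
) -> List[RatedResponse]:
    """Present every case/prompt pair to every model and record the output."""
    records = []
    for name, query in models.items():
        for case in cases:
            for prompt in prompts:
                full_prompt = f"Clinical case: {case}\nTask: {prompt}"
                records.append(
                    RatedResponse(name, case, prompt, query(full_prompt))
                )
    return records

def mean_rating_by_model(records: List[RatedResponse]) -> Dict[str, float]:
    """Average all rater scores per model (a toy summary, not the paper's analysis)."""
    by_model: Dict[str, List[int]] = {}
    for r in records:
        by_model.setdefault(r.model, []).extend(r.ratings)
    return {m: mean(scores) for m, scores in by_model.items() if scores}

if __name__ == "__main__":
    # Stub query functions stand in for real API calls to the seven LLMs.
    models = {
        "ChatGPT-4o": lambda p: "stubbed response",
        "Claude 3.5 Sonnet": lambda p: "stubbed response",
    }
    records = collect_responses(models, ["depression"], ["Suggest a session structure."])
    for r in records:
        r.ratings = [4, 3, 5, 4, 4]  # e.g. scores from five raters
    print(mean_rating_by_model(records))

A callable-per-model registry keeps vendor-specific API details out of the collection loop, which is one plausible way to keep a seven-model comparison like this one uniform across providers.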
author
Sian Tan, Kean; Cervin, Matti; Leman, Patrick; Nielsen, Kristopher; Vasantha Kumar, Prashanth and Medvedev, Oleg N.
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
in
Journal of Psychology and AI
volume
1
issue
1
pages
1 - 17
DOI
10.1080/29974100.2025.2545258
language
English
LU publication?
yes
id
f95288f1-4d96-43d7-b875-b57fcb3a3618
date added to LUP
2025-09-04 21:14:55
date last changed
2025-09-05 07:35:16
@article{f95288f1-4d96-43d7-b875-b57fcb3a3618,
  abstract     = {{The increasing prevalence of mental health problems coupled with limited access to professional support has prompted exploration of technological solutions. Large Language Models (LLMs) represent a potential tool to address these challenges, yet their capabilities in psychotherapeutic contexts remain unclear. This study examined the competencies of current LLMs in psychotherapy-related tasks including alignment with evidence-informed clinical standards in case formulation, treatment planning, and implementation. Using an exploratory mixed-methods design, we presented three clinical cases (depression, anxiety, stress) and 12 therapy-related prompts to seven LLMs: ChatGPT-4o, ChatGPT-4, Claude 3.5 Sonnet, Claude 3 Opus, Meta Llama 3.1, Google Gemini 1.5 Pro, and Microsoft Copilot. Responses were evaluated by five experienced clinical psychologists using quantitative ratings and qualitative feedback. No single model consistently produced high-quality responses across all tasks, though different models showed distinct strengths. Models performed better in structured tasks such as determining session length and discussing goal-setting but struggled with integrative clinical reasoning and treatment implementation. Higher-rated responses demonstrated clinical humility, maintained therapeutic boundaries, and recognised therapy as collaborative. Current LLMs are more promising as supportive tools for clinicians than as therapeutic applications. This paper highlights key areas for development needed to enhance clinical reasoning abilities for effective mental health use.}},
  author       = {{Sian Tan, Kean and Cervin, Matti and Leman, Patrick and Nielsen, Kristopher and Vasantha Kumar, Prashanth and Medvedev, Oleg N.}},
  language     = {{eng}},
  month        = {{09}},
  number       = {{1}},
  pages        = {{1--17}},
  series       = {{Journal of Psychology and AI}},
  title        = {{AI meets psychology: an exploratory study of large language models’ competence in psychotherapy contexts}},
  url          = {{http://dx.doi.org/10.1080/29974100.2025.2545258}},
  doi          = {{10.1080/29974100.2025.2545258}},
  volume       = {{1}},
  year         = {{2025}},
}