
Lund University Publications


More polished, not necessarily more learned : LLMs and perceived text quality in higher education

Tärning, Betty; Tjøstheim, Trond A. and Wallin, Annika (2025) In Frontiers in Artificial Intelligence
Abstract
The use of Large Language Models (LLMs) such as ChatGPT is a prominent topic in higher education, prompting debate over their educational impact. Studies on the effect of LLMs on learning in higher education often rely on self-reported data, leaving an opening for complementary methodologies. This study contributes by analysing actual course grades as well as ratings by fellow students to investigate how LLMs can affect academic outcomes. We investigated whether using LLMs affected students’ learning by allowing them to choose one of three options for a written assignment: (1) composing the text without LLM assistance; (2) writing a first draft and using an LLM for revisions; or (3) generating a first draft with an LLM and then revising it themselves. Students’ learning was measured by their scores on a mid-course exam and final course grades. Additionally, we assessed how the students rated the quality of fellow students’ texts for each of the three conditions. Finally, we examined how accurately fellow students could identify which LLM option (1–3) was used for a given text. Our results indicate only a weak effect of LLM use. However, writing a first draft and using an LLM for revisions compared favourably to the ‘no LLM’ baseline in terms of final grades. Ratings for fellow students’ texts were higher for texts created using option 3, specifically regarding how well-written they were judged to be. Regarding text classification, students most accurately predicted the ‘no LLM’ baseline, but were unable to identify texts that were generated by an LLM and then edited by a student at a rate better than chance.
author
Tärning, Betty; Tjøstheim, Trond A. and Wallin, Annika
organization
publishing date
2025-12
type
Contribution to journal
publication status
published
subject
keywords
LLM, generative AI, higher education, student learning outcome, academic writing, text quality, peer assessment
in
Frontiers in Artificial Intelligence
pages
9 pages
publisher
Frontiers Media S. A.
ISSN
2624-8212
DOI
10.3389/frai.2025.1653992
language
English
LU publication?
yes
id
4a1cd6ff-393f-4aa5-806d-7220a170da98
date added to LUP
2025-12-01 14:36:59
date last changed
2025-12-04 10:36:23
@article{4a1cd6ff-393f-4aa5-806d-7220a170da98,
  abstract     = {{The use of Large Language Models (LLMs) such as ChatGPT is a prominent topic in higher education, prompting debate over their educational impact. Studies on the effect of LLMs on learning in higher education often rely on self-reported data, leaving an opening for complementary methodologies. This study contributes by analysing actual course grades as well as ratings by fellow students to investigate how LLMs can affect academic outcomes. We investigated whether using LLMs affected students’ learning by allowing them to choose one of three options for a written assignment: (1) composing the text without LLM assistance; (2) writing a first draft and using an LLM for revisions; or (3) generating a first draft with an LLM and then revising it themselves. Students’ learning was measured by their scores on a mid-course exam and final course grades. Additionally, we assessed how the students rated the quality of fellow students’ texts for each of the three conditions. Finally, we examined how accurately fellow students could identify which LLM option (1–3) was used for a given text. Our results indicate only a weak effect of LLM use. However, writing a first draft and using an LLM for revisions compared favourably to the ‘no LLM’ baseline in terms of final grades. Ratings for fellow students’ texts were higher for texts created using option 3, specifically regarding how well-written they were judged to be. Regarding text classification, students most accurately predicted the ‘no LLM’ baseline, but were unable to identify texts that were generated by an LLM and then edited by a student at a rate better than chance.}},
  author       = {{Tärning, Betty and Tjøstheim, Trond A. and Wallin, Annika}},
  issn         = {{2624-8212}},
  keywords     = {{LLM; generative AI; higher education; student learning outcome; academic writing; text quality; peer assessment}},
  language     = {{eng}},
  month        = {{12}},
  publisher    = {{Frontiers Media S. A.}},
  series       = {{Frontiers in Artificial Intelligence}},
  title        = {{More polished, not necessarily more learned : LLMs and perceived text quality in higher education}},
  url          = {{https://lup.lub.lu.se/search/files/234491786/Ta_rning_2025_-_More_polished_not_necessarily_more_learned.pdf}},
  doi          = {{10.3389/frai.2025.1653992}},
  year         = {{2025}},
}