More polished, not necessarily more learned : LLMs and perceived text quality in higher education
(2025) In Frontiers in Artificial Intelligence
- Abstract
- The use of Large Language Models (LLMs) such as ChatGPT is a prominent topic in higher education, prompting debate over their educational impact. Studies on the effect of LLMs on learning in higher education often rely on self-reported data, leaving an opening for complementary methodologies. This study contributes by analysing actual course grades as well as ratings by fellow students to investigate how LLMs can affect academic outcomes. We investigated whether using LLMs affected students’ learning by allowing them to choose one of three options for a written assignment: (1) composing the text without LLM assistance; (2) writing a first draft and using an LLM for revisions; or (3) generating a first draft with an LLM and then revising it themselves. Students’ learning was measured by their scores on a mid-course exam and final course grades. Additionally, we assessed how students rated the quality of fellow students’ texts for each of the three conditions. Finally, we examined how accurately fellow students could identify which LLM option (1–3) was used for a given text. Our results indicate only a weak effect of LLM use. However, writing a first draft and using an LLM for revisions compared favourably to the ‘no LLM’ baseline in terms of final grades. Ratings of fellow students’ texts were higher for texts created using option 3, specifically regarding how well-written they were judged to be. Regarding text classification, students most accurately predicted the ‘no LLM’ baseline, but were unable to identify texts that were generated by an LLM and then edited by a student at a rate better than chance.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/4a1cd6ff-393f-4aa5-806d-7220a170da98
- author
- Tärning, Betty (LU); Tjøstheim, Trond A. (LU) and Wallin, Annika (LU)
- organization
- publishing date
- 2025-12-01
- type
- Contribution to journal
- publication status
- published
- subject
- keywords
- LLM, generative AI, higher education, student learning outcome, academic writing, text quality, peer assessment
- in
- Frontiers in Artificial Intelligence
- pages
- 9 pages
- publisher
- Frontiers Media S. A.
- ISSN
- 2624-8212
- DOI
- 10.3389/frai.2025.1653992
- language
- English
- LU publication?
- yes
- id
- 4a1cd6ff-393f-4aa5-806d-7220a170da98
- date added to LUP
- 2025-12-01 14:36:59
- date last changed
- 2025-12-04 10:36:23
@article{4a1cd6ff-393f-4aa5-806d-7220a170da98,
abstract = {{The use of Large Language Models (LLMs) such as ChatGPT is a prominent topic in higher education, prompting debate over their educational impact. Studies on the effect of LLMs on learning in higher education often rely on self-reported data, leaving an opening for complementary methodologies. This study contributes by analysing actual course grades as well as ratings by fellow students to investigate how LLMs can affect academic outcomes. We investigated whether using LLMs affected students’ learning by allowing them to choose one of three options for a written assignment: (1) composing the text without LLM assistance; (2) writing a first draft and using an LLM for revisions; or (3) generating a first draft with an LLM and then revising it themselves. Students’ learning was measured by their scores on a mid-course exam and final course grades. Additionally, we assessed how students rated the quality of fellow students’ texts for each of the three conditions. Finally, we examined how accurately fellow students could identify which LLM option (1–3) was used for a given text. Our results indicate only a weak effect of LLM use. However, writing a first draft and using an LLM for revisions compared favourably to the ‘no LLM’ baseline in terms of final grades. Ratings of fellow students’ texts were higher for texts created using option 3, specifically regarding how well-written they were judged to be. Regarding text classification, students most accurately predicted the ‘no LLM’ baseline, but were unable to identify texts that were generated by an LLM and then edited by a student at a rate better than chance.}},
author = {{Tärning, Betty and Tjøstheim, Trond A. and Wallin, Annika}},
issn = {{2624-8212}},
keywords = {{LLM; generative AI; higher education; student learning outcome; academic writing; text quality; peer assessment}},
language = {{eng}},
month = {{12}},
publisher = {{Frontiers Media S. A.}},
series = {{Frontiers in Artificial Intelligence}},
title = {{More polished, not necessarily more learned : LLMs and perceived text quality in higher education}},
url = {{https://lup.lub.lu.se/search/files/234491786/Ta_rning_2025_-_More_polished_not_necessarily_more_learned.pdf}},
doi = {{10.3389/frai.2025.1653992}},
year = {{2025}},
}