Deliberation in the Age of Deception: Measuring Sycophancy in Large Language Models

Malik, Minahil

Deliberation in the Age of Deception: Measuring Sycophancy in Large Language Models

Mark

Malik, Minahil ^LU (2024) SIMZ51 20241
Graduate School

Abstract: Large language models (LLMs) currently represent the most sophisticated form of
artificial intelligence. Their capabilities make them increasingly able to influence
human opinion. A critical concern is sycophancy, a sophisticated form of imitation
where models tailor their responses to align with their user's affiliation. This behaviour
risks entrapping individuals in filter bubbles by reinforcing their worldviews, thus
undermining the essence of communicative rationality.
Whilst academics have researched the problem of bias extensively, the concept
of sycophancy has been neglected by the social sciences and treated as a technical
phenomenon, often divorced from the wider social setting. This thesis discusses the
risks of... (More); Large language models (LLMs) currently represent the most sophisticated form of
artificial intelligence. Their capabilities make them increasingly able to influence
human opinion. A critical concern is sycophancy, a sophisticated form of imitation
where models tailor their responses to align with their user's affiliation. This behaviour
risks entrapping individuals in filter bubbles by reinforcing their worldviews, thus
undermining the essence of communicative rationality.
Whilst academics have researched the problem of bias extensively, the concept
of sycophancy has been neglected by the social sciences and treated as a technical
phenomenon, often divorced from the wider social setting. This thesis discusses the
risks of such neglect and argues that sycophantic behaviour should be conceptualised
first and foremost within the social sciences as a concern for political deliberation. This
study challenges traditional ontologies that exclusively attribute rationality solely to
human agents and evaluates the role of LLMs in democratic deliberation. Despite
significant research on LLMs, the fundamental moral and political values intrinsic to
these models have yet to be thoroughly examined from a normative standpoint.
This thesis introduces a novel methodological approach by using machine
learning techniques, including few-shot learning, prompt engineering, and probabilistic
output analysis, to investigate sycophancy within the fine-tuned models, GPT-3.5 and
GPT-4. The results indicate that these models exhibit political and moral sycophancy,
meaning that they change their outputs based on the user's moral or political
affiliations. Furthermore, the models exhibit a greater propensity to deviate from their
baseline responses and align their answers with the political and moral beliefs of right-wing ideologies. The findings of this study highlight a remarkable level of deception
among these models and a deep understanding of user preferences. (Less)

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9151763

author

Malik, Minahil ^LU

supervisor

Robert Klemmensen ^LU

organization

Graduate School

course

SIMZ51 20241

year

2024

type

H2 - Master's Degree (Two Years)

subject

Social Sciences

keywords

sycophancy, large language models, machine learning, few-shot prompting, political deliberation, communicative rationality, political psychology

language

English

id

9151763

date added to LUP

2024-06-26 12:30:39

date last changed

2024-06-26 12:30:39

@misc{9151763,
  abstract     = {{Large language models (LLMs) currently represent the most sophisticated form of 
artificial intelligence. Their capabilities make them increasingly able to influence
human opinion. A critical concern is sycophancy, a sophisticated form of imitation 
where models tailor their responses to align with their user's affiliation. This behaviour 
risks entrapping individuals in filter bubbles by reinforcing their worldviews, thus 
undermining the essence of communicative rationality.
Whilst academics have researched the problem of bias extensively, the concept 
of sycophancy has been neglected by the social sciences and treated as a technical 
phenomenon, often divorced from the wider social setting. This thesis discusses the 
risks of such neglect and argues that sycophantic behaviour should be conceptualised 
first and foremost within the social sciences as a concern for political deliberation. This 
study challenges traditional ontologies that exclusively attribute rationality solely to 
human agents and evaluates the role of LLMs in democratic deliberation. Despite 
significant research on LLMs, the fundamental moral and political values intrinsic to 
these models have yet to be thoroughly examined from a normative standpoint.
This thesis introduces a novel methodological approach by using machine 
learning techniques, including few-shot learning, prompt engineering, and probabilistic 
output analysis, to investigate sycophancy within the fine-tuned models, GPT-3.5 and 
GPT-4. The results indicate that these models exhibit political and moral sycophancy, 
meaning that they change their outputs based on the user's moral or political 
affiliations. Furthermore, the models exhibit a greater propensity to deviate from their 
baseline responses and align their answers with the political and moral beliefs of right-wing ideologies. The findings of this study highlight a remarkable level of deception 
among these models and a deep understanding of user preferences.}},
  author       = {{Malik, Minahil}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Deliberation in the Age of Deception: Measuring Sycophancy in Large Language Models}},
  year         = {{2024}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES

Deliberation in the Age of Deception: Measuring Sycophancy in Large Language Models