Counter-productivity and suspicion: two arguments against talking about the AGI control problem
(2025) In Philosophical Studies
Abstract
- How do you control a superintelligent artificial being, given the possibility that its goals or actions might conflict with human interests? Over the past few decades, this concern – the AGI control problem – has remained a central challenge for research in AI safety. This paper develops and defends two arguments that provide pro tanto support for the following policy for those who worry about the AGI control problem: don’t talk about it. The first is the argument from counter-productivity, which states that unless kept secret, efforts to solve the control problem could be used by a misaligned AGI to counter those very efforts. The second is the argument from suspicion, which states that open discussions of the control problem may serve to make humanity appear threatening to an AGI, thereby increasing the risk that the AGI perceives humanity as a threat. I consider objections to the arguments and find them unsuccessful. Yet I also consider objections to the don’t-talk policy itself and find it inconclusive whether the policy should be adopted. Additionally, the paper examines whether the arguments extend to other areas of AI safety research, such as AGI alignment, and argues that they likely do, albeit not necessarily as directly. I conclude by offering recommendations on what one can safely talk about, regardless of whether the don’t-talk policy is ultimately adopted.
Please use this URL to cite or link to this publication:
https://lup.lub.lu.se/record/ef8f9dbb-1af9-4d71-9cc9-b06287ff590f
- author: Stenseke, Jakob (LU)
- organization:
- publishing date: 2025-07-10
- type: Contribution to journal
- publication status: epub
- subject:
- keywords: AI alignment, AI control, Existential risk, Superintelligence, Artificial general intelligence
- in: Philosophical Studies
- pages: 24 pages
- publisher: Springer
- ISSN: 0031-8116
- DOI: 10.1007/s11098-025-02379-9
- language: English
- LU publication?: yes
- id: ef8f9dbb-1af9-4d71-9cc9-b06287ff590f
- date added to LUP: 2025-07-10 16:46:38
- date last changed: 2025-07-16 12:45:29
@article{ef8f9dbb-1af9-4d71-9cc9-b06287ff590f,
  abstract  = {{How do you control a superintelligent artificial being given the possibility that its goals or actions might conflict with human interests? Over the past few decades, this concern– the AGI control problem– has remained a central challenge for research in AI safety. This paper develops and defends two arguments that provide pro tanto support for the following policy for those who worry about the AGI control problem: don’t talk about it. The first is argument from counter-productivity, which states that unless kept secret, efforts to solve the control problem could be used by a misaligned AGI to counter those very efforts. The second is argument from suspicion, stating that open discussions of the control problem may serve to make humanity appear threatening to an AGI, which increases the risk that the AGI perceives humanity as a threat. I consider objections to the arguments and find them unsuccessful. Yet, I also consider objections to the don’t-talk policy itself and find it inconclusive whether it should be adopted. Additionally, the paper examines whether the arguments extend to other areas of AI safety research, such as AGI alignment, and argues that they likely do, albeit not necessarily as directly. I conclude by offering recommendations on what one can safely talk about, regardless of whether the don’t-talk policy is ultimately adopted.}},
  author    = {{Stenseke, Jakob}},
  issn      = {{0031-8116}},
  keywords  = {{AI alignment; AI control; Existential risk; Superintelligence; Artificial general intelligence}},
  language  = {{eng}},
  month     = {{07}},
  publisher = {{Springer}},
  series    = {{Philosophical Studies}},
  title     = {{Counter-productivity and suspicion : two arguments against talking about the AGI control problem}},
  url       = {{http://dx.doi.org/10.1007/s11098-025-02379-9}},
  doi       = {{10.1007/s11098-025-02379-9}},
  year      = {{2025}},
}