The distillation of hallucination: An analysis of value risks in three alignment paths for GAI values

Yi Xianfei; Gao Jinyu

doi:10.15981/j.cnki.dongyueluncong.2026.01.003

Dongyue Tribune (东岳论丛), 2026, Vol. 47, No. 01, pp. 26-34+191

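1. School of Marxism, Hunan Normal University; 2. Research Center for Science, Technology and Social Development, Hunan Normal University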

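Funding: This article is an interim result of the major project "Research on the Humanistic Risks of Emerging Life Sciences and Technologies and Their Governance Paths" (Project No. 22VRC030), funded under the Special Talent Program of the National Social Science Fund of China.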

Release date: 2026-02-05

Publication date: 2026-02-05

Online publication date: 2026-02-05

Views: 519; Downloads: 1,170; Citations: 2

Abstract:

In the process of value alignment, the hallucination problem inherent in generative artificial intelligence (GAI) has evolved from a mere technical error into a systemic value risk. The three alignment paths of human feedback ("training AI systems using human feedback"), human-machine collaboration ("training AI systems to assist human evaluation"), and machine autonomy ("training AI systems to do alignment research") each exhibit risks along a different dimension. In alignment through human feedback, the complexity of human values themselves and the limits of GAI's capacity to simulate them lead to value deviations. In alignment through human-machine collaboration, values generated by GAI trap human-machine collaboration in a cognitively closed value trap and come to replace human values as normative values. In alignment through machine autonomy, the self-replication of GAI values cannot escape the problem of self-reference, and technical simulation also sets human values on a slippery slope. The key to mitigating the value risks of alignment lies in shifting the focus of alignment from simulating a single value to weighing multiple values, ensuring that technological development remains anchored in the real, non-hallucinatory lifeworld of human beings.

Keywords: generative artificial intelligence; value alignment; human-machine relationship; machine hallucination

The full text is available at cnki.net.

Basic information:

CLC classification: TP18; B018

Citation:

[1] Yi Xianfei, Gao Jinyu. The distillation of hallucination: An analysis of value risks in three alignment paths for GAI values [J]. Dongyue Tribune (东岳论丛), 2026, 47(01): 26-34+191. DOI: 10.15981/j.cnki.dongyueluncong.2026.01.003.
