Acta Electronica Sinica ›› 2020, Vol. 48 ›› Issue (6): 1077-1083. DOI: 10.3969/j.issn.0372-2112.2020.06.005

• Research Article •

Anaphora Resolution of Uyghur Personal Pronouns Based on Deep Reinforcement Learning

YANG Qi-meng1, YU Long2, TIAN Sheng-wei3, Aishan Wumaier1

  1. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China;
    2. Network Center, Xinjiang University, Urumqi, Xinjiang 830046, China;
    3. Software College, Xinjiang University, Urumqi, Xinjiang 830046, China
  • Received: 2019-07-08  Revised: 2019-09-06  Online: 2020-06-25  Published: 2020-06-25
  • Corresponding author: YU Long
  • About the authors: YANG Qi-meng, male, born in September 1993 in Changji, Xinjiang. He is a Ph.D. candidate at Xinjiang University; his research focuses on natural language processing. E-mail: yqm_xju@163.com
    TIAN Sheng-wei, male, born in October 1973 in Urumqi, Xinjiang. Professor and doctoral supervisor; his research interests include natural language processing and image processing. E-mail: tianshengwei@163.com
  • Funding:
    National Natural Science Foundation of China (No.61563051, No.61662074, No.61262064); Key Program of the National Natural Science Foundation of China (No.61331011); Xinjiang Autonomous Region Science and Technology Talent Training Project (No.QN2016YX0051)


Abstract: Deep neural network models for Uyghur personal pronoun resolution learn the semantic information of the current anaphora chain but ignore the long-term effects of individual anaphora-chain recognition decisions. This paper proposes an anaphora resolution method for Uyghur personal pronouns based on deep reinforcement learning. The method formulates anaphora resolution as a sequential decision process in a reinforcement learning environment and effectively uses antecedent information from previous states to judge the current personal pronoun-candidate antecedent pair. In addition, the method is optimized with an overall reward signal: rather than heuristically optimizing each individual decision with a loss function, it directly optimizes the overall evaluation metric, which is more efficient. Experiments on a Uyghur dataset show that the method achieves an F-score of 85.80% on the Uyghur personal pronoun anaphora resolution task, demonstrating that the deep reinforcement learning model significantly improves the performance of Uyghur personal pronoun resolution.
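The core idea in the abstract — treat pronoun resolution as a sequential decision process and train with one document-level reward instead of per-decision losses — can be sketched with a REINFORCE-style policy gradient. This is a minimal illustration, not the paper's model: the linear scorer, the two toy features, the corpus, and the accuracy-style reward are all assumptions standing in for the deep network and the coreference F-score used in the paper.

```python
import math
import random

random.seed(0)

# Toy "document": for each pronoun, a list of candidate antecedents,
# each represented by two made-up features (e.g. agreement, recency),
# plus the index of the gold antecedent. Purely illustrative data.
CORPUS = [
    ([(1.0, 0.2), (0.0, 0.9)], 0),
    ([(0.0, 0.1), (1.0, 0.5)], 1),
    ([(1.0, 0.8), (1.0, 0.1)], 1),
]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def score(w, feats):
    # Linear scorer standing in for the deep network.
    return sum(wi * fi for wi, fi in zip(w, feats))

def run_episode(w, sample=True):
    """Link each pronoun in turn: one episode = one pass over the document."""
    actions, probs = [], []
    for candidates, _gold in CORPUS:
        p = softmax([score(w, f) for f in candidates])
        if sample:
            a = random.choices(range(len(candidates)), weights=p)[0]
        else:
            a = max(range(len(candidates)), key=lambda i: p[i])
        actions.append(a)
        probs.append(p)
    return actions, probs

def overall_reward(actions):
    """Document-level reward: fraction of pronouns linked correctly
    (a stand-in for the overall coreference evaluation metric)."""
    correct = sum(a == gold for a, (_, gold) in zip(actions, CORPUS))
    return correct / len(CORPUS)

def train(epochs=500, lr=0.5):
    w = [0.0, 0.0]
    baseline = 0.0
    for _ in range(epochs):
        actions, probs = run_episode(w)
        r = overall_reward(actions)        # one scalar for the whole episode
        adv = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # REINFORCE: raise the log-probability of every sampled action,
        # scaled by the whole-episode advantage, not a per-step loss.
        for (candidates, _), a, p in zip(CORPUS, actions, probs):
            for i, feats in enumerate(candidates):
                grad = (1.0 if i == a else 0.0) - p[i]
                for j in range(len(w)):
                    w[j] += lr * adv * grad * feats[j]
    return w

w = train()
greedy, _ = run_episode(w, sample=False)
```

Greedy decoding with the trained weights (`run_episode(w, sample=False)`) then yields the final pronoun-antecedent links; the key design point mirrored from the abstract is that the reward is computed once per episode over all decisions, so each decision is credited for its long-term effect on the document-level score.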

Key words: reinforcement learning, anaphora resolution, Uyghur, word embedding, deep learning, natural language processing

CLC Number: