Chinese Anaphora Resolution Based on Metric-optimized Laplacian SVM

ZHOU Xuan-yu; LIU Juan; SHAO Peng; LU Xiao; LUO Fei

doi:10.3969/j.issn.0372-2112.2016.12.035

Research Communication | Updated: 2025-07-16

- Chinese Anaphora Resolution Based on Metric-optimized Laplacian SVM
- Acta Electronica Sinica, 2016, Vol. 44, No. 12, pp. 3064-3072
- Author affiliations:
  
  1. State Key Laboratory of Software, Wuhan University, Wuhan, Hubei 430072
  2. School of Computer Science, Wuhan University, Wuhan, Hubei 430072
  3. College of Electrical and Information Engineering, Hunan University, Changsha, Hunan 410082
- Funding:
  
  National Natural Science Foundation of China (No. 61272274); National Natural Science Foundation of China, Young Scientists Fund (No. 61402340); Natural Science Foundation of Hubei Province (No. 2014CFB194)
- DOI: 10.3969/j.issn.0372-2112.2016.12.035
  CLC number: TP391
- Print publication: 2016

ZHOU Xuan-yu, LIU Juan, SHAO Peng, et al. Chinese Anaphora Resolution Based on Metric-optimized Laplacian SVM[J]. Acta Electronica Sinica, 2016, 44(12): 3064-3072. DOI: 10.3969/j.issn.0372-2112.2016.12.035.

Abstract

Compared with traditional semi-supervised anaphora resolution methods, the Laplacian SVM (Support Vector Machine) can effectively exploit the similarity and correlation between labeled and unlabeled samples, and can therefore derive a more accurate classification boundary. However, the traditional Laplacian SVM measures the distance between samples with the Euclidean distance, which may make samples from different classes appear overly similar and thus hinders accurate classification. To address the shortage of annotated Chinese corpora for anaphora resolution, a data-driven method is proposed that learns an optimal distance metric for the Laplacian SVM. The method optimizes the similarity constraints between sample pairs and introduces a Fisher discriminant term, so that within-class samples become more similar than between-class samples and strongly discriminative features are emphasized in the new metric space. Furthermore, a kernel-embedded metric optimization method generalizes this linear metric optimization to nonlinear space, so that the Laplacian SVM can use kernel functions for nonlinear classification. Evaluation on the ACE2005 Chinese corpus shows that the proposed metric-optimized Laplacian SVM, in both its linear and kernel-embedded forms, needs only a small number of labeled samples to achieve resolution performance comparable to or better than that of a classical supervised learning model, and that it also outperforms four other traditional semi-supervised methods.
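The effect of the distance metric on the similarity graph used by a Laplacian SVM can be illustrated with a small sketch. It is not the authors' algorithm: the metric matrix `M` below is hand-picked rather than learned from pairwise constraints and a Fisher term, the heat-kernel weighting and the `sigma` parameter are assumptions, and the function names and toy data are hypothetical. The sketch only shows how replacing the Euclidean distance with a Mahalanobis-style distance `(x_i - x_j)^T M (x_i - x_j)` can lower the similarity between samples of different classes.

```python
import numpy as np

def pairwise_sq_dists(X, M=None):
    """Squared distances (x_i - x_j)^T M (x_i - x_j); M=None gives Euclidean (M = I)."""
    n, d = X.shape
    if M is None:
        M = np.eye(d)
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, d)
    return np.einsum("ijk,kl,ijl->ij", diff, M, diff)

def graph_laplacian(X, M=None, sigma=1.0):
    """Return (L, W): unnormalized Laplacian L = D - W and heat-kernel weights W."""
    W = np.exp(-pairwise_sq_dists(X, M) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    L = np.diag(W.sum(axis=1)) - W
    return L, W

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = np.array([[0.0, 0.0], [0.1, 3.0], [1.0, 0.1], [1.1, 3.1]])
y = np.array([0, 0, 1, 1])

# Hand-picked metric (an assumption, not a learned one): it stretches the
# discriminative feature and shrinks the noisy one.
M = np.diag([25.0, 0.01])

_, W_euclid = graph_laplacian(X)
L_metric, W_metric = graph_laplacian(X, M)

print("Euclidean: same-class w01 =", W_euclid[0, 1], " cross-class w02 =", W_euclid[0, 2])
print("Metric M : same-class w01 =", W_metric[0, 1], " cross-class w02 =", W_metric[0, 2])
```

On this toy data the Euclidean graph links sample 0 more strongly to a sample of the other class than to its own class, while the graph built with `M` reverses that ordering. In a Laplacian SVM, the resulting Laplacian would enter the manifold regularization term, so a better graph changes which unlabeled samples are encouraged to share a label.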
