Statistical Chinese Chunking Model Based on Word Clustering Features

SUN Guang-lu; WANG Xiao-long; LIU Bing-quan; GUAN Yi

Paper | Updated: 2025-07-16

- Title: Statistical Chinese Chunking Model Based on Word Clustering Features (original Chinese title: 基于词聚类特征的统计中文组块分析模型)
- Acta Electronica Sinica (电子学报), 2008, Vol. 36, No. 12, pp. 2450-2453
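- Author affiliations:
  
  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001
  2. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080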
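- Funding:
  
  National Natural Science Foundation of China (No. 60435020; No. 60673037); National High Technology Research and Development Program of China (863 Program) (No. 2006AA01Z197; No. 2007AA01Z172)
  
  *Gold standard: a manually annotated corpus that is considered free of annotation errors.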
- Chinese Library Classification (CLC) number: TP391.2
- Print publication: 2008
SUN Guang-lu, WANG Xiao-long, LIU Bing-quan, et al. Statistical Chinese Chunking Model Based on Word Clustering Features[J]. Acta Electronica Sinica, 2008, 36(12): 2450-2453.
Abstract

This paper proposes an entropy-based hierarchical word clustering algorithm and uses the word clusters it generates as features in a Chinese chunking model. Grounded in information entropy theory, the algorithm takes the words in a Chinese chunking corpus and their chunk tags as its basic information and applies binary hierarchical clustering to form word clusters that share a syntactic function. An optimization algorithm is designed to reduce clustering time. The word cluster features replace the conventional part-of-speech features in the chunking model, and named entity and factoid recognition modules are added; on this basis, a Chinese chunking system is built on maximum entropy Markov models. Experimental results show that the algorithm improves clustering efficiency and that the resulting word cluster features effectively improve the performance of the Chinese chunking system.
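
The abstract does not state the merge criterion or the speed-up technique, so the following is only a minimal sketch of how an entropy-based binary hierarchical clustering over chunk tags might look. It assumes, purely for illustration, that each word is described by the counts of chunk tags it carries in the corpus and that the two clusters merged at each step are the pair whose merge increases the count-weighted entropy of the tag distribution the least. The function names, the toy corpus, and the tag set (`B-NP`, `I-NP`, `B-VP`) are hypothetical and are not taken from the paper. The pair search is the naive exhaustive one, with none of the paper's optimization.

```python
import math
from collections import Counter
from itertools import combinations


def weighted_entropy(tag_counts):
    """Return N * H(tag) in bits, where N is the total count in tag_counts."""
    total = sum(tag_counts.values())
    return -sum(c * math.log2(c / total) for c in tag_counts.values() if c > 0)


def merge_cost(a, b):
    """Increase in count-weighted tag entropy if clusters a and b are merged."""
    return weighted_entropy(a + b) - weighted_entropy(a) - weighted_entropy(b)


def cluster_words(tagged_words, num_clusters):
    """Greedy bottom-up binary clustering of words by chunk-tag distribution."""
    # Start with one cluster per word, described by its chunk-tag counts.
    counts = {}
    for word, tag in tagged_words:
        counts.setdefault(word, Counter())[tag] += 1
    clusters = [({word}, tags) for word, tags in counts.items()]

    while len(clusters) > num_clusters:
        # Assumed criterion: merge the pair with the smallest entropy increase.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_cost(clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [words for words, _ in clusters]


# Hypothetical toy corpus of (word, chunk tag) pairs.
corpus = [
    ("这个", "B-NP"), ("一个", "B-NP"), ("这个", "B-NP"),
    ("问题", "I-NP"), ("系统", "I-NP"),
    ("分析", "B-VP"), ("提出", "B-VP"),
]
clusters = cluster_words(corpus, 3)
print(clusters)
```

In a chunker along the lines the abstract describes, the identifier of the cluster containing each word would then stand in for its part-of-speech tag as a feature. How the paper derives feature values from the cluster hierarchy is not stated in the abstract.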
