Combining Coupled Distance Discrimination and Strong Classification Features for Short Text Similarity Calculation

doi:10.3969/j.issn.0372-2112.2019.06.021


Updated: 2025-07-02

- Acta Electronica Sinica Vol. 47, Issue 6, Pages: 1331-1336(2019)
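- Author affiliations:
  
  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730000
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004
  3. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004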
- Funding:
  
  National Natural Science Foundation of China (No.61762078, No.61363058, No.61663004); Supported by Guangxi Key Laboratory Foundation for Multi-source Information Mining and Security (No.MIMS18-08); Research Project of Guangxi Key Laboratory of Trusted Software (No.KX201705)
- DOI: 10.3969/j.issn.0372-2112.2019.06.021
  CLC: TP393.092
- Published Online: 25 June 2019
  
  Published: 2019

Combining Coupled Distance Discrimination and Strong Classification Features for Short Text Similarity Calculation[J]. Acta Electronica Sinica, 2019, 47(6): 1331-1336. DOI: 10.3969/j.issn.0372-2112.2019.06.021.

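Abstract (translated from the Chinese)

Short text similarity calculation plays a vital role in social networks, text mining, natural language processing, and related fields. To address the brevity and feature sparsity of short texts, and the fact that traditional short text similarity calculation ignores category information, this paper proposes a short text similarity calculation method that combines coupled distance discrimination with strong classification features. On the one hand, the distance between two co-occurring words is used across the whole short text corpus to compute the co-occurrence distance correlation of terms, and terms are weighted accordingly to capture the intra- and inter-relations between terms, yielding the coupled distance discrimination similarity of short texts. On the other hand, based on a small amount of supervised data with category labels, the feature terms with strong category discrimination ability in each class are extracted as the strong classification feature set, the context of each term is used to semantically disambiguate the strong classification features, and the similarity between texts is then measured by the number of strong classification features of the same category that the texts contain. Finally, coupled distance discrimination and strong classification features are combined to measure the similarity of short texts. Experiments show that the proposed method improves the accuracy of short text similarity calculation.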
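
The abstract describes weighting terms by a co-occurrence distance correlation but does not give the formula. The following is only a minimal sketch of the general idea, under loud assumptions: each co-occurrence of two distinct terms in a text contributes the reciprocal of their positional distance, and a term's weight is its total correlation with all other terms, scaled so the largest weight is 1. The function names `cooccurrence_distance_correlation` and `term_weights` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_distance_correlation(corpus):
    """Accumulate a distance-based correlation score for every term pair.

    corpus: list of tokenized texts (each a list of strings).
    ASSUMPTION (not from the paper): a co-occurrence at positional
    distance d contributes 1/d, summed over the whole corpus.
    """
    scores = defaultdict(float)
    for tokens in corpus:
        # combinations() keeps positional order, so j > i always holds.
        for (i, a), (j, b) in combinations(enumerate(tokens), 2):
            if a != b:
                scores[frozenset((a, b))] += 1.0 / (j - i)
    return dict(scores)


def term_weights(corpus):
    """Weight each term by its total correlation with other terms.

    ASSUMPTION: weights are normalized by the largest total, so the
    most strongly connected term gets weight 1.0.
    """
    scores = cooccurrence_distance_correlation(corpus)
    totals = defaultdict(float)
    for pair, score in scores.items():
        for term in pair:
            totals[term] += score
    top = max(totals.values(), default=1.0)
    return {term: value / top for term, value in totals.items()}
```

Terms that often appear close to many other terms receive larger weights under this sketch; the paper's actual definition of intra- and inter-relations may differ.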

Abstract

Text similarity measures play a vital role in text-related applications such as social networks, text mining, natural language processing, and others. Short texts are typically severely sparse and high-dimensional, while traditional short text similarity calculation ignores category information. A coupled distance discrimination and strong classification features based approach for short text similarity calculation (CDDCF) is presented. On the one hand, the co-occurrence distance between terms is considered in each text to determine the co-occurrence distance correlation, based on which the weight of each term is determined and the intra- and inter-relations between words are established, so that the coupled distance discrimination similarity of short texts can be captured. On the other hand, strong classification features are extracted from labeled texts, and the similarity between two short texts is measured by the number of strong discrimination features they share with the same context. Finally, the distance discrimination and strong classification features are unified into a joint framework to measure the similarity of short texts. Experimental results show that CDDCF outperforms baseline algorithms in terms of the performance and efficiency of similarity computation.
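
The strong classification feature step and the joint framework can be illustrated with a rough sketch. Everything below is an assumption made for illustration, since the abstract gives no formulas: a term counts as "strong" for a class when at least `min_share` of the labeled texts containing it belong to that class, similarity is the number of same-class strong features two texts have in common divided by the larger feature count, and the two similarities are merged by linear interpolation with a hypothetical weight `alpha`. The context-based disambiguation mentioned in the abstract is not modeled here.

```python
from collections import Counter, defaultdict


def strong_class_features(labeled_texts, min_share=0.8, min_count=2):
    """Map each strong term to the single class it indicates.

    labeled_texts: list of (tokens, label) pairs.
    ASSUMPTION: "strong" means the term appears in at least min_count
    texts of a class and that class holds at least min_share of all
    texts containing the term (min_share > 0.5 keeps classes unique).
    """
    per_class = defaultdict(Counter)
    total = Counter()
    for tokens, label in labeled_texts:
        for term in set(tokens):  # count each text once per term
            per_class[label][term] += 1
            total[term] += 1
    strong = {}
    for label, counts in per_class.items():
        for term, count in counts.items():
            if count >= min_count and count / total[term] >= min_share:
                strong[term] = label
    return strong


def strong_feature_similarity(tokens_a, tokens_b, strong):
    """Share of strong features in the two texts that fall in the same class."""
    classes_a = Counter(strong[t] for t in set(tokens_a) if t in strong)
    classes_b = Counter(strong[t] for t in set(tokens_b) if t in strong)
    shared = sum((classes_a & classes_b).values())  # per-class minimum
    denom = max(sum(classes_a.values()), sum(classes_b.values()))
    return shared / denom if denom else 0.0


def combined_similarity(sim_distance, sim_strong, alpha=0.5):
    """ASSUMPTION: the joint score is a simple linear interpolation."""
    return alpha * sim_distance + (1 - alpha) * sim_strong
```

Note that two texts can score as similar under this sketch without sharing any word, as long as each contains strong features of the same class; that is one reading of how category information could help with sparse short texts.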
