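1. Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing (Guangxi University of Finance and Economics), Nanning, Guangxi 530003
2. School of Information and Statistics, Guangxi University of Finance and Economics, Nanning, Guangxi 530003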
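HUANG Ming-xuan (male) was born in Leye County, Guangxi, in 1966. He holds a master's degree and is currently a professor and master's supervisor at Guangxi University of Finance and Economics. His main research interests are data mining, information retrieval, and machine learning. He has led two projects funded by the National Natural Science Foundation of China (NSFC), has led and completed one Guangxi Natural Science Foundation project and three research projects of the Guangxi Department of Education, received one grant from the 2011 Guangxi Program for Supporting Outstanding Talents in Higher Education Institutions, and has participated in and completed one NSFC project. He has published more than 60 academic papers, including more than 40 in Chinese core journals, 7 journal papers indexed by EI, and 1 paper indexed by ISTP, and holds 15 granted invention patents. E-mail: mingxh05@163.com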
Received: 2020-07-05
Revised: 2020-09-25
Published in print: 2021-07-25
HUANG Ming-xuan. Pseudo-Relevance Feedback Query Expansion Based on the Fusion of Association Pattern Mining and Word Embedding Learning[J]. Acta Electronica Sinica, 2021, 49(07): 1305-1313. DOI: 10.12263/DZXB.20200654.
To address the problems of query topic drift and word mismatch in natural language processing, an association pattern mining and rule expansion algorithm based on the CSC (Copulas-based Support and Confidence) framework is proposed. Association patterns obtained through statistical analysis are then fused with word embeddings that carry contextual semantic information, yielding a pseudo-relevance feedback query expansion model based on the fusion of association pattern mining and word embedding learning. In this model, rule expansion terms are mined from the pseudo-relevance feedback document set, and word vectors are obtained by training word embeddings on the initially retrieved document set. The vector similarity between each rule expansion term and the original query is computed, and the rule expansion terms whose similarity is not lower than a threshold are kept as the final expansion terms. Experimental results show that the proposed model effectively reduces query topic drift and word mismatch and improves retrieval performance. Compared with existing query expansion methods based on association patterns and on word embeddings, the average MAP (Mean Average Precision) improvement of the proposed model reaches up to 17.52%, and the model is more effective for short queries. The proposed mining method can also be applied to other text mining tasks and to recommendation systems to improve their performance.
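The CSC framework is only named in this abstract, so the following is a minimal sketch under stated assumptions, not the paper's definition. It hypothetically assumes that the CSC support of a term set couples two marginal supports through a Gumbel copula (a family chosen here purely for illustration): a frequency support u, the fraction of feedback documents containing every term, and a weight support v, where each containing document contributes its smallest term weight (weights such as normalized tf-idf are assumed to lie in [0, 1]) and the sum is divided by the number of documents. The confidence of a rule X → Y is taken as support(X ∪ Y) / support(X), and only single-term rules whose antecedent is a query term are mined. The names `gumbel_copula`, `csc_support`, `csc_confidence`, `rule_expansion_terms`, `min_supp`, `min_conf`, and `theta` are illustrative.

```python
import math
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical document model: term -> weight, weights assumed normalized to [0, 1].
Doc = Dict[str, float]


def gumbel_copula(u: float, v: float, theta: float = 2.0) -> float:
    """Gumbel copula C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta)), theta >= 1."""
    if theta < 1.0:
        raise ValueError("the Gumbel copula requires theta >= 1")
    if u <= 0.0 or v <= 0.0:
        return 0.0  # copulas are grounded: C(0, v) = C(u, 0) = 0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def csc_support(itemset: FrozenSet[str], docs: List[Doc], theta: float = 2.0) -> float:
    """Assumed CSC support: couple frequency support u with weight support v."""
    if not docs or not itemset:
        return 0.0
    containing = [d for d in docs if itemset.issubset(d)]
    u = len(containing) / len(docs)
    # Weight support: each containing document contributes its smallest item weight.
    v = sum(min(d[t] for t in itemset) for d in containing) / len(docs)
    return gumbel_copula(u, v, theta)


def csc_confidence(x: FrozenSet[str], y: FrozenSet[str], docs: List[Doc],
                   theta: float = 2.0) -> float:
    """Assumed CSC confidence of rule X -> Y: support(X | Y) / support(X)."""
    denom = csc_support(x, docs, theta)
    return csc_support(x | y, docs, theta) / denom if denom > 0.0 else 0.0


def rule_expansion_terms(query: Sequence[str], docs: List[Doc], min_supp: float,
                         min_conf: float, theta: float = 2.0) -> Dict[str, float]:
    """Mine single-term rules q -> t (q a query term, t not in the query)."""
    vocab = sorted({t for d in docs for t in d} - set(query))
    expansions: Dict[str, float] = {}
    for q in query:
        x = frozenset([q])
        for t in vocab:
            y = frozenset([t])
            supp = csc_support(x | y, docs, theta)
            conf = csc_confidence(x, y, docs, theta)
            if supp >= min_supp and conf >= min_conf:
                # Keep the strongest rule confidence seen for each candidate term.
                expansions[t] = max(conf, expansions.get(t, 0.0))
    return expansions
```

Because the documents containing X ∪ Y are a subset of those containing X, and each per-document minimum weight can only shrink when terms are added, both marginals shrink; a copula is non-decreasing in each argument, so this assumed confidence stays within [0, 1].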
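The final filtering step described in the abstract can be sketched as follows. The abstract states only that word vectors are learned from the initially retrieved documents and that rule expansion terms whose vector similarity to the original query is not lower than a threshold are kept; this sketch additionally assumes, hypothetically, that the query is represented by the mean of its term vectors and that similarity is cosine similarity. Embeddings are passed in as a plain dictionary, so they could come from any word2vec-style trainer. `query_vector`, `filter_expansion_terms`, and `threshold` are illustrative names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Embeddings = Dict[str, List[float]]  # term -> vector, e.g. trained on the initial result set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0.0 and nb > 0.0 else 0.0


def query_vector(query: Sequence[str], emb: Embeddings) -> List[float]:
    """Assumed query representation: element-wise mean of in-vocabulary term vectors."""
    vecs = [emb[t] for t in query if t in emb]
    if not vecs:
        return []
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def filter_expansion_terms(candidates: Sequence[str], query: Sequence[str],
                           emb: Embeddings, threshold: float) -> List[Tuple[str, float]]:
    """Keep rule expansion terms whose similarity to the query is >= threshold."""
    qv = query_vector(query, emb)
    if not qv:
        return []  # no query term has a vector, so nothing can be scored
    scored = [(t, cosine(emb[t], qv)) for t in candidates if t in emb]
    kept = [(t, s) for t, s in scored if s >= threshold]
    # Most query-similar expansion terms first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Candidates without a vector are dropped here rather than kept; that is a choice of this sketch, and the paper may treat out-of-vocabulary terms differently.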
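MAP, the metric behind the reported improvement, is the mean over queries of average precision, AP = (1/R) Σ_k P@k · rel(k), where R is the number of relevant documents for the query and rel(k) is 1 when the document at rank k is relevant. Below is a minimal sketch of this standard definition, plus a relative-increase helper; treating a figure such as 17.52% as a relative increase over a baseline MAP is an assumption of this sketch. The function names are illustrative, and the test values are toy data, not results from the paper.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """AP = (1/R) * sum of precision@k over ranks k holding a relevant document.

    Assumes document IDs in `ranking` are unique.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: mean of per-query AP over (ranking, relevant-set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def relative_increase(new: float, baseline: float) -> float:
    """Percentage increase of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0
```

Relevant documents that never appear in the ranking still count in R, so a run that misses them is penalized, which is what makes MAP sensitive to recall as well as precision.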