Vietnamese-English Cross Language Query Post-Translation Expansion Based on All-Weighted Positive and Negative Association Patterns Mining
HUANG Ming-xuan (1,2), JIANG Cao-qing (1,2)
1. Guangxi Key Laboratory Cultivation Base of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China;
2. School of Information and Statistics, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China
Abstract: Topic drift and word mismatch are difficult problems in natural language processing, and combining text mining with information retrieval can help to address them. This paper therefore proposes an algorithm for Vietnamese-English cross language (VECL) query post-translation expansion based on mining all-weighted positive and negative association patterns. The algorithm uses a method for computing the support and correlation degree of all-weighted positive and negative itemsets. Through a pattern evaluation framework, it mines all-weighted positive and negative association patterns related to the original query from the user relevance feedback document set, which is drawn from the initial VECL retrieval results. Expansion terms are then extracted from these patterns to carry out VECL query post-translation expansion. Compared with existing cross language query expansion algorithms based on pseudo relevance feedback and on weighted association pattern mining, the proposed algorithm effectively reduces query topic drift and word mismatch and improves cross language information retrieval performance. Moreover, the pattern mining method presented here can also be applied to recommender systems to improve their accuracy.
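The abstract names the ingredients of the method (weighted support, correlation degree, positive and negative patterns, expansion-term extraction) but not their formulas. The following is a minimal sketch of how those pieces could fit together, under explicit assumptions that are not taken from the paper: each feedback document is a map from term to a weight in [0, 1]; the all-weighted support of an itemset I is the summed weight of its terms over documents containing all of I, divided by n·|I|; the correlation degree is a lift-style ratio of weighted supports; positive patterns "query → term" supply expansion candidates; and negative patterns "query word → NOT term" prune them. The functions `aw_support`, `correlation` and `expansion_terms`, the thresholds and the toy documents are all hypothetical. This is not the authors' exact algorithm.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# ASSUMPTION: a feedback document maps each term to a weight in [0, 1]
# (e.g. a normalized tf-idf score); the paper's weighting is not given here.
Doc = Dict[str, float]


def aw_support(itemset: FrozenSet[str], docs: List[Doc]) -> float:
    """Assumed all-weighted support: summed weights of the itemset's terms over
    documents that contain every term, divided by n * |itemset|."""
    if not docs or not itemset:
        return 0.0
    total = 0.0
    for doc in docs:
        if all(term in doc for term in itemset):
            total += sum(doc[term] for term in itemset)
    return total / (len(docs) * len(itemset))


def correlation(a: FrozenSet[str], b: FrozenSet[str], docs: List[Doc]) -> float:
    """Assumed lift-style correlation on weighted supports:
    > 1 is read as positive association, < 1 as negative association."""
    denom = aw_support(a, docs) * aw_support(b, docs)
    return aw_support(a | b, docs) / denom if denom > 0 else 0.0


def expansion_terms(
    query: Sequence[str], docs: List[Doc], min_sup: float, min_corr: float
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper returning (expansion terms, terms pruned by negative patterns)."""
    q = frozenset(query)
    candidates = {term for doc in docs for term in doc} - q
    positive, negative = set(), set()
    for term in candidates:
        t = frozenset([term])
        # Positive pattern query -> term: frequent and strongly correlated.
        if aw_support(q | t, docs) >= min_sup and correlation(q, t, docs) >= min_corr:
            positive.add(term)
        # Negative pattern word -> NOT term: the term is frequent on its own
        # but negatively correlated with at least one individual query word.
        if aw_support(t, docs) >= min_sup:
            for word in q:
                w = frozenset([word])
                if aw_support(w, docs) > 0 and correlation(w, t, docs) < 1.0:
                    negative.add(term)
                    break
    return sorted(positive - negative), sorted(negative)


# Toy relevance-feedback set (invented data for illustration only).
feedback = [
    {"bank": 0.8, "loan": 0.6, "interest": 0.5},
    {"bank": 0.7, "loan": 0.9, "river": 0.2},
    {"bank": 0.6, "river": 0.8, "water": 0.7},
]
print(expansion_terms(["bank", "loan"], feedback, min_sup=0.15, min_corr=1.5))
# → (['interest'], ['water'])
```

Here "interest" co-occurs with the whole query and is kept, while "water" never co-occurs with "loan" and is pruned by a negative pattern. Because the supports are weighted, the lift-style ratio is not a true independence test (in the toy data "bank" appears in every document yet still scores above 1 with "water"), so the thresholds would need tuning on real feedback sets.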