

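1. Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
2. Southeast Academy of Information Technology, Beijing Institute of Technology, Putian, Fujian 351100, China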
Received: 23 December 2019
Revised: 24 November 2020
Published: 25 September 2021
SUN Xin, GE Chen, SHEN Chang-hong, et al. The Theme-Weighted Keyphrase Extraction Algorithm Based on Phrase Embedding[J]. Acta Electronica Sinica, 2021, 49(9): 1682-1690. DOI: 10.12263/DZXB.20200014.
Keyword extraction is a fundamental problem in natural language processing, yet existing keyword extraction algorithms lack an effective representation of phrases. To extract keyphrases that better reflect the topic of a text, this paper proposes PhraseVecRank, a keyphrase extraction method based on phrase embeddings. First, a phrase embedding model built on LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) autoencoders is designed to address the semantic representation of complex phrases. Then, PhraseVecRank uses the phrase embeddings to compute a theme weight for each candidate phrase, and combines the semantic similarity between candidate phrase embeddings with co-occurrence information to compute edge weights, so that topic-weighted ranking improves keyphrase extraction. Experiments on public datasets and on academic paper data show that PhraseVecRank effectively extracts keyphrases that cover the topic information of a text, and that the phrase embeddings constructed by the autoencoders better represent the semantic information of phrases.
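The ranking step described in the abstract can be illustrated with a small sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the theme weight is taken as the cosine similarity between a phrase vector and the mean of all candidate vectors, the edge weight is the co-occurrence count multiplied by the cosine similarity of the two phrase vectors, and the ranking is a biased PageRank with a damping factor of 0.85. The function names (`cosine`, `theme_weights`, `rank_phrases`) are hypothetical, and the phrase vectors are assumed to have been produced already by some embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def theme_weights(vectors):
    """Assumed theme weight: similarity to the centroid, normalised to sum to 1."""
    dim = len(next(iter(vectors.values())))
    centroid = [sum(v[i] for v in vectors.values()) / len(vectors) for i in range(dim)]
    raw = {p: max(cosine(v, centroid), 0.0) for p, v in vectors.items()}
    total = sum(raw.values())
    if total == 0:
        return {p: 1.0 / len(vectors) for p in vectors}
    return {p: w / total for p, w in raw.items()}


def rank_phrases(vectors, cooccurrence, damping=0.85, iterations=50):
    """Rank candidate phrases with a theme-biased random walk (illustrative only).

    vectors:      {phrase: embedding list}
    cooccurrence: {(phrase_a, phrase_b): count} for an undirected graph
    """
    phrases = list(vectors)
    theme = theme_weights(vectors)

    # Assumed edge weight: co-occurrence count scaled by semantic similarity.
    edges = {p: {} for p in phrases}
    for (a, b), count in cooccurrence.items():
        weight = count * max(cosine(vectors[a], vectors[b]), 0.0)
        if weight > 0:
            edges[a][b] = weight
            edges[b][a] = weight
    out_sum = {p: sum(edges[p].values()) for p in phrases}

    scores = dict(theme)
    for _ in range(iterations):
        updated = {}
        for p in phrases:
            # Each neighbour q passes on its score in proportion to the edge weight.
            incoming = sum(scores[q] * edges[q][p] / out_sum[q] for q in edges[p])
            # The restart distribution is the theme weight instead of a uniform one.
            updated[p] = (1 - damping) * theme[p] + damping * incoming
        scores = updated
    return sorted(phrases, key=scores.get, reverse=True)
```

Calling `rank_phrases(vectors, cooccurrence)` returns the candidate phrases ordered from highest to lowest score. Using the theme weights as the restart distribution is one simple way to make the walk favour phrases close to the overall topic of the document; other weighting schemes would fit the same skeleton.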