Abstract: Existing natural language processing algorithms have difficulty modeling the timeliness and authority of short texts such as the titles of scientific papers. Moreover, short texts contain few words, so their term representations are high-dimensional and sparse. This paper presents a keyword extraction method that takes both timeliness and authority into account. A weighted hypergraph is constructed in which vertices represent weighted terms and weighted hyperedges measure the semantic relatedness of both binary and n-ary relations among terms. On the one hand, the source of each document, its year of publication, and its number of citations are used to weight the hyperedges. On the other hand, the degree of association between nodes and the co-occurrence distance of each pair of nodes within a title are computed to weight the vertices. A random walk is then performed on the weighted hypergraph to obtain the recommended keywords. Experimental results demonstrate that, compared with three baseline algorithms, the proposed approach extracts keywords with higher precision and recall.
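The pipeline summarized above can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract does not give the weighting formulas, so the hyperedge weight (venue weight times an exponential recency decay times a log-citation boost), the vertex weight (one plus the summed inverse co-occurrence distances within a title, omitting the degree-of-association term), the damping factor, the toy corpus, and every name (`TITLES`, `edge_weight`, `vertex_weights`, `build_hypergraph`, `rank_terms`) are hypothetical. The walk uses a common two-step formulation for weighted hypergraphs: from a term, choose an incident hyperedge in proportion to its weight, then choose a term in that hyperedge in proportion to its vertex weight, with PageRank-style teleportation.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: each title becomes one hyperedge over its terms.
# "source_weight" is an assumed stand-in for the authority of the venue.
TITLES = [
    {"terms": ["keyword", "extraction", "hypergraph"],
     "source_weight": 1.0, "year": 2017, "citations": 12},
    {"terms": ["short", "text", "keyword", "extraction"],
     "source_weight": 0.8, "year": 2015, "citations": 40},
    {"terms": ["hypergraph", "random", "walk"],
     "source_weight": 1.0, "year": 2011, "citations": 5},
]


def edge_weight(title, current_year, decay=0.1):
    """Hypothetical hyperedge weight: venue x recency x citation authority."""
    recency = math.exp(-decay * (current_year - title["year"]))
    authority = 1.0 + math.log1p(title["citations"])
    return title["source_weight"] * recency * authority


def vertex_weights(terms):
    """Hypothetical vertex weight inside one title: 1 plus the summed
    inverse positional distances to every other term in the title."""
    return {
        t: 1.0 + sum(1.0 / abs(i - j) for j in range(len(terms)) if j != i)
        for i, t in enumerate(terms)
    }


def build_hypergraph(titles, current_year):
    """Return a list of (hyperedge weight, {term: vertex weight}) pairs."""
    edges = []
    for title in titles:
        terms = list(dict.fromkeys(title["terms"]))  # drop repeats, keep order
        edges.append((edge_weight(title, current_year), vertex_weights(terms)))
    return edges


def rank_terms(edges, damping=0.85, iters=100, tol=1e-10):
    """Two-step random walk with teleportation; returns (term, score) pairs."""
    vertices = sorted({v for _, gamma in edges for v in gamma})
    n = len(vertices)
    # d(u): total weight of the hyperedges incident to term u.
    degree = defaultdict(float)
    for w, gamma in edges:
        for u in gamma:
            degree[u] += w
    # P(u, v) = sum over hyperedges e containing u of
    #           [w(e) / d(u)] * [gamma_e(v) / delta(e)].
    trans = defaultdict(lambda: defaultdict(float))
    for w, gamma in edges:
        delta = sum(gamma.values())
        for u in gamma:
            for v, g in gamma.items():
                trans[u][v] += (w / degree[u]) * (g / delta)
    # Power iteration with uniform teleportation.
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in vertices}
        for u, row in trans.items():
            for v, p in row.items():
                new[v] += damping * rank[u] * p
        diff = sum(abs(new[v] - rank[v]) for v in vertices)
        rank = new
        if diff < tol:
            break
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_terms(build_hypergraph(TITLES, current_year=2018))
    for term, score in ranking[:3]:
        print(f"{term}\t{score:.4f}")
```

Because every hyperedge weight in this sketch is positive, each row of the transition matrix sums to one and the teleportation term keeps the scores a probability distribution; the top-ranked terms would then serve as the recommended keywords.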