National Natural Science Foundation of China (No. 61762078, No. 61363058, No. 61762079, No. 61762080); Open Project of Key Laboratory of Intelligent Information Processing of Institute of Computing Technology, CAS (No. IIP2014-4); Research Project of Guangxi Key Laboratory of Trusted Software (No. kx201705)
Existing natural language processing algorithms have difficulty modeling the timeliness and authority of short texts such as the titles of scientific papers. In addition, short texts contain few words, so their representations are high-dimensional and sparse. This paper presents a keyword extraction method that accounts for both timeliness and authoritativeness. A weighted hypergraph is constructed in which vertices represent weighted terms and weighted hyperedges measure the semantic relatedness of both binary and n-ary relations among terms. Hyperedges are weighted by the source of the document, its year of publication, and its number of citations. Vertices are weighted by the degree of association between nodes and by the co-occurrence distance of each pair of nodes within a title. A random walk is then performed on the weighted hypergraph to obtain the recommended keywords. Experimental results show that, compared with three baseline algorithms, the proposed approach extracts keywords with higher precision and recall.
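The pipeline summarized above (titles as weighted hyperedges, terms as weighted vertices, a random walk for ranking) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: `hyperedge_weight`, its `decay` and `current_year` parameters, the use of vertex weights as a restart distribution, and the `damping` value are hypothetical choices, and the walk is a generic hypergraph random walk with restart rather than the authors' exact procedure.

```python
import math


def hyperedge_weight(source_score, year, citations, current_year=2018, decay=0.1):
    """Hypothetical title weight: venue authority x recency x citation impact.

    The functional form is an assumption, not the paper's formula.
    """
    recency = math.exp(-decay * (current_year - year))
    return source_score * recency * (1.0 + math.log1p(citations))


def rank_terms(hyperedges, vertex_weights=None, damping=0.85,
               iterations=100, tol=1e-9):
    """Rank terms by a random walk with restart on a weighted hypergraph.

    hyperedges: list of (set_of_terms, positive_weight), one per title.
    vertex_weights: optional dict term -> positive weight; used here as the
        restart distribution (an assumed way to inject vertex weights).
    Returns a dict term -> score; the scores sum to 1.
    """
    terms = sorted({t for edge, _ in hyperedges for t in edge})
    if vertex_weights is None:
        vertex_weights = {t: 1.0 for t in terms}
    total = sum(vertex_weights[t] for t in terms)
    restart = {t: vertex_weights[t] / total for t in terms}

    # Weighted degree of a term: total weight of the hyperedges containing it.
    degree = {t: 0.0 for t in terms}
    for edge, w in hyperedges:
        for t in edge:
            degree[t] += w

    scores = dict(restart)
    for _ in range(iterations):
        new = {t: (1.0 - damping) * restart[t] for t in terms}
        for edge, w in hyperedges:
            # A walker at u picks hyperedge e with probability w(e) / degree(u),
            # then moves to a uniformly chosen term inside e.
            inflow = sum(scores[u] * w / degree[u] for u in edge)
            share = damping * inflow / len(edge)
            for v in edge:
                new[v] += share
        change = sum(abs(new[t] - scores[t]) for t in terms)
        scores = new
        if change < tol:
            break
    return scores


if __name__ == "__main__":
    # Toy titles with made-up source scores, years, and citation counts.
    titles = [
        ({"keyword", "extraction", "hypergraph"}, hyperedge_weight(1.0, 2017, 50)),
        ({"keyword", "random", "walk"}, hyperedge_weight(0.8, 2015, 10)),
        ({"hypergraph", "random", "walk"}, hyperedge_weight(0.6, 2010, 5)),
    ]
    ranking = rank_terms(titles)
    for term, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{score:.4f}")
```

Treating each title as one hyperedge lets a single edge connect all of its terms at once, which is what allows n-ary relations to be scored directly instead of being decomposed into pairwise links.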