
Collections

Natural Language Processing: Technology and Application
  • SURVEYS AND REVIEWS
    QIU Yun-qi, WANG Yuan-zhuo, BAI Long, YIN Zhi-yi, SHEN Hua-wei, BAI Shuo
    Acta Electronica Sinica. 2022, 50(9): 2242-2264. https://doi.org/10.12263/DZXB.20220212

    Knowledge base question answering (KBQA) provides accurate, concise answers to complex factoid questions with the help of high-precision, highly relevant structured knowledge in a knowledge base (KB). Semantic parsing has become one of the mainstream approaches to KBQA. Given a form of question meaning representation, such methods map unstructured questions into structured meaning representations, which are then rewritten as KB queries to obtain answers. Semantic parsing for KBQA currently faces three main challenges: how to choose a suitable meaning representation form to express the semantics of a question, how to parse the complex semantics of a question and output the corresponding meaning representation, and how to cope with the high cost of labeling datasets and the lack of annotated data in specific domains. Starting from these challenges, this paper first analyzes the characteristics and shortcomings of the meaning representations commonly used in KBQA, then reviews how existing methods handle the complex semantics of questions, introduces current attempts in low-resource scenarios, and finally discusses future directions of semantic parsing for KBQA.
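    For illustration only, a minimal Python sketch of the parse-then-query pipeline the survey describes; the toy knowledge base, question template and predicate names below are assumptions, not taken from any surveyed system.

```python
# Hypothetical sketch: map a question to a structured meaning representation,
# then rewrite it as a knowledge-base query. Toy KB and rule-based parser only.

TOY_KB = {
    ("Douglas_Adams", "author_of", "The_Hitchhiker's_Guide_to_the_Galaxy"),
    ("Douglas_Adams", "born_in", "Cambridge"),
}

def parse_question(question: str):
    """Map a question to a (subject, relation, ?x) logical form (toy rule)."""
    if question.startswith("Where was ") and question.endswith(" born?"):
        entity = question[len("Where was "):-len(" born?")].replace(" ", "_")
        return (entity, "born_in", "?x")
    raise ValueError("unsupported question pattern in this toy parser")

def execute(logical_form, kb):
    """Rewrite the logical form as a KB query: find all ?x matching (s, r, ?x)."""
    s, r, _ = logical_form
    return [o for (s2, r2, o) in kb if s2 == s and r2 == r]

lf = parse_question("Where was Douglas Adams born?")
print(lf)                   # ('Douglas_Adams', 'born_in', '?x')
print(execute(lf, TOY_KB))  # ['Cambridge']
```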

  • PAPERS
    ZHANG Zhi-chang, YU Pei-lin, PANG Ya-li, ZHU Lin, ZENG Yang-yang
    Acta Electronica Sinica. 2022, 50(8): 1851-1858. https://doi.org/10.12263/DZXB.20201463

    Dialogue state tracking is an important module of task-oriented dialogue systems. Previous studies exploit historical dialogue information by simulating a graph structure with attention, but they cannot explicitly take advantage of the structure of the dialogue state itself. In addition, generating dialogue states with complex formats also poses a challenge. In this paper, we propose a state memory graph network (SMGN). The network saves historical information in a state memory graph and uses the graph to interact with the current dialogue. We also implement a method for generating complex dialogue states based on the state memory graph. Experimental results show that the proposed method improves joint accuracy by 1.39% on the CrossWOZ dataset and by 1.86% on the MultiWOZ dataset.
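    As a rough illustration of keeping dialogue history in an explicit state structure, the following Python sketch stores (domain, slot, value) triples in a graph-like map and updates it turn by turn; it is not the authors' SMGN architecture, and the slot names and update rule are assumptions.

```python
# Hypothetical sketch: an explicit dialogue-state store (domain -> slot -> value)
# that is updated from each turn, so later turns can read the accumulated state.

from collections import defaultdict

class StateMemoryStore:
    def __init__(self):
        # adjacency: domain node -> slot node -> current value node
        self.graph = defaultdict(dict)

    def update(self, turn_slots):
        """Merge (domain, slot, value) triples extracted from the current turn."""
        for domain, slot, value in turn_slots:
            self.graph[domain][slot] = value

    def state(self):
        return {d: dict(slots) for d, slots in self.graph.items()}

store = StateMemoryStore()
store.update([("hotel", "area", "centre"), ("hotel", "stars", "4")])
store.update([("restaurant", "food", "chinese"), ("hotel", "stars", "5")])  # later turn overwrites
print(store.state())
# {'hotel': {'area': 'centre', 'stars': '5'}, 'restaurant': {'food': 'chinese'}}
```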

  • PAPERS
    ZHANG Ya-zhou, YU Yang, ZHU Shao-lin, CHEN Rui, RONG Lu, LIANG Hui
    Acta Electronica Sinica. 2022, 50(8): 1885-1893. https://doi.org/10.12263/DZXB.20211075

    Dialogue sarcasm recognition is a challenging artificial intelligence (AI) research topic that aims to discover the elusive ironic, contemptuous and metaphoric information implied in daily dialogue. From the perspective of emotional logic, most existing work is insufficient for measuring the intrinsic uncertainty in emotional expression and understanding. In view of the advantages of quantum probability (QP) in modeling uncertainty, this paper explores the potential of QP in dialogue sarcasm recognition and proposes a quantum probability inspired network (QPIN). Specifically, QPIN consists of a complex-valued embedding layer, a quantum composition layer, a quantum measurement layer and a dense layer. Each utterance is treated as a superposition-like state over a set of basis words, using a complex-valued representation. The contextual interaction between adjacent utterances is described as the composition of a quantum system with its surrounding environment and is represented by a density matrix. A quantum measurement is performed on the density matrix of each utterance to extract sarcastic features, which are then fed to a dense layer to yield probabilistic outcomes. Extensive experiments on two benchmark datasets show that our model outperforms state-of-the-art baselines, with accuracy improved by 5.2% and 2.38%, respectively.
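    The quantum-probability ingredients named in the abstract can be illustrated numerically; the NumPy sketch below builds a superposition state from complex amplitudes, forms its density matrix, and performs a projective measurement. The three-word vocabulary, amplitudes and measurement operator are toy assumptions, not the trained QPIN model.

```python
# Toy numerical sketch of quantum-probability building blocks: a complex-valued
# superposition over basis words, its density matrix, and one measurement.

import numpy as np

# Complex amplitudes over a 3-word basis; normalise so the state has unit length.
amps = np.array([0.6 + 0.2j, 0.3 - 0.5j, 0.4 + 0.1j])
state = amps / np.linalg.norm(amps)          # |u> = sum_i a_i |w_i>

# Density matrix rho = |u><u| (a pure state; mixing utterances would give a mixed state).
rho = np.outer(state, state.conj())
assert np.isclose(np.trace(rho).real, 1.0)

# A measurement: projector onto the first basis word; outcome probability is tr(P rho).
P = np.zeros((3, 3), dtype=complex)
P[0, 0] = 1.0
prob = np.trace(P @ rho).real
print(f"probability of observing basis word 0: {prob:.3f}")
```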

  • PAPERS
    SUN Xin, GE Chen, SHEN Chang-hong, ZHANG Ying-jie
    Acta Electronica Sinica. 2021, 49(9): 1682-1690. https://doi.org/10.12263/DZXB.20200014

    Keyphrase extraction is a key basic problem in the field of natural language processing. A keyphrase extraction algorithm, PhraseVecRank, is proposed based on phrase embedding. First, a phrase vector construction model based on LSTM (long short-term memory) and CNN (convolutional neural network) networks is designed to obtain semantic representations of complex phrases. PhraseVecRank then uses the phrase embeddings to compute a topic weight for each candidate phrase, and combines the semantic similarity between candidate phrase embeddings with co-occurrence information to compute edge weights, improving keyphrase extraction through topic-weighted ranking. The experimental results verify that PhraseVecRank can effectively extract keyphrases that cover the topic information of a text, and that the proposed phrase embedding model better represents the semantic information of phrases.
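    A hedged sketch of topic-weighted graph ranking in the spirit of this abstract is given below: node weights come from phrase-to-document similarity, edge weights combine embedding similarity with co-occurrence, and scores are obtained by a PageRank-style iteration. The random embeddings and co-occurrence counts are placeholders, not the paper's LSTM/CNN phrase-embedding model.

```python
# Illustrative topic-weighted ranking over a small candidate-phrase graph.

import numpy as np

rng = np.random.default_rng(0)
phrases = ["neural network", "keyphrase extraction", "graph ranking", "stock price"]
emb = rng.normal(size=(len(phrases), 8))          # stand-in phrase embeddings
doc_vec = emb[:3].mean(axis=0)                    # stand-in document/topic vector

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

topic_weight = np.array([max(cos(e, doc_vec), 0.0) for e in emb])
cooc = np.array([[0, 3, 1, 0], [3, 0, 2, 0], [1, 2, 0, 0], [0, 0, 0, 0]], float)

# Edge weight: average of embedding similarity and normalised co-occurrence.
sim = np.array([[max(cos(emb[i], emb[j]), 0.0) for j in range(4)] for i in range(4)])
W = 0.5 * sim + 0.5 * (cooc / (cooc.max() + 1e-9))
np.fill_diagonal(W, 0.0)

# Topic-weighted PageRank iteration.
scores = np.ones(len(phrases)) / len(phrases)
teleport = topic_weight / (topic_weight.sum() + 1e-9)
col_sum = W.sum(axis=0) + 1e-9
for _ in range(50):
    scores = 0.15 * teleport + 0.85 * (W / col_sum) @ scores

for p, s in sorted(zip(phrases, scores), key=lambda x: -x[1]):
    print(f"{p}: {s:.3f}")
```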

  • CORRESPONDENCE
    YE Jun-min, LUO Da-xiong, CHEN Shu
    Acta Electronica Sinica. 2021, 49(2): 401-407. https://doi.org/10.12263/DZXB.20200448
    Redundant expressions, misused words, missing content and other text errors can seriously affect the interpretation of text semantics. Current text error correction models have two major problems: encoder-decoder based models decode slowly, and error detection and error correction are handled as two separate tasks. Hence, a text error correction model based on a hierarchical editing framework is proposed in this paper. First, several text semantic representations are obtained from a pre-trained model. Second, text errors are located using these representations. Finally, within the hierarchical editing framework, precise editing operations are derived to correct the errors. Experiments on a published text error correction dataset show that the proposed model decodes faster and achieves a higher recall rate than the comparison models.
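    To illustrate the edit-based alternative to full autoregressive decoding, the toy Python sketch below assigns each token an edit operation (KEEP / DELETE / REPLACE) and applies it; the tags are hand-written here, whereas in the paper they would come from the detection and editing model built on pre-trained representations.

```python
# Hypothetical sketch: apply per-token edit operations instead of regenerating
# the whole corrected sentence.

def apply_edits(tokens, edits):
    """Apply per-token edit operations and return the corrected token list."""
    out = []
    for tok, op in zip(tokens, edits):
        if op == "KEEP":
            out.append(tok)
        elif op == "DELETE":
            continue
        elif op.startswith("REPLACE:"):
            out.append(op.split(":", 1)[1])
    return out

tokens = ["he", "go", "to", "to", "school", "yesterday"]
edits  = ["KEEP", "REPLACE:went", "KEEP", "DELETE", "KEEP", "KEEP"]
print(" ".join(apply_edits(tokens, edits)))   # he went to school yesterday
```
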
  • ZHANG Yang-sen, ZHOU Wei-xiang, ZHANG Yu-yao, WU Yun-fang
    Acta Electronica Sinica. 2020, 48(9): 1720-1728. https://doi.org/10.3969/j.issn.0372-2112.2020.09.008
    Identification of negative network news is of great significance for monitoring online public opinion. Aiming at the problem that negative news is difficult to detect in today's massive data, this paper proposes a negative news recognition method based on emotional computing and a hierarchical multi-head attention mechanism. First, TF-IDF (term frequency-inverse document frequency) and an emotional similarity algorithm are used to construct a negative news emotional lexicon from news texts. Second, an emotional tendency calculation is used to quantify the emotional tendency of negative affective words. Finally, the model vectorizes the emotional tendencies of words and expressions and uses a hierarchical multi-head attention model to judge whether the emotion of a news item is positive or negative. The introduction of emotional computing and the multi-head attention mechanism greatly helps capture emotional words in texts. Experiments on real network news texts against many existing algorithms show that the model achieves good recognition performance, with improvements of 0.67% and 3.29% over the HAN model and the LSTM model, respectively.
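    The lexicon-based part of this pipeline can be sketched as follows: TF-IDF to pick salient candidate words from negative news, plus a simple emotional-tendency score for a text against a seed lexicon. The tiny corpus and seed words are illustrative assumptions; the paper's emotional similarity algorithm and hierarchical multi-head attention classifier are not reproduced here.

```python
# Toy sketch: TF-IDF candidate selection and a lexicon-based negative-tendency score.

import math

corpus = [
    "factory fire causes serious losses and public anger".split(),
    "company accused of fraud faces investigation".split(),
    "city opens new park for families".split(),
]

def tfidf(term, doc, corpus):
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / (1 + df))

# Rank words of the first (negative) document by TF-IDF as lexicon candidates.
doc = corpus[0]
candidates = sorted(set(doc), key=lambda w: -tfidf(w, doc, corpus))[:5]
print("lexicon candidates:", candidates)

# Score a text's negative tendency against a seed negative lexicon.
negative_lexicon = {"fire": 0.8, "losses": 0.7, "anger": 0.9, "fraud": 0.9}
def negative_tendency(words):
    hits = [negative_lexicon[w] for w in words if w in negative_lexicon]
    return sum(hits) / len(words)

print("tendency:", round(negative_tendency(doc), 3))
```
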
  • YANG Qi-meng, YU Long, TIAN Sheng-wei, Aishan Wumaier
    Acta Electronica Sinica. 2020, 48(6): 1077-1083. https://doi.org/10.3969/j.issn.0372-2112.2020.06.005
    Deep neural network models for Uyghur personal pronoun resolution learn semantic information for the current anaphora chain but ignore the long-term effects of individual anaphora chain recognition results. This paper proposes a Uyghur personal pronoun anaphora resolution method based on deep reinforcement learning. The method formulates anaphora resolution as a sequential decision process in a reinforcement learning environment and effectively uses antecedent information from previous states to determine the current personal pronoun-candidate antecedent pairs. In this study, we use an overall reward signal to optimize the policy, which is more effective than directly using a loss-function heuristic to optimize each individual decision. Experiments on a Uyghur dataset show that the method achieves an F-score of 85.80% on the Uyghur personal pronoun resolution task, demonstrating that the deep reinforcement learning model can significantly improve resolution performance.
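    A compact REINFORCE-style sketch of framing antecedent selection as a sequential decision process with a single overall reward (rather than a per-decision loss) is shown below; the features, candidates and gold antecedents are synthetic placeholders, and this is only an illustration of the training signal, not the paper's network.

```python
# Hypothetical sketch: policy-gradient training where one episode-level reward
# (fraction of correctly resolved pronouns) drives all antecedent decisions.

import numpy as np

rng = np.random.default_rng(1)
n_feat, n_cand, n_steps = 4, 3, 5
theta = np.zeros(n_feat)                          # linear scoring weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Synthetic episode: each step has candidate-antecedent features and a gold index.
episode = [(rng.normal(size=(n_cand, n_feat)), rng.integers(n_cand)) for _ in range(n_steps)]

for _ in range(200):
    grads, actions, golds = [], [], []
    for feats, gold in episode:
        probs = softmax(feats @ theta)            # policy over candidate antecedents
        a = rng.choice(n_cand, p=probs)
        # grad of log pi(a) for a softmax-linear policy: x_a - sum_j pi_j x_j
        grads.append(feats[a] - probs @ feats)
        actions.append(a)
        golds.append(gold)
    reward = sum(int(a == g) for a, g in zip(actions, golds)) / n_steps  # overall reward
    theta += 0.5 * reward * np.sum(grads, axis=0) # REINFORCE update

greedy_acc = np.mean([softmax(f @ theta).argmax() == g for f, g in episode])
print("greedy accuracy on the toy episode:", greedy_acc)
```
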
  • WU Yu-jia, LI Jing, SONG Cheng-fang, CHANG Jun
    Acta Electronica Sinica. 2020, 48(2): 279-284. https://doi.org/10.3969/j.issn.0372-2112.2020.02.008
    Existing deep learning based text classification methods do not consider the importance of text features or the associations between them, yet these associations can affect classification accuracy. To solve this problem, this study proposes a text classification framework based on high utility neural networks (HUNN), which can effectively mine the importance of text features and their associations. Mining high utility itemsets (MHUI) from databases is an emerging topic in data mining: it can mine the importance and the co-occurrence frequency of each feature in a dataset, and the co-occurrence frequency reflects the associations between text features. Using MHUI as the mining layer of HUNN, the framework mines text features with strong importance and association in each class and selects them as input to the neural network. High-level features with strong categorical representation ability are then acquired through the convolution layer to improve classification accuracy. Experimental results show that the proposed model performs significantly better on six public datasets than convolutional neural networks (CNN), recurrent neural networks (RNN), recurrent convolutional neural networks (RCNN), the fast text classifier (FAST), and hierarchical attention networks (HAN).
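    The mining-layer idea can be illustrated with a brute-force toy: treat each document as a transaction, give each word a utility (importance), and keep word sets whose total utility across the corpus passes a threshold; those features would then feed the convolutional classifier. The utilities, documents and threshold below are invented for illustration and this is not an efficient MHUI algorithm.

```python
# Toy high-utility itemset mining over a handful of "documents" (transactions).

from itertools import combinations

docs = [
    {"goal", "match", "team"},
    {"goal", "team", "coach"},
    {"stock", "market", "price"},
    {"goal", "match"},
]
utility = {"goal": 3, "match": 2, "team": 2, "coach": 1, "stock": 3, "market": 2, "price": 2}

def itemset_utility(itemset, docs):
    """Sum of member utilities over every document containing the whole itemset."""
    return sum(sum(utility[w] for w in itemset) for d in docs if itemset <= d)

min_utility = 8
high_utility = []
vocab = sorted(set().union(*docs))
for k in (1, 2):                      # brute force over 1- and 2-item sets
    for combo in combinations(vocab, k):
        u = itemset_utility(set(combo), docs)
        if u >= min_utility:
            high_utility.append((combo, u))

print(sorted(high_utility, key=lambda x: -x[1]))
```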