Acta Electronica Sinica ›› 2020, Vol. 48 ›› Issue (5): 833-839. DOI: 10.3969/j.issn.0372-2112.2020.05.001

• Research Article •

Research on Syntactic-Structure-Oriented Text Retrieval Methods

马路遥, 夏博, 肖叶, 荀恩东   

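  1. School of Information Science, Beijing Language and Culture University, Beijing 100083
  • Received: 2019-09-10 Revised: 2020-01-06 Published: 2020-05-25
    • Corresponding author:
    • 荀恩东
    • About the authors:
    • 马路遥 (MA Lu-yao), female, master's student, born in 1994. Research interest: natural language processing. E-mail: maluyao_blcu@outlook.com; 肖叶 (XIAO Ye), female, master's student, born in 1996. Research interest: natural language processing. E-mail: blcuxiao@126.com; 夏博 (XIA Bo), male, master's degree, born in 1993. Research interests: computer networks and natural language processing. E-mail: blcuxiabo@126.com
    • Funding:
    • Key Project of the National Social Science Fund of China (No.16AYY007); Research Project of the Beijing Advanced Innovation Center for Language Resources (No.TYR17001J); Graduate Innovation Fund of Beijing Language and Culture University (No.19YCX118)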

Structural Retrieval on Chinese Syntax Tree Corpus

MA Lu-yao, XIA Bo, XIAO Ye, XUN En-dong   

  1. School of Information Science, Beijing Language and Culture University, Beijing 100083, China
  • Received: 2019-09-10 Revised: 2020-01-06 Online: 2020-05-25 Published: 2020-05-25
    • Corresponding author:
    • XUN En-dong
    • Supported by:
    • The National Social Science Fund of China (No.16AYY007); Research Program of Beijing Advanced Innovation Center for Language Resources (No.TYR17001J); Postgraduate Innovation Fund Project of Beijing Language and Culture University (No.19YCX118)

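Abstract: Language resource processing and linguistic research place high demands on structured retrieval over large-scale treebanks. This paper designs indexing and retrieval methods for syntax tree corpora. Considering the characteristics of Chinese and the requirements of knowledge extraction tasks, we design seven index structures that use the structure and attribute information of syntax trees to make knowledge extraction efficient and accurate. The method supports not only string retrieval and attribute retrieval, but also retrieval based on syntax tree structure and attribute information. Experiments show that the method is efficient and accurate.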

Key words: syntax tree corpus, knowledge extraction, information retrieval, language resources

Abstract: Language resource processing and linguistics research require effective retrieval over syntax tree corpora. This paper presents an indexing and search method for syntax tree corpora that is efficient, accurate, and flexible. Based on the characteristics of the Chinese language and the needs of knowledge extraction, we design seven types of indexes so that knowledge extraction can draw on structure and attribute information and be performed more effectively and accurately. Beyond general retrieval functions, our method supports retrieval based on the structure and attribute information of syntax trees. Experiments show that our method is both accurate and efficient.

Key words: syntax tree corpus, knowledge extraction, information retrieval, language resource
