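1. School of Software and Microelectronics, Peking University, Beijing 102600
2. National Engineering Research Center for Software Engineering, Peking University, Beijing 100871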
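WEN Li-qiang, male, was born in Xiaoyi, Shanxi Province, in September 1991. He is currently a doctoral student in frontier engineering at the School of Software and Microelectronics, Peking University. His main research interests are software engineering and knowledge graph question answering. E-mail: wenlq@pku.edu.cn
XIONG Guan-ming, male, was born in Sanming, Fujian Province, in August 1996. He graduated from the School of Software and Microelectronics, Peking University. His main research interest is knowledge graph question answering. E-mail: gm_xiong@qq.com
WANG Yu, male, was born in 1978 and is from Shenyang, Liaoning Province. He is a doctoral student at the School of Software and Microelectronics, Peking University. His main research interests are knowledge graph construction and natural language processing. E-mail: wangyu_cn@stu.pku.edu.cn
CHEN Yi-pu, male, was born in 1992 and is from Yuzhou, Henan Province. He is currently an algorithm engineer at the Data Intelligence Research Institute of Beijing Beida Software Engineering Co., Ltd. His main research interests are recommender systems and natural language processing. E-mail: eap@buaa.edu.cn
LI Wei-ping, male, was born in March 1973 and is from Lingyuan, Liaoning Province. He is a professor at Peking University. His main research interests are big data analysis and information extraction. E-mail: wpli@ss.pku.edu.cn
ZHAO Wen, male, was born in 1967 and is from Dalian, Liaoning Province. He is currently a research professor and doctoral supervisor at the National Engineering Research Center for Software Engineering, Peking University. His main research interests are software engineering and software security. E-mail: zhaowen@pku.edu.cn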
Received: 2022-10-20
Revised: 2023-02-10
Published in print: 2024-10-25
WEN Li-qiang, XIONG Guan-ming, WANG Yu, et al. A Question Generation Method Based on Subgraph Paraphrase[J]. Acta Electronica Sinica, 2024, 52(10): 3578-3588. DOI:10.12263/DZXB.20221188
This paper proposes a subgraph paraphrase method to address the unseen-predicate problem in question generation over knowledge graphs. Traditional knowledge base question generation (KBQG) methods mainly rely on annotated question-answering data (question-logical form pairs) to generate questions. However, annotated data cannot cover all predicates in a knowledge graph, so generating questions for unseen predicates remains a challenge. We propose a semantic decoupling method based on subgraph structure: by decomposing the knowledge graph subgraph that corresponds to a complex question into atomic subgraphs, a multi-hop subgraph containing unseen predicates is split into single-hop subgraphs that are easy to handle. In addition, we design a subgraph paraphrase method that samples predicates in the dataset to obtain subgraph description texts and trains a subgraph paraphraser on large-scale unsupervised data. The paraphraser provides natural-language descriptions for subgraphs containing unseen predicates, which supplies useful information for question generation. We quantitatively analyze model performance at different difficulty levels, and experimental results on GrailQA and other datasets show that our method achieves state-of-the-art performance.
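As a rough illustration of the decomposition idea described in the abstract, the following minimal sketch assumes a subgraph is represented as a list of (subject, predicate, object) triples and treats each triple as one atomic single-hop subgraph. The triple representation, the function names, and the example predicates are assumptions made for illustration; they are not the paper's implementation.

```python
from typing import List, Set, Tuple

# Assumed representation: one edge of the knowledge graph subgraph.
Triple = Tuple[str, str, str]  # (subject, predicate, object)


def decompose(subgraph: List[Triple]) -> List[List[Triple]]:
    """Split a multi-hop subgraph into atomic single-hop subgraphs,
    one per triple, preserving the original order."""
    return [[triple] for triple in subgraph]


def unseen_predicates(subgraph: List[Triple], seen: Set[str]) -> List[str]:
    """Return the predicates in the subgraph that do not occur in the
    annotated training data (the `seen` set), sorted for stable output."""
    return sorted({pred for _, pred, _ in subgraph if pred not in seen})


# Hypothetical two-hop subgraph with placeholder variables ?x and ?y.
example = [
    ("?x", "film.film.directed_by", "?y"),
    ("?y", "people.person.place_of_birth", "London"),
]
# Hypothetical set of predicates covered by annotated data.
seen = {"film.film.directed_by"}

atomic = decompose(example)
missing = unseen_predicates(example, seen)
```

Under these assumptions, each element of `atomic` can be handed to a paraphraser independently, and `missing` identifies which single-hop pieces involve predicates absent from the annotated data.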