

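1. School of Cyber Engineering, Xidian University, Xi'an, Shaanxi 710126, China
2. School of Information, Central University of Finance and Economics, Beijing 100081, China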
Received: 17 November 2024
Revised: 17 March 2025
Published: 25 April 2025
QIAN Xiao, JIANG Zhong-yuan, TAO Mei-yue, et al. A Context-Aware Approach for Smart Contract Element Extraction[J]. Acta Electronica Sinica, 2025, 53(04): 1322-1336. DOI:10.12263/DZXB.20241038
Extracting key data elements from text is the first step toward converting the massive volume of text documents across industries into smart contracts. Compared with traditional named entity recognition (NER), contract element extraction (CEE) targets contract elements that are ubiquitous, longer, more diverse, and more redundant. CEE currently faces three challenges: limited research on Chinese text, insufficient use of recent large language model (LLM) techniques, and weak perception of contextual features in text. First, this paper proposes a context-sensitive dynamic padding method (CDPM), a triple attention layer, and an element edge-weighted loss function. Without increasing hardware requirements, these modules give the model additional contextual semantic information and strengthen its perception of context-related features, which improves the training efficiency of CEE under the sequence labeling paradigm. Second, these modules are combined with the BERT (Bidirectional Encoder Representations from Transformers) embedding model to build the Context-Aware Model for Contract Element Extraction (CAM-CEE), which achieves high-performance element extraction for smart contract conversion scenarios. Finally, extensive experiments are conducted on a self-constructed dataset and on related public datasets. The results show that CAM-CEE outperforms the best baseline model on metrics such as micro-F1 and macro-F1 and generalizes well.
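The abstract reports results in micro-F1 and macro-F1. As a reminder of how the two differ, the following is a minimal sketch that computes both over exact-match element spans. The span representation `(element_type, start, end)`, the exact-match criterion, and the toy data are assumptions made for illustration; the paper's own evaluation protocol is not described in this section.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; taken to be 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(gold, pred):
    """Compute (micro-F1, macro-F1) over sets of (element_type, start, end) spans.

    A predicted span counts as correct only if type and both offsets match
    a gold span exactly (an assumption of this sketch).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        if span in gold:
            tp[span[0]] += 1
        else:
            fp[span[0]] += 1
    for span in gold - pred:
        fn[span[0]] += 1

    # Micro: pool the counts of all element types, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))

    # Macro: compute F1 per element type, then take the unweighted mean.
    types = sorted(set(tp) | set(fp) | set(fn))
    macro = sum(f1(tp[t], fp[t], fn[t]) for t in types) / len(types) if types else 0.0
    return micro, macro


# Hypothetical toy data: two "party" spans found exactly, one "amount" span
# predicted with a wrong start offset.
gold = {("party", 0, 3), ("party", 10, 14), ("amount", 20, 25)}
pred = {("party", 0, 3), ("party", 10, 14), ("amount", 21, 25)}
micro, macro = micro_macro_f1(gold, pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the frequent type is predicted perfectly and the rare type is missed, so micro-F1 (pooled counts) comes out higher than macro-F1 (per-type average). That gap is why reporting both is informative when element types are imbalanced.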