

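1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004
2. Hebei Key Laboratory of Virtual Technology and System Integration, Qinhuangdao, Hebei 066004
3. Hebei Vocational and Technical College of Building Materials, Qinhuangdao, Hebei 066000
4. Research Center for Big Data and Social Computing, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018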
Received: 08 June 2023
Revised: 17 January 2024
Published: 25 June 2024
MENG Wei-lun, GUO Jing-feng, XING Ke-xuan, et al. A Chinese Medical Named Entity Recognition Method Based on Glyph Features[J]. Acta Electronica Sinica, 2024, 52(06): 1945-1954. DOI:10.12263/DZXB.20230516
As the first key step in medical information extraction, medical named entity recognition (NER) aims to extract medical entities from unstructured text such as electronic medical records and Chinese drug package inserts. Most current work on Chinese medical NER obtains text representation vectors by fine-tuning pre-trained models and then uses feature engineering to improve performance in the medical domain. Most of these models derive from models that perform well on general-purpose datasets and do not take into account the linguistic characteristics of Chinese medical datasets. A statistical analysis of multiple medical datasets shows that some types of medical entities share glyph-level features: for example, most Chinese characters denoting diseases contain “疒”, and most characters denoting body organs contain “月”. To address these problems, this paper proposes a Chinese medical NER method based on glyph features. The method improves the accuracy and generalization ability of the model by fusing glyph vectors into the text representation vectors and by further exploiting the negative samples in the dataset. Experimental results on multiple public Chinese medical datasets show that the method outperforms other models, and ablation experiments show that fusing glyph features and learning from negative samples are both effective for this task.
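The glyph observation in the abstract can be illustrated with a minimal Python sketch. It is not the paper's model: the tiny radical lookup table, the toy text vector, and the choice of concatenation as the fusion step are all assumptions made here for illustration, since the abstract does not specify how the glyph vector is built or fused.

```python
# Hypothetical sketch, NOT the paper's implementation. The radical table,
# the toy text vector, and fusion-by-concatenation are illustrative assumptions.

# Tiny hand-made lookup from a character to the radical of interest it contains.
RADICAL_OF = {
    "病": "疒", "痛": "疒", "癌": "疒", "疮": "疒", "症": "疒",
    "肝": "月", "肺": "月", "脾": "月", "腿": "月",
}

# Fixed order of glyph classes; index 0 means "no tracked radical".
GLYPH_CLASSES = ["<none>", "疒", "月"]


def glyph_one_hot(char):
    """Return a one-hot glyph vector for a single character."""
    radical = RADICAL_OF.get(char, "<none>")
    return [1.0 if radical == cls else 0.0 for cls in GLYPH_CLASSES]


def fuse(text_vector, char):
    """Concatenate a character's text vector with its one-hot glyph vector."""
    return list(text_vector) + glyph_one_hot(char)


def radical_share(entities, radical):
    """Fraction of entity strings with at least one character carrying `radical`."""
    if not entities:
        return 0.0
    hits = sum(
        any(RADICAL_OF.get(ch) == radical for ch in entity)
        for entity in entities
    )
    return hits / len(entities)


if __name__ == "__main__":
    toy_text_vector = [0.1, -0.2]  # stand-in for a pre-trained encoder output
    print(fuse(toy_text_vector, "癌"))
    print(radical_share(["肺癌", "胃痛", "感冒"], "疒"))
```

In a real system the one-hot vector would typically be replaced by a learned glyph embedding and the text vector by the output of a fine-tuned pre-trained encoder; the sketch only shows where a glyph signal could enter the per-character representation.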