Semantic-Enhanced Zero-Shot Oracle Character Recognition

LIU Zong-hao; PENG Wen-jie; DAI Gang; HUANG Shuang-ping; LIU Yong-ge

doi:10.12263/DZXB.20240286


PAPERS | Updated: 2026-05-07

- Acta Electronica Sinica, Vol. 52, Issue 10, pp. 3347-3358 (2024)
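- Author affiliations:
  
  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641
  2. Anyang Normal University, Anyang, Henan 455099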
- Funding:
  
  National Key R&D Program of China (2023YFC3502900); National Natural Science Foundation of China (62176093; 61673182); Key Realm R&D Program of Guangzhou (202206030001); Guangdong-Hong Kong-Macao Joint Innovation Project (2023A0505030016)
- CLC: TP183
- Received: 02 April 2024; Revised: 26 July 2024; Published: 25 October 2024
LIU Zong-hao, PENG Wen-jie, DAI Gang, et al. Semantic-Enhanced Zero-Shot Oracle Character Recognition[J]. Acta Electronica Sinica, 2024, 52(10): 3347-3358. DOI: 10.12263/DZXB.20240286


Abstract

Oracle bone character recognition is of significant value for understanding Chinese history and for preserving Chinese culture. Manual recognition currently requires extensive expert knowledge and is very time-consuming. Meanwhile, most automatic recognition methods are constrained by the closed-set assumption. This limits their applicability to oracle bone inscriptions, where new characters continue to be discovered. To address this, other researchers achieved zero-shot oracle character recognition through visual matching. Their method uses handprinted (template) images as category references and recognizes a scanned (rubbing) image by its similarity to those references. However, this approach overlooks the large intra-class variance of oracle bone rubbing images, so it is prone to mismatches caused by glyph variability. This paper proposes a two-stage semantic-enhanced zero-shot oracle character recognition method. The first stage, domain-independent character semantic learning, uses the contrastive vision-language pre-training model CLIP with prompt learning to extract character semantics from oracle rubbings and template images. This addresses the lack of semantic information for oracle characters. To cope with the domain gap between rubbings and templates, we set separate learnable domain prompts and character-category prompts and decouple their semantics to achieve more accurate feature extraction. The second stage, semantic-enhanced visual matching of character images, extracts intra-class shared features and inter-class distinctive features through two branches. The first branch uses contrastive learning to align the visual features of different glyphs of the same character category with that category's semantics, guiding the model to focus on intra-class shared features. The second branch uses the N-pair loss to strengthen the learning of features that distinguish different character categories. At test time, the model requires no semantic features. Instead, it relies on the intra-class similarity and inter-class distinctiveness learned during training to match rubbings and templates more accurately, improving zero-shot recognition performance. Experiments on the scanned-image dataset OBC306 and the handprinted-image dataset SOC5519 show that the proposed method improves zero-shot oracle character recognition accuracy over the baseline method by more than 25%.
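The first stage's decoupled prompts can be illustrated with a minimal, hypothetical sketch in the spirit of CoOp-style prompt learning. This is an assumption: the abstract does not specify the prompt layout. The class name `DecoupledPrompts`, the token counts, the embedding size and the random initialisation are all illustrative choices, not the authors' configuration. The sketch only shows the structure of the idea. Domain context is shared by every character class within a domain, while class context is shared across the rubbing and template domains.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def init_ctx(n_tokens: int, dim: int, rng: random.Random) -> List[Vector]:
    """Small random vectors standing in for learnable context tokens."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_tokens)]


class DecoupledPrompts:
    """Hypothetical prompt bank with separate domain and class contexts.

    Each domain (e.g. "rubbing", "template") owns n_domain context vectors
    shared by all character classes; each class owns n_class context vectors
    shared by all domains. This sketch uses no class-name tokens (assumption).
    """

    def __init__(self, domains: Sequence[str], num_classes: int,
                 n_domain: int = 4, n_class: int = 4, dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.domain_ctx: Dict[str, List[Vector]] = {
            d: init_ctx(n_domain, dim, rng) for d in domains
        }
        self.class_ctx: List[List[Vector]] = [
            init_ctx(n_class, dim, rng) for _ in range(num_classes)
        ]

    def prompt(self, domain: str, cls: int) -> List[Vector]:
        # Prompt for (domain, class) = [domain context] + [class context].
        # In a CLIP-based model this token sequence would be fed to the frozen
        # text encoder, and only the context vectors would be trained.
        return self.domain_ctx[domain] + self.class_ctx[cls]


bank = DecoupledPrompts(["rubbing", "template"], num_classes=3)
rub, tmp = bank.prompt("rubbing", 1), bank.prompt("template", 1)
print(len(rub), rub[4:] == tmp[4:], rub[:4] == tmp[:4])  # prints: 8 True False
```

The rubbing and template prompts for the same character share their class tokens but not their domain tokens. This is the separation the abstract describes as decoupling domain and character-category semantics.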

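The second stage's first branch aligns glyph features with character semantics. One plausible reading is a CLIP-style cross-entropy over cosine similarities between each glyph feature and every class's semantic feature. The sketch below assumes that form, a temperature `tau` of 0.07, and plain Python lists in place of tensors. The function name `semantic_alignment_loss` and the toy vectors are hypothetical, and the paper's exact contrastive loss may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_alignment_loss(visual: List[Sequence[float]], labels: List[int],
                            semantics: List[Sequence[float]], tau: float = 0.07) -> float:
    """Mean cross-entropy of each glyph feature against all class semantic features.

    Minimising it pulls every glyph of class y toward semantics[y] and away from
    the other classes' semantics, so differently shaped glyphs of one character
    converge on a shared target.
    """
    total = 0.0
    for v, y in zip(visual, labels):
        logits = [cosine(v, s) / tau for s in semantics]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[y]
    return total / len(visual)


semantics = [[1.0, 0.0], [0.0, 1.0]]      # toy semantic features for 2 classes
aligned = [[2.0, 0.1], [1.5, -0.1]]       # two glyph variants near class 0
misaligned = [[0.1, 2.0], [-0.1, 1.5]]    # the same glyphs pointing at class 1
print(semantic_alignment_loss(aligned, [0, 0], semantics)
      < semantic_alignment_loss(misaligned, [0, 0], semantics))  # prints: True
```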

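The second branch uses the N-pair loss. Below is a minimal sketch of the multi-class N-pair loss of Sohn (2016). It assumes each batch holds one (anchor, positive) pair per distinct character class, for example a rubbing feature as anchor and a template feature of the same class as positive. How the paper actually forms pairs is not stated in the abstract. Each positive acts as a negative for every other anchor. The L2 embedding regularisation from the original formulation is omitted.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def n_pair_loss(anchors: List[Sequence[float]], positives: List[Sequence[float]]) -> float:
    """Multi-class N-pair loss over N (anchor, positive) pairs from N distinct classes.

    For anchor i:  log(1 + sum_{j != i} exp(f_i . p_j - f_i . p_i)),
    where positives[j] (j != i) serve as negatives for anchor i.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        pos = dot(anchors[i], positives[i])
        # log(1 + sum exp(d_j)) equals log-sum-exp over [0, d_1, ..., d_{n-1}]
        terms = [0.0] + [dot(anchors[i], positives[j]) - pos for j in range(n) if j != i]
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total / n


anchors = [[1.0, 0.0], [0.0, 1.0]]   # e.g. rubbing features, one per class
matched = [[1.0, 0.0], [0.0, 1.0]]   # template features in the same class order
swapped = [[0.0, 1.0], [1.0, 0.0]]   # positives assigned to the wrong classes
print(round(n_pair_loss(anchors, matched), 4),
      round(n_pair_loss(anchors, swapped), 4))  # prints: 0.3133 1.3133
```

The loss falls when each anchor scores its own positive above every other class's positive. That is the inter-class separation the branch is meant to encourage.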

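At test time, recognition reduces to ranking template images by their similarity to a rubbing, with no semantic features involved. A minimal sketch follows. It assumes embeddings have already been produced by the trained visual encoder and that cosine similarity is the matching score; the abstract says "similarity matching" without naming the metric. The function `recognize`, the character labels and the feature values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def recognize(rubbing_feat: Sequence[float],
              template_feats: Dict[str, Sequence[float]],
              top_k: int = 1) -> List[Tuple[str, float]]:
    """Rank candidate characters by cosine similarity to their template embeddings."""
    q = l2_normalize(rubbing_feat)
    scored = []
    for char, feat in template_feats.items():
        t = l2_normalize(feat)
        scored.append((sum(a * b for a, b in zip(q, t)), char))
    scored.sort(reverse=True)  # highest similarity first
    return [(char, score) for score, char in scored[:top_k]]


# Hypothetical gallery: one template embedding per character class.
templates = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.7, 0.7, 0.0]}
print([c for c, _ in recognize([0.9, 0.2, 0.0], templates, top_k=2)])  # prints: ['A', 'C']
```

Because each class is represented only by its template embedding, a newly discovered character can be supported by adding its template to the gallery without retraining. This is what makes the matching formulation zero-shot.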
