

College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108, China
Received: 16 October 2024
Revised: 25 February 2025
Published: 25 March 2025
CHEN Ping-ping, LIN Hu, CHEN Hong-hui, et al. End-to-End Scene Text Spotting Under Dual Domain Awareness Based on Multi-Party Synergetic Explicit Information[J]. Acta Electronica Sinica, 2025, 53(03): 974-985. DOI:10.12263/DZXB.20240919
In end-to-end text spotting for complex natural scenes, text is hard to distinguish from the background, and the positional information produced by detection does not match the semantic information produced by recognition, so the correlation between detection and recognition cannot be exploited effectively. To address this problem, this paper proposes MSIDA (Multi-party Synergetic explicit Information with Dual-domain Awareness text spotting), an end-to-end scene text spotting method that strengthens text-region features and edge textures and uses the synergy between detection and recognition features to improve end-to-end performance. First, a Dual-Domain Awareness (DDA) module that fuses spatial and directional information about text is designed to enhance the visual features of text instances. Second, a Multi-party Explicit Information Synergy (MEIS) module is proposed to extract explicit information from the encoded features; it generates candidate text instances by matching and aligning the position, classification, and character information used for detection and recognition. Finally, the synergetic features guide learnable query sequences through a decoder to produce the detection and recognition results. Compared with the recent DeepSolo (Decoder with explicit points Solo) method, MSIDA improves accuracy by 0.8%, 0.8%, and 0.4% on the Total-Text, ICDAR 2015, and CTW1500 datasets, respectively. The code and datasets are available at https://github.com/msida2024/MSIDA.git.
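The abstract does not say how position, classification, and character information are matched against text instances. As a purely illustrative sketch, and not the authors' implementation, the general idea of a one-to-one assignment driven by a combined cost can be shown in a few lines of Python. Everything here is an assumption made for illustration: the function names `match_cost` and `assign`, the dictionary fields, the equal weights, and the brute-force search, which only suits toy inputs.

```python
from itertools import permutations

def match_cost(pred, gt, w_cls=1.0, w_pos=1.0, w_char=1.0):
    """Hypothetical combined cost between one prediction and one ground-truth instance."""
    # Classification term: cheaper when the prediction is confident it is text.
    cls_cost = 1.0 - pred["score"]
    # Position term: mean L1 distance between corresponding sampled points.
    pos_cost = sum(
        abs(px - gx) + abs(py - gy)
        for (px, py), (gx, gy) in zip(pred["points"], gt["points"])
    ) / len(gt["points"])
    # Character term: fraction of positions where the two strings disagree,
    # counting any length difference as mismatches.
    longest = max(len(pred["text"]), len(gt["text"]), 1)
    mismatches = sum(a != b for a, b in zip(pred["text"], gt["text"]))
    mismatches += abs(len(pred["text"]) - len(gt["text"]))
    char_cost = mismatches / longest
    return w_cls * cls_cost + w_pos * pos_cost + w_char * char_cost

def assign(preds, gts):
    """Brute-force minimum-cost one-to-one assignment (assumes len(preds) >= len(gts))."""
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # Map each ground-truth index to the index of its matched prediction.
    return {g: p for g, p in enumerate(best_perm)}, best_cost

# Toy data: three predictions, two ground-truth text instances.
preds = [
    {"score": 0.9, "points": [(0.10, 0.10), (0.30, 0.10)], "text": "STOP"},
    {"score": 0.2, "points": [(0.50, 0.50), (0.70, 0.50)], "text": "EXIT"},
    {"score": 0.8, "points": [(0.52, 0.50), (0.70, 0.52)], "text": "EXIT"},
]
gts = [
    {"points": [(0.50, 0.50), (0.70, 0.50)], "text": "EXIT"},
    {"points": [(0.10, 0.12), (0.30, 0.10)], "text": "STOP"},
]

mapping, total_cost = assign(preds, gts)
print(mapping, round(total_cost, 3))
```

In this toy case the ground-truth "EXIT" is paired with the third prediction rather than the second: the second overlaps it exactly but has a low text score, so the combined cost favors the confident, slightly offset prediction. A practical system would replace the brute-force loop with a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`.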