Scene Text Image Super-Resolution Reconstruction Based on Perceiving Multi-Domain Character Distance

HUANG Jun-yang; CHEN Hong-hui; WANG Jia-bao; CHEN Ping-ping; LIN Zhi-jian

doi:10.12263/DZXB.20240090

PAPERS | Updated: 2025-12-24

- ACTA ELECTRONICA SINICA Vol. 52, Issue 7, Pages: 2262-2270 (2024)
- Author affiliation: College of Physics and Information Engineering, Fuzhou University, Fuzhou, Fujian 350108, China
- Funding: National Natural Science Foundation of China (62171135); Distinguished Young Scholars Program of Fujian Province, China (2022J06010); Fujian Provincial Department of Education Key Research Project (2023XQ004); Fuzhou Science and Technology Planning (2023-P-001)
- CLC: TN911.73; TP391.43
- Received: 22 January 2024; Revised: 14 May 2024; Published: 25 July 2024
HUANG Jun-yang, CHEN Hong-hui, WANG Jia-bao, et al. Scene Text Image Super-Resolution Reconstruction Based on Perceiving Multi-Domain Character Distance[J]. Acta Electronica Sinica, 2024, 52(07): 2262-2270. DOI: 10.12263/DZXB.20240090

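Abstract (translated from the Chinese)

Scene text image super-resolution (STISR) aims to improve the resolution and legibility of text in low-resolution images. However, in spatially deformed or low-resolution text images, the lack of detail in text regions makes it hard to align semantic cues and visual features with character positions, so text recognition performs poorly. To address this problem, this paper proposes a scene text image super-resolution reconstruction method based on perceiving multi-domain character distance (PMDC), which strengthens visual-semantic features and improves text regions and texture information. First, asymmetric convolution and a semantic prior module extract the visual and semantic features of the text image. Second, the visual and semantic features are fused in the character distance perception module to obtain an enhanced positional encoding that perceives spacing changes and semantic similarity between characters. Finally, the guiding cues and visual features are combined to reorganize the pixels into a super-resolution text image. Experiments on the public TextZoom dataset show that, compared with the recent TATT text super-resolution network, the method improves the peak signal-to-noise ratio by 0.11 dB, effectively improving text clarity and edge texture detail, while raising the average recognition accuracy by 1.5% and improving the readability of text images.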
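
The final step described above, reorganizing pixels into a super-resolution image, can be illustrated with the standard sub-pixel (pixel-shuffle) rearrangement. The sketch below is a minimal pure-Python illustration under the assumption that the reorganization resembles this operation; the abstract does not specify the actual upsampling layer used in PMDC, and the function name `pixel_shuffle` and the nested-list feature layout are hypothetical choices made for this example.

```python
def pixel_shuffle(features, r):
    """Rearrange C*r*r feature channels of size H x W into C channels of size (H*r) x (W*r).

    `features` is a list of channels, each a list of H rows with W values.
    This is the generic sub-pixel rearrangement, not the paper's own code.
    """
    channels = len(features)
    if r < 1 or channels == 0 or channels % (r * r) != 0:
        raise ValueError("channel count must be a positive multiple of r*r")
    height, width = len(features[0]), len(features[0][0])
    out_channels = channels // (r * r)

    # Allocate the upscaled output: out_channels x (height*r) x (width*r).
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]

    for c in range(out_channels):
        for i in range(r):          # vertical offset inside each r x r cell
            for j in range(r):      # horizontal offset inside each r x r cell
                src = features[c * r * r + i * r + j]
                for y in range(height):
                    for x in range(width):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four 1x1 channels become one 2x2 channel when r = 2.
upscaled = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2)
```

Each group of `r*r` low-resolution channels supplies the values for one `r x r` cell of the output, so resolution grows without interpolation.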

Abstract

Scene text image super-resolution (STISR) aims to enhance the resolution and legibility of text in low-resolution images. In spatially deformed or low-resolution text images, the lack of detail in text regions makes it difficult to align semantic cues and visual features with character positions, so text is hard to recognize effectively. To address these challenges, this paper proposes a scene text image super-resolution method based on perceiving multi-domain character distance (PMDC), which improves text regions and edge texture details in the image. First, visual and semantic features are extracted by the asymmetric convolution module together with the semantic prior module. Then, the character distance perception module produces an enhanced position encoding that perceives distance changes and semantic similarity between characters. Finally, the guiding cues and visual features are combined to restructure the pixels and generate a super-resolution text image. Compared with TATT, experimental results on the public dataset TextZoom show an increase of 0.11 dB in peak signal-to-noise ratio, which effectively enhances the clarity of the text area and the detail of edge textures. In addition, recognition accuracy improves by 1.4%, which effectively enhances the readability of the text image.
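
The 0.11 dB figure refers to the peak signal-to-noise ratio (PSNR). As a reminder of how that metric is computed, here is a minimal sketch using the standard definition PSNR = 10 log10(MAX^2 / MSE). The function names, the nested-list image layout, and the 8-bit peak value of 255 are assumptions made for illustration; this is not the paper's evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Return PSNR in dB between two equally sized grayscale images (nested lists)."""
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if not ref or len(ref) != len(out):
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_from_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)


hr = [[52, 55], [61, 59]]   # hypothetical high-resolution reference patch
sr = [[53, 55], [60, 59]]   # hypothetical super-resolved patch
score = psnr(hr, sr)
ratio = mse_ratio_from_gain(0.11)
```

Under this definition, a 0.11 dB gain corresponds to a mean squared error that is roughly 2.5% lower, since 10^(-0.011) is about 0.975.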


