Deep Learning Based Scene Text Detection: A Survey
JIANG Wei¹, ZHANG Chong-sheng², YIN Xu-cheng³
1. School of Information Engineering, North China University of Water Resources and Electric Power, Zhengzhou, Henan 450045, China;
2. School of Computer and Information Engineering, Henan University, Kaifeng, Henan 475001, China;
3. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Abstract: In recent years, deep learning based scene text detection has achieved significant progress. This paper reviews state-of-the-art methods in the field from 2014 to 2018. We categorize existing methods into four groups: traditional region proposal based methods, Text Proposal Network based methods, segmentation based methods, and hybrid methods that combine Text Proposal Networks with segmentation. We analyze the pros and cons of each group in detail. Finally, we point out the research trends and key focal points in this field.
[1] Ding X Q,Wang Y W,et al.Character Recognition:Theories,Methods and Practice[M].Beijing:Tsinghua University Press,2017.1.(in Chinese)
[2] Zhang H,Zhao K,Song Y Z,et al.Text extraction from natural scene image:A survey[J].Neurocomputing,2013,122(51):310-323.
[3] Ye Q,Doermann D.Text detection and recognition in imagery:A survey[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2015,37(7):1480-1500.
[4] Yin X C,et al.Text detection,tracking and recognition in video:A comprehensive survey[J].IEEE Transactions on Image Processing,2016,25(6):2752-2773.
[5] Zhu Y Y,Yao C,Bai X.Scene text detection and recognition:recent advances and future trends[J].Frontiers of Computer Science,2016,10(1):19-36.
[6] Chen X,Yuille A L.Detecting and reading text in natural scenes[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Washington DC:IEEE Computer Society,2004.366-373.
[7] Hanif S M,Prevost L,Negri P A.A cascade detector for text detection in natural scene images[A].International Conference on Pattern Recognition[C].Tampa:IEEE Computer Society,2008.1-4.
[8] Hanif S M,Prevost L.Text detection and localization in complex scene images using constrained AdaBoost algorithm[A].International Conference on Document Analysis and Recognition[C].Barcelona:IEEE Computer Society,2009.1-5.
[9] Epshtein B,Ofek E,Wexler Y.Detecting text in natural scenes with stroke width transform[A].IEEE Conference on Computer Vision and Pattern Recognition[C].San Francisco:IEEE Computer Society,2010.2963-2970.
[10] Yao C,Bai X,et al.Detecting texts of arbitrary orientations in natural images[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Providence:IEEE Computer Society,2012.1083-1090.
[11] Neumann L,Matas J.Real-time scene text localization and recognition[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Providence:IEEE Computer Society,2012.3538-3545.
[12] Yin X C,Yin X,Huang K.Robust text detection in natural scene images[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,(99):2264-2268.
[13] Pan Y F,Hou X W,Liu C L.A robust system to detect and localize texts in natural scene images[A].The Eighth IAPR International Workshop on Document Analysis Systems[C].Nara:IEEE Computer Society,2008.35-42.
[14] Pan Y F,et al.A hybrid approach to detect and localize texts in natural scene images[J].IEEE Transactions on Image Processing,2011,20(3):800-813.
[15] Zhou G,Liu Y.Scene text detection based on probability map and hierarchical model[J].Optical Engineering,2012,51(6):1-10.
[16] Hosang J,Benenson R,Dollar P,et al.What makes for effective detection proposals[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2015,38(4):814.
[17] Coates A,Carpenter B,et al.Text detection and character recognition in scene images with unsupervised feature learning[A].IEEE International Conference on Document Analysis and Recognition[C].Beijing:IEEE Computer Society,2011.440-445.
[18] Wang T,Wu D J,Coates A,et al.End-to-end text recognition with convolutional neural networks[A].International Conference on Pattern Recognition[C].Stockholm:IEEE Computer Society,2012.3304-3308.
[19] Jaderberg M,Vedaldi A,Zisserman A.Deep features for text spotting[A].European Conference on Computer Vision[C].Cham:Springer,2014.512-528.
[20] Jaderberg M,et al.Reading text in the wild with convolutional neural networks[J].International Journal of Computer Vision,2016,116(1):1-20.
[21] Zhang Z,Shen W,Yao C,et al.Symmetry-based text line detection in natural scenes[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Boston:IEEE Computer Society,2015.2558-2567.
[22] Tian S,Pan Y,Huang C,et al.Text flow:A unified text detection system in natural scene images[A].IEEE International Conference on Computer Vision[C].Sydney:2016.4651-4659.
[23] Huang W,Qiao Y,Tang X.Robust scene text detection with convolution neural network induced MSER trees[A].European Conference on Computer Vision[C].Zurich:Springer,2014.497-511.
[24] He T,et al.Text-attentional convolutional neural network for scene text detection[J].IEEE Transactions on Image Processing,2016,25(6):2529-2541.
[25] Zhu A,Gao R,Uchida S.Could scene context be beneficial for scene text detection[J].Pattern Recognition,2016,58:204-215.
[26] Ma J,Wang W,Lu K,et al.Scene text detection based on pruning strategy of MSER-trees and Linkage-trees[A].IEEE International Conference on Multimedia and Expo[C].Hong Kong:IEEE Signal Processing Society,2017.367-372.
[27] Ren S,He K,Girshick R,et al.Faster R-CNN:towards real-time object detection with region proposal networks[J].IEEE Transactions on Pattern Analysis & Machine Intelligence,2017,39(6):1137.
[28] Zhong Z,Jin L,Huang S.DeepText:A new approach for text proposal generation and text detection in natural images[A].IEEE International Conference on Acoustics,Speech and Signal Processing[C].New Orleans:IEEE Signal Processing Society,2017.1-18.
[29] Tian Z,Huang W,He T,et al.Detecting text in natural image with connectionist text proposal network[A].European Conference on Computer Vision[C].Cham:Springer,2016.56-72.
[30] Ma J,Shao W,Ye H,et al.Arbitrary-oriented scene text detection via rotation proposals[J].IEEE Transactions on Multimedia,2018,20(11):3111-3122.
[31] Liu Y,Jin L.Deep matching prior network:toward tighter multi-oriented text detection[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Hawaii:IEEE Computer Society,2017.3454-3461.
[32] Liu Y,Jin L.Detecting curve text in the wild:new dataset and new solution[DB/OL].arXiv:1712.02170v1,2017.
[33] He K,Zhang X,Ren S,Sun J.Deep residual learning for image recognition[A].Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition[C].Las Vegas:IEEE Computer Society,2016.770-778.
[34] Liu X,et al.FOTS:Fast oriented text spotting with a unified network[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Salt Lake City:IEEE Computer Society,2018.5676-5685.
[35] Gupta A,Vedaldi A,Zisserman A.Synthetic data for text localisation in natural images[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Las Vegas:IEEE Computer Society,2016.2315-2324.
[36] Redmon J,et al.You only look once:unified,real-time object detection[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Las Vegas:IEEE Computer Society,2016.779-788.
[37] Liao M,Shi B,Bai X.TextBoxes++:A single-shot oriented scene text detector[J].IEEE Transactions on Image Processing,2018,27(8):3676-3690.
[38] Shi B,Bai X,Belongie S.Detecting oriented text in natural images by linking segments[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Hawaii:IEEE Computer Society,2017.3482-3490.
[39] Liu W,Anguelov D,Erhan D,et al.SSD:single shot MultiBox detector[A].European Conference on Computer Vision[C].Cham:Springer,2016.21-37.
[40] Liao M H,Zhu Z,Shi B G,Xia G S,Bai X.Rotation-sensitive regression for oriented scene text detection[A].IEEE/CVF Conference on Computer Vision and Pattern Recognition[C].Salt Lake City:IEEE Computer Society,2018.5909-5918.
[41] Long J,Shelhamer E,Darrell T.Fully convolutional networks for semantic segmentation[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Boston:IEEE Computer Society,2015.3431-3440.
[42] Yao C,Bai X,Sang N,et al.Scene text detection via holistic,multi-channel prediction[DB/OL].arXiv:1606.09002v2,2016.
[43] Polzounov A,Ablavatski A,Escalera S.WordFence:Text detection in natural images with border awareness[A].IEEE International Conference on Image Processing[C].Beijing:IEEE Computer Society,2017.1222-1226.
[44] Xue C,Lu S,Zhan F.Accurate scene text detection through border semantics awareness and bootstrapping[A].European Conference on Computer Vision[C].Cham:Springer,2018.370-387.
[45] Long S,Ruan J,Zhang W,et al.TextSnake:A flexible representation for detecting text of arbitrary shapes[A].European Conference on Computer Vision[C].Cham:Springer,2018.19-35.
[46] Zhang Z,et al.Multi-oriented text detection with fully convolutional networks[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Las Vegas:IEEE Computer Society,2016.4159-4167.
[47] He T,Huang W,Qiao Y.Accurate text localization in natural image with cascaded convolutional text network[DB/OL].arXiv:1603.09423v1,2016.
[48] Tang Y,Wu X.Scene text detection and segmentation based on cascaded convolution neural networks[J].IEEE Transactions on Image Processing,2017,26(3):1509-1520.
[49] He D,Yang X,Liang C,et al.Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Hawaii:IEEE Computer Society,2017.474-483.
[50] Deng D,Liu H,Li X,et al.PixelLink:detecting scene text via instance segmentation[A].AAAI Conference on Artificial Intelligence[C].New Orleans:Association for the Advancement of Artificial Intelligence,2018.
[51] Zhou X,Yao C,Wen H,et al.EAST:An efficient and accurate scene text detector[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Hawaii:IEEE Computer Society,2017.2642-2651.
[52] Hong S,Roh B,Kim K H,et al.PVANet:Lightweight deep neural networks for real-time object detection[DB/OL].arXiv:1611.08588v2,2016.
[53] He W,Zhang X Y,Yin F,et al.Deep direct regression for multi-oriented scene text detection[DB/OL].arXiv:1703.08289v1,2017.
[54] Qin S,Manduchi R.Cascaded segmentation-detection networks for word-level text spotting[A].IAPR International Conference on Document Analysis and Recognition[C].Kyoto:IEEE Computer Society,2017.1275-1282.
[55] Lyu P,Yao C,Wu W,et al.Multi-oriented scene text detection via corner localization and region segmentation[A].IEEE Conference on Computer Vision and Pattern Recognition[C].Salt Lake City:IEEE Computer Society,2018.7553-7563.
[56] Lyu P,Liao M,Yao C,et al.Mask text spotter:An end-to-end trainable neural network for spotting text with arbitrary shapes[A].European Conference on Computer Vision[C].Cham:Springer,2018.71-88.
[57] Ch'ng C K,Chan C S.Total-Text:A comprehensive dataset for scene text detection and recognition[A].IAPR International Conference on Document Analysis and Recognition[C].Kyoto:IEEE Computer Society,2017.935-942.
[58] Karatzas D,Shafait F,et al.ICDAR 2013 Robust reading competition[A].IEEE International Conference on Document Analysis and Recognition[C].Washington DC:IEEE Computer Society,2013.1484-1493.
[59] ICDAR 2013 scene text competition dataset[DB/OL].http://rrc.cvc.uab.es/?ch=2&com=downloads,2013.
[60] ICDAR 2015 scene text competition dataset[DB/OL].http://rrc.cvc.uab.es/?ch=4&com=downloads,2015.
[61] MSRA-TD500 scene text dataset[DB/OL].http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_(MSRA-TD500),2012.
[62] COCO-Text scene text dataset[DB/OL].http://rrc.cvc.uab.es/?ch=5&com=tasks,2016.
[63] ICDAR 2017-MLT multilingual text dataset[DB/OL].http://rrc.cvc.uab.es/?ch=8&com=downloads,2017.
[64] RCTW Chinese and English scene text dataset[DB/OL].http://mclab.eic.hust.edu.cn/icdar2017chinese/,2017.
[65] CTW Chinese scene text dataset[DB/OL].https://ctwdataset.github.io/,2018.
[66] MTWI web image text dataset[DB/OL].https://tianchi.aliyun.com/competition/entrance/231685/information,2018.
[67] IIIT5K text recognition dataset[DB/OL].http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html,2012.
[68] NEOCR:Natural environment OCR dataset[DB/OL].http://www6.cs.fau.de/research/projects/pixtract/neocr/,2011.
[69] Oriented Scene Text Dataset[DB/OL].http://media-lab.engr.ccny.cuny.edu/~cyi/,2010.
[70] Multi-Orientation Scene Text Detection and USTB-SV1K Dataset[DB/OL].http://prir.ustb.edu.cn/TexStar/MOMV-text-detection/,2014.