Fusing Deep Dilated Convolutions Network and Light-Weight Network for Object Detection
QUAN Yu1, LI Zhi-xin1, ZHANG Can-long1, MA Hui-fang1,2
1. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, Guangxi 541004, China;
2. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China
Abstract: Object detection is an important research direction in computer vision. In recent years, object detection has advanced considerably on public datasets, with breakthroughs in algorithmic performance. To improve the accuracy and speed of two-stage object detection, this paper proposes a detection model, based on transfer learning, that fuses a deep dilated convolution network with a light-weight network. First, dilated convolutions replace the convolutional residual modules in the backbone network, yielding the deep dilated convolution network (D_dNet-65). Then, the pretrained feature map is compressed and an 81-class fully connected layer replaces the original two layers, yielding the light-weight network. Finally, transfer learning is introduced during pretraining to optimize the model (D_dNet and the light-weight network). Experiments were carried out on two standard datasets, MSCOCO and VOC07, and the results show that the proposed method is effective and scalable.
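As a rough illustration of the dilated convolution operation that the backbone relies on, the following minimal sketch computes a 1-D dilated convolution in plain Python. It is not the D_dNet-65 architecture; the function names `effective_kernel_size` and `dilated_conv1d` are hypothetical and chosen only for this example. It shows that a kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions without adding weights.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """1-D dilated cross-correlation with stride 1 and no padding.

    Illustrative only: real detectors use 2-D convolutions from a
    deep learning framework rather than Python loops.
    """
    span = effective_kernel_size(len(kernel), dilation)
    out_len = len(signal) - span + 1
    if out_len <= 0:
        return []
    # Each output sums kernel weights against inputs spaced `dilation` apart.
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(out_len)
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    print(effective_kernel_size(3, 2))          # 5
    print(dilated_conv1d(x, [1, 1, 1], 1))      # [6, 9, 12, 15]
    print(dilated_conv1d(x, [1, 1, 1], 2))      # [9, 12]
```

With dilation 2, the same three weights span five input positions, which is how dilation enlarges the receptive field while keeping the parameter count fixed.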