Real-Time Semantic Segmentation for Road Scene Based on Data Enhancement and Dual-Path Fusion Network

ZHANG Zhi-wen; LIU Tian-ge; NIE Peng-ju

doi:10.12263/DZXB.20210611


PAPERS | Updated: 2025-12-08

- ACTA ELECTRONICA SINICA Vol. 50, Issue 7, Pages: 1609-1620 (2022)
- Affiliation: School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066000
- CLC: TP391
- Received: 12 May 2021; Revised: 15 December 2021; Published: 25 July 2022
ZHANG Zhi-wen, LIU Tian-ge, NIE Peng-ju. Real-Time Semantic Segmentation for Road Scene Based on Data Enhancement and Dual-Path Fusion Network[J]. ACTA ELECTRONICA SINICA, 2022, 50(07): 1609-1620. DOI: 10.12263/DZXB.20210611.

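Abstract (Chinese version, translated)

Segmentation of road scene images plays a very important role in industrial applications. However, road scenes contain a wide variety of object categories and highly variable illumination, and the segmentation task must deliver real-time speed as well as accuracy, which together make the task highly challenging. To address this, we propose a dual-path network (Dual-path Fusion Network, DFNet) composed of a spatial path and a detail path. The detail path uses the high-resolution input to obtain rich boundary information, and the spatial path uses the high-quality feature maps produced by the detail path to obtain sufficient semantic information. A trainable image preprocessing module (Image Preprocessing Module, IPM) is embedded at the start of the network so that images captured under different illumination have consistent variance and mean on the RGB channels before they enter the main network. The feature maps produced by the preprocessing module are fed to both the detail path and the spatial path. We propose a strip attention refinement module (Attention Refinement Module, ARM), placed at the end of the spatial path, which effectively combines channel-level information with local strip information. At the end of the network, a feature fusion module (Feature Fusion Module, FFM) fuses the feature information from the two paths to produce the final segmentation result. We also propose a "copy-paste" data augmentation method based on small-object recombination, which alleviates the data imbalance of small-object samples while enlarging the dataset; it improves the mean intersection over union (mIoU) of a single network by nearly 2%. We validate the proposed algorithm on the CityScapes and CamVid datasets. On CityScapes, with an input size of 1 024×2 048, it reaches 98 frames per second (FPS) and 70.1% mIoU; on CamVid, with an input size of 720×960, it reaches 208 FPS and 65.7% mIoU. Compared with existing algorithms, the inference speed of our algorithm exceeds that of state-of-the-art real-time road scene semantic segmentation algorithms while maintaining high segmentation accuracy, achieving a good balance between segmentation speed and segmentation performance.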
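The abstract does not spell out how the small-object "copy-paste" augmentation is implemented, so the following is only a generic sketch of the idea, not the authors' procedure. It assumes two samples of identical size, and the function name `copy_paste_small_objects`, the random-translation placement, and the choice of class IDs are all hypothetical.

```python
import numpy as np

def copy_paste_small_objects(src_img, src_lbl, dst_img, dst_lbl, class_ids, rng):
    """Paste the pixels of the given classes from a source sample into a copy
    of a destination sample, shifted by one random translation.

    Assumption (not from the paper): both samples share the same H x W size,
    and the shift is chosen so every pasted pixel stays inside the frame.
    """
    h, w = src_lbl.shape
    mask = np.isin(src_lbl, class_ids)        # pixels belonging to small-object classes
    ys, xs = np.nonzero(mask)
    out_img, out_lbl = dst_img.copy(), dst_lbl.copy()
    if ys.size == 0:                          # nothing to paste
        return out_img, out_lbl
    # Draw a translation that keeps the objects' bounding box within the image.
    dy = rng.integers(-ys.min(), h - ys.max())
    dx = rng.integers(-xs.min(), w - xs.max())
    out_img[ys + dy, xs + dx] = src_img[ys, xs]   # copy appearance
    out_lbl[ys + dy, xs + dx] = src_lbl[ys, xs]   # copy labels with the pixels
    return out_img, out_lbl

# Toy usage: one 2x2 object of (hypothetical) class 5 pasted onto a blank sample.
rng = np.random.default_rng(0)
src_img = np.zeros((8, 8, 3), dtype=np.uint8)
src_lbl = np.zeros((8, 8), dtype=np.int64)
src_img[2:4, 3:5] = 200
src_lbl[2:4, 3:5] = 5
dst_img = np.full((8, 8, 3), 10, dtype=np.uint8)
dst_lbl = np.ones((8, 8), dtype=np.int64)
aug_img, aug_lbl = copy_paste_small_objects(src_img, src_lbl, dst_img, dst_lbl, [5], rng)
```

Because labels travel with the pasted pixels, each augmented image adds training pixels for the rare classes, which is the imbalance the abstract says the method targets.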

Abstract

Semantic segmentation of road scene images plays a crucial role in industrial applications. However, the great variety of target objects, the high variability of illumination across scenes, and especially the joint demand for speed and accuracy make road scene segmentation difficult. To address these challenges, we propose an efficient convolutional neural network named dual-path fusion network (DFNet), consisting of a spatial path and a detail path. The spatial path learns global information through low-resolution feature maps, while the detail path extracts local details through high-resolution feature maps. DFNet starts with a trainable image preprocessing module (IPM), which unifies the input images so that their RGB channels have consistent mean and variance. An attention refinement module (ARM), which combines global pooling and strip pooling, is used in the spatial path to guide feature learning while extracting global features. After the spatial path and the detail path, a feature fusion module (FFM) fuses the global and local detail features to produce the final segmentation result. Besides DFNet, we propose a data augmentation strategy that enriches the training dataset and mitigates the data imbalance of small objects. This straightforward "copy and paste" strategy improves the mIoU of the same network by nearly 2%. We test our method on two public datasets: it reaches 98 FPS and 70.1% mIoU on the CityScapes dataset (image size 1 024×2 048), and 208 FPS and 65.7% mIoU on the CamVid dataset (image size 720×960). The experimental results show that our method is faster than state-of-the-art methods while achieving competitive accuracy, demonstrating a good balance between speed and accuracy.
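For readers unfamiliar with the accuracy metric quoted above, here is a minimal sketch of how mIoU is commonly computed from a confusion matrix: per-class IoU is TP / (TP + FP + FN), averaged over classes. The function names and the `ignore_index=255` default are assumptions for illustration, and the authors' exact evaluation protocol is not given in the abstract.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes matrix (rows: ground truth,
    columns: prediction). Pixels labeled ignore_index are skipped; the
    value 255 is an assumed convention, not taken from the paper."""
    valid = target != ignore_index
    idx = target[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(cm):
    """Mean of per-class IoU = TP / (TP + FP + FN), over classes that occur
    in either the ground truth or the prediction."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp   # TP + FP + FN per class
    present = union > 0
    return float((tp[present] / union[present]).mean())

# Toy usage with two classes on a 2x2 label map.
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
miou = mean_iou(confusion_matrix(pred, target, num_classes=2))
```

On a full dataset the confusion matrices of all images are summed before `mean_iou` is applied, so that rare classes are not averaged per image.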


