Acta Electronica Sinica ›› 2022, Vol. 50 ›› Issue (7): 1609-1620. DOI: 10.12263/DZXB.20210611

• Research Article •

Real-Time Street-Scene Semantic Segmentation Algorithm Based on Real-Scene Data Augmentation and a Dual-Path Fusion Network

ZHANG Zhi-wen, LIU Tian-ge, NIE Peng-ju

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066000, China
  • Received: 2021-05-12 Revised: 2021-12-15 Online: 2022-07-25 Published: 2022-07-30
  • About the authors: ZHANG Zhi-wen Male, born in 1996. M.S. candidate. His research interests include machine learning and computer vision. E-mail: zhangzhiwen_ysu@yeah.net
    LIU Tian-ge Male, born in 1988. Ph.D. Associate professor. His research interests include machine learning and computer vision. E-mail: liutiange@ysu.edu.cn
    NIE Peng-ju Male, born in 1995. M.S. candidate. His research interests include machine learning and computer vision. E-mail: nie2764@stumail.ysu.edu.cn
  • Funding:
    Youth Program of the National Natural Science Foundation of China (61802335)

Real-Time Semantic Segmentation for Road Scene Based on Data Enhancement and Dual-Path Fusion Network

ZHANG Zhi-wen, LIU Tian-ge, NIE Peng-ju   

  1. School of Information Science and Engineering,Yanshan University,Qinhuangdao,Hebei 066000,China
  • Received:2021-05-12 Revised:2021-12-15 Online:2022-07-25 Published:2022-07-30

Abstract:

Segmentation of street-scene images plays an important role in industrial applications, but street scenes contain a wide variety of object classes under highly variable illumination, and the task must satisfy real-time constraints while pursuing accuracy, which together make it highly challenging. For this task we propose a dual-path network (Dual-path Fusion Network, DFNet) composed of a spatial path and a detail path: the detail path exploits the high-resolution input to obtain rich boundary information, while the spatial path uses the high-quality feature maps produced by the detail path to obtain sufficient semantic information. A trainable image preprocessing module (Image Preprocessing Module, IPM) is embedded at the front of the network, so that images captured under different illumination have consistent mean and variance on the RGB channels before entering formal training; the feature maps output by the preprocessing module are then fed into the detail path and the spatial path respectively. We further propose a strip-shaped attention refinement module (Attention Refinement Module, ARM), placed at the end of the spatial path, which effectively combines channel-level information with local strip information. At the end of the network, a feature fusion module (Feature Fusion Module, FFM) fuses the features of the two paths to produce the final segmentation result. In addition, we propose a "copy-and-paste" data augmentation method based on small-object recombination, which alleviates the sample imbalance of small objects while enlarging the dataset; this method alone improves the mean intersection-over-union (mIoU) of a single network by nearly 2%. We validate the proposed algorithm on the CityScapes and CamVid datasets. On CityScapes, with an input size of 1 024×2 048, it reaches 98 frames per second (FPS) and 70.1% mIoU; on CamVid, with an input size of 720×960, it reaches 208 FPS and 65.7% mIoU. Compared with existing methods, the proposed algorithm has faster inference than state-of-the-art real-time street-scene semantic segmentation algorithms while maintaining high segmentation accuracy, striking a good balance between segmentation speed and performance.
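The abstract gives no implementation details for the "copy-and-paste" augmentation, so the following is only an illustrative NumPy sketch of the basic idea: pixels belonging to a small-object class are copied, together with their labels, from one annotated sample into another. The function name `copy_paste` and the `offset`/`class_id` parameters are hypothetical; the paper's small-object recombination is presumably more elaborate.

```python
import numpy as np

def copy_paste(src_img, src_mask, dst_img, dst_mask, class_id, offset=(0, 0)):
    """Paste every pixel of `class_id` from the source sample into the
    destination sample, shifted by `offset` (row, col). Returns new
    (image, mask) arrays; the inputs are left untouched."""
    out_img, out_mask = dst_img.copy(), dst_mask.copy()
    rows, cols = np.nonzero(src_mask == class_id)
    if rows.size == 0:          # class absent in source: nothing to paste
        return out_img, out_mask
    # Clip shifted coordinates so pasted pixels stay inside the destination.
    r = np.clip(rows + offset[0], 0, out_img.shape[0] - 1)
    c = np.clip(cols + offset[1], 0, out_img.shape[1] - 1)
    out_img[r, c] = src_img[rows, cols]
    out_mask[r, c] = class_id
    return out_img, out_mask
```

Applied over a training set, such pasting multiplies the number of pixels of rare small classes (e.g. poles, traffic signs), which is the imbalance the abstract credits for the ~2% mIoU gain.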

Key words: street-scene image, semantic segmentation, data augmentation, deep convolutional neural network

Abstract:

Semantic segmentation of road scene images plays a crucial role in industrial applications. However, challenges such as the great variety of target objects, the high illumination variability across scenes, and especially the simultaneous demands on speed and accuracy make road-scene segmentation difficult. To address these challenges, we propose an efficient convolutional neural network named dual-path fusion network (DFNet), consisting of a spatial path and a detail path. The spatial path learns global information from low-resolution feature maps, while the detail path extracts local details from high-resolution feature maps. DFNet starts with a trainable image preprocessing module (IPM), which unifies the input images to have consistent per-channel variance and mean values on the RGB channels. An attention refinement module (ARM), which combines global pooling and strip pooling, is employed in the spatial path to guide feature learning while extracting global features. After the two paths, a feature fusion module (FFM) effectively fuses the global and local detail features to produce the final segmentation result. Besides the novel network DFNet, we propose a data augmentation strategy that enriches the training dataset and alleviates the data imbalance of small objects. This straightforward "copy and paste" strategy improves the performance of the same network by 2% in mIoU. We test our method on two public datasets: it reaches 98 FPS and 70.1% mIoU on the CityScapes dataset (image size 1 024×2 048), and 208 FPS and 65.7% mIoU on the CamVid dataset (image size 720×960). The experimental results show that our method outperforms state-of-the-art methods in speed while achieving competitive accuracy, demonstrating that our approach reaches a good balance between speed and accuracy.
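The IPM described above is trainable; as a fixed, non-trainable stand-in, the sketch below shows only the statistic it is said to unify, namely per-channel mean and variance on the RGB channels. The function name `ipm_normalize` is not from the paper.

```python
import numpy as np

def ipm_normalize(image, eps=1e-6):
    """Standardize an H x W x 3 image so each RGB channel has zero mean
    and unit variance. `eps` guards against division by zero on flat
    (constant-valued) channels."""
    img = image.astype(np.float64)
    mean = img.mean(axis=(0, 1), keepdims=True)  # per-channel mean, shape (1, 1, 3)
    std = img.std(axis=(0, 1), keepdims=True)    # per-channel std,  shape (1, 1, 3)
    return (img - mean) / (std + eps)
```

Under this mapping, two photographs of the same scene taken under different illumination produce inputs with matching first- and second-order channel statistics, which is the consistency property the abstract attributes to the IPM.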

Key words: road scene image, semantic segmentation, data augmentation, deep convolutional neural network

CLC Number: