Acta Electronica Sinica, 2020, Vol. 48, Issue (9): 1769-1776. DOI: 10.3969/j.issn.0372-2112.2020.09.015

• Research Article •

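Semantic Segmentation Algorithm Based on Improved MobileNetV2

MENG Lu1, XU Lei1, GUO Jia-yang2

  1. College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110000, China;
  2. Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio 45221, USA
  • Received: 2019-09-10 Revised: 2019-12-27 Online: 2020-09-25 Published: 2020-09-25
  • About the authors: MENG Lu, male, born in February 1982 in Shenyang, Liaoning. He is an associate professor at the College of Information Science and Engineering, Northeastern University. His main research interests are artificial intelligence and image processing. E-mail: menglu@ise.neu.edu.cn
    XU Lei, male, born in November 1997 in Dalian, Liaoning. Graduate student. His main research interest is computer vision. E-mail: 19s001019@stu.hit.edu.cn
    GUO Jia-yang, male, born in 1985 in Xiamen, Fujian. He is a postdoctoral researcher at the Department of Electrical Engineering and Computer Science, University of Cincinnati, USA, and an IEEE member. His main research interests include artificial intelligence, machine learning, medical image processing, and signal processing. E-mail: guojy@mail.uc.edu
  • Funding:
    National Natural Science Foundation of China (No.61973058); Fundamental Research Funds for the Central Universities, Ministry of Education (No.N2004020)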

Abstract: Semantic segmentation algorithms based on pyramid convolutional neural networks achieve high accuracy, but they consume substantial computing resources, take a long time to run, and cannot meet real-time requirements. To address this problem, this paper makes the following improvements: (1) replacing the original network structure with MobileNet to reduce computation time and memory consumption; (2) introducing an encoder-decoder structure to raise the resolution of the output image and further refine the segmentation results; (3) designing a multi-level image input method to reduce the time the network needs to infer high-resolution images. The method was tested on the VOC 2012 and Cityscapes datasets and compared with other state-of-the-art semantic segmentation models such as FCN (Fully Convolutional Networks), SegNet, DeepLab, PSPNet, and DFN (Discriminative Feature Network). Experimental results show that the method achieves an mIoU of 76.1% on the VOC 2012 dataset and 74.1% on the Cityscapes dataset, slightly lower than conventional semantic segmentation algorithms. Processing a 1024×512 image takes 18 ms, less than conventional semantic segmentation algorithms, which meets the real-time requirement and strikes a balance between accuracy and computational resource consumption.

Key words: semantic segmentation, convolutional neural network, pyramid network, fast semantic segmentation, MobileNet, encoder-decoder
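
The mIoU figures quoted in the abstract are mean intersection-over-union scores. The following is a minimal sketch of how such a score is commonly computed from a confusion matrix. It is not the authors' evaluation code: the function names, the toy labels, and the use of 255 as an "ignore" label are assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(truth: Sequence[int], pred: Sequence[int],
                     num_classes: int, ignore_label: int = 255) -> List[List[int]]:
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        if t == ignore_label:  # assumed convention: skip unlabeled pixels
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # true class c, predicted as another class
        fp = sum(row[c] for row in matrix) - tp   # another class, predicted as c
        union = tp + fp + fn
        if union > 0:  # a class absent from both truth and prediction is skipped
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: a flattened 2x3 label map with two classes (hypothetical data).
truth = [0, 0, 1, 1, 1, 255]
pred = [0, 1, 1, 1, 0, 1]
miou = mean_iou(confusion_matrix(truth, pred, num_classes=2))
print(f"mIoU = {miou:.4f}")
```

In this toy case class 0 has IoU 1/3 and class 1 has IoU 1/2, so the mean is 5/12. Benchmark evaluations accumulate the confusion matrix over every image in the test set before averaging, rather than averaging per-image scores.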

CLC Number: