Semantic segmentation algorithms based on pyramid convolutional neural networks achieve high accuracy,
but they consume substantial computing resources,
take a long time to execute,
and cannot meet real-time requirements. To overcome these shortcomings,
this paper makes the following improvements: (1) replacing the original network structure with MobileNet to reduce computation time and memory consumption; (2) using an encoder-decoder structure to improve the resolution of the output image and further refine the segmentation results; (3) using a multi-level image input method,
which reduces the network inference time for high-resolution images. Our method was tested on the VOC 2012 and Cityscapes datasets and compared with other state-of-the-art semantic segmentation models such as FCN (Fully Convolutional Networks),
SegNet, DeepLab, PSPNet, and DFN (Discriminative Feature Network). Experimental results showed that our method achieved an mIoU of 76.1% on the VOC 2012 dataset
and an mIoU of 74.1% on the Cityscapes dataset,
which is slightly lower than the accuracy of traditional semantic segmentation algorithms. Our method took 18 ms to segment a 1024×512 image,
achieving a balance between accuracy and computational resource consumption.
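
For readers unfamiliar with the accuracy metric reported above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from a confusion matrix. It is not the evaluation code used in this work. The function name `mean_iou`, the toy label maps, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions. Benchmark toolkits typically accumulate the confusion matrix over the whole dataset and exclude void pixels.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target (hypothetical helper)."""
    # confusion[t][p] counts pixels with ground-truth class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp  # predicted c, truth differs
        fn = sum(confusion[c]) - tp                                  # truth c, predicted otherwise
        union = tp + fp + fn
        if union > 0:  # assumption: ignore classes absent from both label maps
            ious.append(tp / union)
    # Return 0.0 when no class is present at all (assumed convention for this sketch).
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened 2x3 label maps with three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported mIoU such as 76.1% is the same quantity expressed as a percentage.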