RGB-D Scene Parsing Based on Spatial Structured Inference Deep Fusion Networks
WANG Ze-yu1, WU Yan-xia1, ZHANG Guo-yin1, BU Shu-hui2
1. College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China;
2. School of Aeronautics, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
Abstract: Convolutional neural networks lack the ability to learn spatial structure in RGB-D scene parsing. To address this drawback, we propose spatial structured inference deep fusion networks (SSIDFNs), built on deep learning. The embedded structural inference layer combines conditional random fields (CRFs) with a spatial structured inference model, allowing it to learn the three-dimensional spatial distributions of objects and the three-dimensional spatial relationships among objects more comprehensively and accurately. Furthermore, the feature fusion layer combines the advantages of deep belief networks and the improved CRFs, achieving deep structured learning from the comprehensive semantic information of objects and the semantic correlation information among objects. Experimental results demonstrate that the proposed SSIDFNs achieve the best mean accuracy of 53.8% and 54.6% on the standard RGB-D datasets NYUDv2 and SUNRGBD, respectively. These results will be helpful for implementing intelligent computer vision tasks such as robot task planning and self-driving cars.