Abstract: Video object segmentation (VOS) is an active research topic in computer vision. Deep-learning-based VOS methods traditionally fine-tune the network online, which makes segmentation slow and real-time requirements hard to meet. We therefore propose a fast VOS method. First, a weight-shared siamese encoder subnet maps the reference stream and the target stream into the same feature space, so that the same object has similar features in both streams. Then, a global feature extraction subnet matches features similar to those of the given object in order to locate it. Finally, a decoder subnet restores the object features, recovers edge information through connections to the low-level features of the target stream, and outputs the mask. Experiments on public benchmark datasets show that the method significantly improves speed while achieving good segmentation performance.
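The matching idea in the abstract (one shared encoder for both streams, then locating the object by feature similarity) can be illustrated with a toy sketch. This is not the authors' network: the linear `encode` function, the mean-feature prototype, the cosine threshold, and all data values are assumptions chosen only to keep the example small and runnable.

```python
import math

def encode(pixels, weights):
    """Toy shared encoder: apply the same linear map to every pixel feature."""
    return [[sum(w * x for w, x in zip(row, p)) for row in weights] for p in pixels]

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def locate(ref_pixels, ref_mask, tgt_pixels, weights, threshold=0.9):
    """Mark target pixels whose encoded feature is close to the reference object."""
    # The same weights encode both streams, so features share one space.
    ref_feat = encode(ref_pixels, weights)
    tgt_feat = encode(tgt_pixels, weights)
    # Hypothetical matching rule: compare against the mean object feature.
    obj = [f for f, m in zip(ref_feat, ref_mask) if m]
    if not obj:
        raise ValueError("reference mask contains no object pixels")
    proto = [sum(f[i] for f in obj) / len(obj) for i in range(len(obj[0]))]
    return [1 if cosine(f, proto) >= threshold else 0 for f in tgt_feat]

# Illustrative data: three pixels per frame, three raw channels each.
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
ref_pixels = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
ref_mask = [1, 1, 0]  # the first two reference pixels belong to the object
tgt_pixels = [[0.0, 0.9, 0.1], [1.0, 0.05, 0.0], [0.8, 0.0, 0.0]]

pred_mask = locate(ref_pixels, ref_mask, tgt_pixels, weights)
print(pred_mask)
```

Because both frames pass through the same weights, a target pixel that looks like the reference object lands near the object prototype in feature space; the decoder stage described in the abstract, which refines edges using low-level features, is not modeled here.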
FU Li-hua, ZHAO Yu, SUN Xiao-wei, LU Zhong-shan, WANG Dan, YANG Han-xue. Fast Video Object Segmentation Based on Siamese Networks[J]. Acta Electronica Sinica, 2020, 48(4): 625-630.