Semi-Supervised Video Object Segmentation Based on Foreground Perception Visual Attention

FU Li-hua; ZHAO Yu; JIANG Han-xu; ZHAO Ru; WU Hui-xian; YAN Shao-xing

doi:10.12263/DZXB.20201256

Research paper | Updated: 2025-12-08

- Semi-Supervised Video Object Segmentation Based on Foreground Perception Visual Attention
- Acta Electronica Sinica, 2022, Vol. 50, No. 1, pp. 195-206
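- Author affiliations:

  1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124
  2. School of Computer Science and Engineering, Beihang University, Beijing 100191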
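- About the authors:

  FU Li-hua, female, was born in September 1976 in Anyue, Sichuan. She received her Ph.D. in engineering from the School of Computer Science, Northwestern Polytechnical University, in 2005. She is currently an associate professor at the Faculty of Information Technology, Beijing University of Technology. Her main research interests are intelligent information processing, image processing, and computer vision. E-mail: fulh@bjut.edu.cn

  ZHAO Yu (corresponding author), male, was born in August 1994 in Tangshan, Hebei. He received his master's degree in engineering from the Faculty of Information Technology, Beijing University of Technology, in 2020. He is currently a Ph.D. candidate at the School of Computer Science and Engineering, Beihang University. His main research interests are image processing and computer vision. E-mail: yzhao0812@foxmail.com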
- Funding:

  Beijing Natural Science Foundation (4173072)
- DOI: 10.12263/DZXB.20201256
  CLC number: TP391.41
- Received: 2020-11-06; Revised: 2021-01-20; Published in print: 2022-01-25
FU Li-hua, ZHAO Yu, JIANG Han-xu, et al. Semi-Supervised Video Object Segmentation Based on Foreground Perception Visual Attention[J]. Acta Electronica Sinica, 2022, 50(01): 195-206. DOI: 10.12263/DZXB.20201256.

Abstract

Semi-supervised video object segmentation (SVOS) is an active research topic in computer vision. The network models of conventional SVOS methods have difficulty discriminating between similar objects, and conventional mask propagation provides only weak guidance to the model. This paper proposes a semi-supervised video object segmentation method based on foreground perception visual attention. A three-stream Siamese encoder maps the input images into a common feature space, so that the same object has similar features. The foreground perception visual attention module matches the encoder features by similarity and uses the segmentation mask to highlight foreground features, which focuses the model on the given target and improves its ability to discriminate the object to be segmented. A decoder based on residual refinement adopts residual learning and fuses low-level features of the current frame to improve segmentation details step by step. Experiments on public benchmark datasets show that the proposed method alleviates confusion between similar objects and tracks the given target fairly accurately.
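The idea of matching features by similarity and then weighting the matches with a segmentation mask can be illustrated with a small sketch. This is not the network described in the paper: the cosine similarity, the softmax temperature, and the names `cosine` and `foreground_attention` are assumptions chosen only to make the mechanism concrete. Each pixel of the current frame is compared with every pixel of a reference frame, and the score it receives is the share of its matching weight that lands on reference pixels marked as foreground.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def foreground_attention(cur_feats, ref_feats, ref_mask, temperature=0.1):
    """Hypothetical foreground-weighted attention over a reference frame.

    cur_feats: feature vectors for the N pixels of the current frame.
    ref_feats: feature vectors for the M pixels of the reference frame.
    ref_mask:  M values in [0, 1]; 1 marks a foreground reference pixel.
    Returns N scores in [0, 1], one per current-frame pixel.
    """
    scores = []
    for query in cur_feats:
        # Similarity matching: compare this pixel with every reference pixel.
        sims = [cosine(query, key) / temperature for key in ref_feats]
        # Numerically stable softmax weights over the reference pixels.
        peak = max(sims)
        weights = [math.exp(s - peak) for s in sims]
        total = sum(weights)
        # Mask weighting: keep only the weight that falls on the foreground.
        foreground = sum(w * m for w, m in zip(weights, ref_mask))
        scores.append(foreground / total)
    return scores


if __name__ == "__main__":
    # Two reference pixels: the first is foreground, the second background.
    ref_feats = [[1.0, 0.0], [0.0, 1.0]]
    ref_mask = [1.0, 0.0]
    # Two current-frame pixels, each resembling one of the reference pixels.
    cur_feats = [[0.9, 0.1], [0.1, 0.9]]
    print(foreground_attention(cur_feats, ref_feats, ref_mask))
```

In this toy setup the first current-frame pixel resembles the foreground reference pixel and receives a score close to 1, while the second resembles the background pixel and receives a score close to 0. A lower temperature makes the matching sharper; the value used here is arbitrary.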
