Acta Electronica Sinica, 2022, Vol. 50, Issue (1): 195-206. DOI: 10.12263/DZXB.20201256
FU Li-hua1, ZHAO Yu1,2, JIANG Han-xu1, ZHAO Ru1, WU Hui-xian1, YAN Shao-xing1
Received: 2020-11-06
Revised: 2021-01-20
Online: 2022-01-25
Published: 2022-01-25
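Abstract:
Semi-supervised video object segmentation is an active research topic in computer vision. The network models of conventional semi-supervised video object segmentation methods discriminate poorly between similar objects, and conventional mask propagation gives the model only weak guidance. This paper proposes a semi-supervised video object segmentation method based on foreground-aware visual attention. A three-stream Siamese encoder maps the input images into a shared feature space, so that the same object has similar features. The foreground-aware visual attention module performs similarity matching on the features output by the encoder and uses the segmentation mask to highlight foreground features; the resulting attention focuses on the given target and improves the model's ability to discriminate the object to be segmented. The residual-refinement decoder adopts residual learning and fuses low-level features of the current frame to refine segmentation details step by step. Experimental results on public benchmark datasets show that the proposed method handles confusion between similar objects well and tracks the given segmentation target fairly accurately.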
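The matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes cosine similarity as the matching score, flat lists of per-pixel feature vectors, and a maximum over foreground reference pixels as the aggregation. The names `cosine` and `foreground_attention` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def foreground_attention(cur_feats, ref_feats, ref_mask):
    """For each current-frame pixel, return its best similarity to any
    reference pixel whose mask value is 1 (foreground).
    Assumption: max aggregation; the paper may aggregate differently."""
    fg = [f for f, m in zip(ref_feats, ref_mask) if m == 1]
    if not fg:
        # No foreground in the reference mask: nothing to attend to.
        return [0.0] * len(cur_feats)
    return [max(cosine(c, f) for f in fg) for c in cur_feats]

# Toy data: two reference pixels, only the first one is foreground.
ref_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_mask = [1, 0]
cur_feats = [[2.0, 0.0], [0.0, 3.0]]
attention = foreground_attention(cur_feats, ref_feats, ref_mask)
print(attention)
```

In this toy case the first current-frame pixel points in the same direction as the foreground reference feature and receives a high score, while the second matches only the background reference pixel and is suppressed.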
Li-hua FU, Yu ZHAO, Han-xu JIANG, et al. Semi-Supervised Video Object Segmentation Based on Foreground Perception Visual Attention[J]. Acta Electronica Sinica, 2022, 50(1): 195-206.
Category | Method | J&F↑ (%) | J Mean↑ (%) | F Mean↑ (%) | Time (s/frame)↓
---|---|---|---|---|---
Online fine-tuning | OSVOS | 80.2 | 79.8 | 80.6 | 9
Online fine-tuning | OnAVOS | 85.5 | 86.1 | 84.9 | 13
Online fine-tuning | MSK | 77.6 | 79.7 | 75.4 | 12
Online fine-tuning | STCNN | 83.8 | 83.8 | 83.8 | 3.9
Mask propagation | OSMN | 73.5 | 74 | 72.9 | 0.13
Mask propagation | FAVOS | 81 | 82.4 | 79.5 | 1.8
Mask propagation | RGMP | 81.8 | 81.5 | 82.0 | 0.13
Feature matching | PLM | 66.4 | 70.2 | 62.5 | 0.5
Feature matching | PML | 77.4 | 75.5 | 79.3 | 0.28
Feature matching | VM | 81.0 | - | - | 0.32
Feature matching | FEELVOS | 81.7 | 81.1 | 82.2 | 0.51
Feature matching | MTN | 75.7 | 75.3 | 76.1 | 0.027
Feature matching | AGUnet | 80.9 | 80.7 | 81.0 | 0.09
Feature matching | MRARnet | 83.9 | 83.9 | 83.8 | 0.62
This paper | Proposed method | 81.1 | 80.5 | 81.6 | 0.11

Table 1 Quantitative evaluation of different video object segmentation methods on the DAVIS-2016 dataset
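For readers unfamiliar with the column headers: on the DAVIS benchmark, J is the region similarity (the Jaccard index, or intersection-over-union, between predicted and ground-truth masks), F is a contour accuracy measure, and J&F is the mean of the two. The sketch below shows J for a pair of binary masks and the J&F average; the helper names and the flat-list mask layout are assumptions for illustration, and F is not implemented because it requires boundary matching.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection-over-union of two binary masks
    given as flat lists of 0/1 values."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

def j_and_f(j_mean, f_mean):
    """J&F is the arithmetic mean of J Mean and F Mean."""
    return (j_mean + f_mean) / 2.0

pred = [1, 1, 0, 0]
gt = [1, 0, 1, 0]
iou = jaccard(pred, gt)          # 1 shared pixel out of 3 covered pixels

# OSVOS row of Table 1: J Mean = 79.8, F Mean = 80.6, reported J&F = 80.2
osvos_jf = j_and_f(79.8, 80.6)
print(iou, osvos_jf)
```

Applying `j_and_f` to the other rows reproduces the reported J&F column up to rounding to one decimal place.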
Category | Method | J&F↑ (%) | J Mean↑ (%) | F Mean↑ (%) | Time (s/frame)↓
---|---|---|---|---|---
Online fine-tuning | OSVOS | 60.3 | 56.6 | 63.9 | 9
Online fine-tuning | OnAVOS | 65.4 | 61.6 | 69.1 | 13
Online fine-tuning | STCNN | 61.7 | 58.7 | 64.6 | 3.9
Mask propagation | OSMN | 54.8 | 52.5 | 57.1 | 0.13
Mask propagation | FAVOS | 58.2 | 54.6 | 61.8 | 1.8
Mask propagation | RGMP | 66.7 | 64.8 | 68.6 | 0.13
Mask propagation | RVOS | 60.6 | 57.5 | 63.6 | -
Feature matching | PML | 57.2 | - | - | 0.28
Feature matching | VM | 56.6 | - | - | 0.32
Feature matching | MTN | 54.2 | 49.4 | 59.0 | 0.048
Feature matching | AGUnet | 64.1 | 60.9 | 67.2 | 0.18
Feature matching | MRARnet | 63.4 | 61.3 | 65.4 | 0.63
This paper | Proposed method | 62.1 | 61.5 | 62.8 | 0.11

Table 2 Quantitative evaluation of different video object segmentation methods on the DAVIS-2017 dataset
Method | Overall G↑ | Seen J↑ | Seen F↑ | Unseen J↑ | Unseen F↑
---|---|---|---|---|---
OSVOS | 58.8 | 59.8 | 60.5 | 54.2 | 60.7
OnAVOS | 55.2 | 60.1 | 62.7 | 46.6 | 51.4
RGMP | 53.8 | 59.5 | 45.2 | - | -
OSMN | 51.2 | 60.0 | 60.1 | 40.6 | 44.0
RVOS | 56.8 | 63.6 | 67.2 | 45.5 | 51.0
Proposed method | 64.2 | 65.4 | 58.8 | 67.4 | 65.2

Table 3 Quantitative evaluation of different video object segmentation methods on the YouTube-VOS validation set (%)
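The overall score G on YouTube-VOS is the average of the four per-category scores (J and F on seen and unseen classes). A short sketch, using a hypothetical helper `overall_g`, checks this against the rows of Table 3 that report all four values; RGMP is left out because two of its entries are missing.

```python
def overall_g(j_seen, f_seen, j_unseen, f_unseen):
    """Overall score G: mean of the four seen/unseen J and F scores."""
    return (j_seen + f_seen + j_unseen + f_unseen) / 4.0

# (reported G, Seen J, Seen F, Unseen J, Unseen F), copied from Table 3
rows = {
    "OSVOS": (58.8, 59.8, 60.5, 54.2, 60.7),
    "OnAVOS": (55.2, 60.1, 62.7, 46.6, 51.4),
    "OSMN": (51.2, 60.0, 60.1, 40.6, 44.0),
    "RVOS": (56.8, 63.6, 67.2, 45.5, 51.0),
    "Proposed method": (64.2, 65.4, 58.8, 67.4, 65.2),
}

# Difference between the recomputed mean and the reported G for each row.
gaps = {name: abs(overall_g(*vals[1:]) - vals[0]) for name, vals in rows.items()}
for name, gap in gaps.items():
    print(f"{name}: |recomputed - reported| = {gap:.3f}")
```

Every gap stays within the 0.05 tolerance expected from rounding to one decimal place, so the reported G column is consistent with the per-category columns.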
Attention mechanism | J | ΔJ
---|---|---
- Global | 52.4 | -9.1
- Local | 45.2 | -16.3
- ASPP | 53.8 | -7.7
- ReDecoder | 55.5 | -6.0
Full algorithm | 61.5 | -

Table 4 Quantitative analysis of the effect of each stage of the proposed method (%)