Class-Aware Contrastive Learning for Weakly Supervised Semantic Segmentation

BAI Xue-fei; XU Wen-jie; WANG Yuan-hui; WANG Wen-jian

doi:10.12263/DZXB.20250024

Large-Scale Models and the Internet | Updated: 2025-10-16

- Acta Electronica Sinica, Vol. 53, Issue 6, Pages 1741-1754 (2025)
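- Author affiliations:
  1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, Shanxi 030006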
- Funding:
  National Natural Science Foundation of China (U21A20513; 62476157); Key Technologies Program of Taihang Laboratory in Shanxi Province (THYF-JSZX-24010200)
- DOI: 10.12263/DZXB.20250024
- CLC: TP751
- Received: 07 January 2025; Revised: 19 March 2025; Published: 25 June 2025
BAI Xue-fei, XU Wen-jie, WANG Yuan-hui, et al. Class-Aware Contrastive Learning for Weakly Supervised Semantic Segmentation[J]. Acta Electronica Sinica, 2025, 53(06): 1741-1754. DOI: 10.12263/DZXB.20250024

Abstract

In image-level weakly supervised semantic segmentation (WSSS), class activation maps (CAMs) are commonly used to localize object regions. However, existing methods often suffer from under-activation of object regions and erroneous activation of background regions when generating CAMs. This paper proposes a class-aware contrastive learning (CA-CL) framework for WSSS that improves the model's ability to localize object regions accurately by integrating text prompts with image category information. First, we analyze the influence of different text prompt templates on the CAMs of various categories. On this basis, to obtain more adaptive class representations, we construct a contextual prompt set and design a dynamic contextual prompt selection strategy, which selects the most appropriate contextual prompt according to the similarity between image object regions and text prompts. Second, we adopt image-text contrastive learning to improve the alignment of image and text semantics, and design a contrastive loss function to supervise model training. Finally, we introduce a class-specific background suppression module that suppresses erroneous activation of background regions closely related to the target categories, thereby generating more complete and compact CAMs and achieving more precise semantic segmentation. Experiments on the PASCAL VOC 2012 and MS COCO 2014 benchmarks show that the proposed framework achieves mIoU values of 71.9% and 43.9%, respectively, outperforming all compared methods and effectively improving the accuracy of WSSS.
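For readers unfamiliar with CAMs, the following minimal NumPy sketch computes the standard class activation map of Zhou et al. (CVPR 2016), which image-level WSSS methods build on. It is not the authors' CA-CL implementation: the array layouts, the function names, and the seed threshold of 0.4 are illustrative assumptions. Thresholding the normalized map shows where the two failure modes named above come from: object pixels that fall below the threshold are under-activated, and background pixels that exceed it are falsely activated.

```python
import numpy as np


def class_activation_map(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Return the CAM of one class, scaled to [0, 1].

    features:   (K, H, W) maps from the last convolutional layer (assumed layout)
    fc_weights: (C, K) weights of the linear classifier applied after global average pooling
    """
    # Weighted sum over the K channels using the chosen class's classifier weights -> (H, W)
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)  # keep only positive evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # normalize so the strongest response is 1


def cam_to_seed(cam: np.ndarray, threshold: float = 0.4) -> np.ndarray:
    """Binary foreground seed: pixels whose normalized activation reaches the (hypothetical) threshold."""
    return cam >= threshold


# Toy example: K=2 channels on a 2x2 grid, C=2 classes
features = np.stack([np.eye(2), np.ones((2, 2))])
fc_weights = np.array([[1.0, 0.0], [-1.0, 0.5]])
cam = class_activation_map(features, fc_weights, class_idx=1)  # [[0, 1], [1, 0]]
seed = cam_to_seed(cam)                                          # [[False, True], [True, False]]
```

In a real pipeline these seeds would typically be refined (for example with affinity propagation or a CRF) before training a segmentation network; the sketch stops at the raw seed.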

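The prompt selection and image-text contrastive steps described in the abstract can be illustrated with a generic sketch. The abstract does not specify the similarity measure, how region features are pooled, or the exact loss, so this is not the paper's method. It assumes region and text embeddings already live in a shared image-text space (as in CLIP), that the caller supplies one region embedding per class (for example via CAM-weighted pooling), cosine similarity for selection, and a standard InfoNCE-style cross-entropy with a hypothetical temperature of 0.07. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def select_contextual_prompts(region_feats: np.ndarray, prompt_embeds: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pick, for each class, the candidate prompt most similar to that class's object region.

    region_feats:  (C, D) one region embedding per class (assumed precomputed)
    prompt_embeds: (T, C, D) text embeddings of T candidate templates for each of C classes
    returns:       (C, D) selected normalized text embeddings and (C,) chosen template indices
    """
    r = l2_normalize(region_feats)
    p = l2_normalize(prompt_embeds)
    sims = np.einsum("tcd,cd->tc", p, r)  # cosine similarity of every template to every class region
    best = sims.argmax(axis=0)             # best template index per class
    chosen = p[best, np.arange(p.shape[1])]
    return chosen, best


def image_text_contrastive_loss(region_feats: np.ndarray, text_feats: np.ndarray,
                                labels: np.ndarray, temperature: float = 0.07) -> float:
    """InfoNCE-style loss: each region (N, D) should match its own class text (C, D) over the others."""
    logits = l2_normalize(region_feats) @ l2_normalize(text_feats).T / temperature  # (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


# Toy example: C=2 classes, T=2 templates, D=2 dimensions
region_feats = np.eye(2)
prompt_embeds = np.array([[[0.0, 1.0], [0.0, 1.0]],
                          [[1.0, 0.0], [1.0, 0.0]]])
text_feats, chosen = select_contextual_prompts(region_feats, prompt_embeds)  # chosen: [1, 0]
loss = image_text_contrastive_loss(region_feats, text_feats, labels=np.array([0, 1]))
```

Because each toy region matches its own class text exactly, the loss is close to zero; swapping the labels makes it large, which is the signal that pulls regions toward their class prompts during training.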