School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094
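YANG Wei-jing (杨维静), female, born in 2000, native of Taizhou, Jiangsu. Master's student at the School of Computer Science and Engineering, Nanjing University of Science and Technology. Her main research interest is unsupervised semantic segmentation. E-mail: yangweijing@njust.edu.cn
XU Rui (徐瑞), male, born in 1999, native of Pizhou, Jiangsu. Master's student at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His main research interests are BEV perception and object detection in remote sensing images. E-mail: 122106222759@njust.edu.cn
GU Hao-wen (顾浩文), male, born in 1998, native of Nantong, Jiangsu. Ph.D. candidate at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His main research interest is video object segmentation. E-mail: guhaowen@njust.edu.cn
CHEN Tao (陈涛), male, born in 1993, native of Suzhou, Jiangsu. Postdoctoral researcher at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His main research interests are computer vision, semantic segmentation, and weakly supervised learning. E-mail: taochen@njust.edu.cn
SHU Xiang-bo (舒祥波), male, born in 1986, native of Xiaogan, Hubei. Doctoral supervisor at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His main research interests are computer vision, deep learning, pattern recognition, artificial intelligence, machine learning, and big data. E-mail: shuxb@njust.edu.cn
YAO Ya-zhou (姚亚洲), male, born in 1987, native of Lianyungang, Jiangsu. Doctoral supervisor at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His main research interests are computer vision, multimedia technology, and machine learning. E-mail: yazhou.yao@njust.edu.cn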
Received: 2024-04-22
Revised: 2024-09-21
Published in print: 2025-03-25
YANG Wei-jing, XU Rui, GU Hao-wen, et al. Pseudo-label Denoising and SAM Optimization for Large-scale Unsupervised Semantic Segmentation[J]. Acta Electronica Sinica, 2025, 53(03): 716-727. DOI:10.12263/DZXB.20240357
Semantic segmentation enables fine-grained understanding of complex and diverse scenes and is one of the key technologies that allow unmanned systems to work efficiently and intelligently. Large-scale unsupervised semantic segmentation aims to learn semantic segmentation from large collections of unlabeled images. However, the pseudo-labels that existing methods learn for themselves suffer from category confusion and poor shape representation, which leads to low final segmentation accuracy. To address this problem, this paper proposes a Pseudo-label Denoising and SAM Optimization (PDSO) approach for large-scale unsupervised semantic segmentation, where SAM denotes the Segment Anything Model. First, a denoising-based feature fine-tuning module applies a small-loss criterion to select, from the large-scale dataset, candidate samples whose image-level pseudo-labels are likely to be clean, and then fine-tunes the pre-trained backbone network on these clean samples so that the network learns more robust category representations. Second, to further reduce category noise in the pseudo-labels, a clustering-based sample denoising module discards noisy samples that interfere with clustering, judged by category proportion and by the distance between each sample and its cluster center, thereby improving clustering performance. Third, a SAM prompt optimization module identifies the active categories in each image from clustering distances in order to filter out noisy targets, and feeds points and boxes to SAM as target prompts to generate the expected target masks, which refine the target edges in the pseudo-labels. Experimental results show that PDSO achieves a mean intersection over union (mIoU) of 45.0%, 26.6%, and 14.5% on the test sets of the large-scale semantic segmentation datasets ImageNet-S50, ImageNet-S300, and ImageNet-S919, respectively, significantly improving both the category accuracy and the edge accuracy of the segmented targets.
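The small-loss criterion used in the feature fine-tuning module follows the usual rationale from noisy-label learning (e.g., Co-teaching): networks tend to fit clean labels before noisy ones, so low-loss samples are more likely to carry correct pseudo-labels. The sketch below is a minimal illustration, not the authors' implementation. It assumes per-sample losses are already computed, groups samples by image-level pseudo-label, and keeps a hypothetical `keep_ratio` fraction of the lowest-loss samples in each group; the function name and the per-class grouping are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def select_small_loss(losses: Sequence[float], labels: Sequence[int],
                      keep_ratio: float = 0.5) -> List[int]:
    """Return indices of samples treated as clean under a small-loss rule.

    Hypothetical sketch: samples are grouped by image-level pseudo-label and,
    within each group, the keep_ratio fraction with the lowest loss is kept.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    # Group sample indices by their pseudo-label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    selected: List[int] = []
    for idxs in by_class.values():
        # Lowest loss first; always keep at least one sample per class.
        idxs.sort(key=lambda i: losses[i])
        n_keep = max(1, int(len(idxs) * keep_ratio))
        selected.extend(idxs[:n_keep])
    return sorted(selected)
```

For example, `select_small_loss([0.1, 2.0, 0.3, 0.2, 1.5, 0.05], [0, 0, 0, 1, 1, 1])` returns `[0, 5]`, the lowest-loss sample of each pseudo-class. Selecting per class rather than globally is a design choice that keeps classes with intrinsically higher loss from being dropped entirely.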
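The abstract states only that the clustering-based denoising module removes noisy samples according to category proportion and sample-to-center distance; the exact rules are not given here. The following sketch assumes two hypothetical rules: discard every sample in a cluster whose share of the data falls below `min_proportion`, and, within each remaining cluster, keep the `keep_ratio` fraction of samples closest to the center under Euclidean distance. The thresholds, the distance metric, and the function name are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def denoise_clusters(features: Sequence[Sequence[float]],
                     assignments: Sequence[int],
                     centers: Dict[int, Sequence[float]],
                     min_proportion: float = 0.05,
                     keep_ratio: float = 0.8) -> List[int]:
    """Return indices of samples kept after hypothetical cluster-based denoising."""
    n = len(assignments)
    counts = Counter(assignments)
    members: Dict[int, List[int]] = defaultdict(list)
    for idx, c in enumerate(assignments):
        # Rule 1 (assumed): drop samples of clusters holding too small a share of the data.
        if counts[c] / n >= min_proportion:
            members[c].append(idx)
    kept: List[int] = []
    for c, idxs in members.items():
        # Rule 2 (assumed): keep the samples nearest to their cluster center.
        idxs.sort(key=lambda i: math.dist(features[i], centers[c]))
        n_keep = max(1, math.ceil(len(idxs) * keep_ratio))
        kept.extend(idxs[:n_keep])
    return sorted(kept)
```

With six 2-D samples in three clusters of sizes 3, 2, and 1, `min_proportion=0.2` discards the singleton cluster, and `keep_ratio=0.6` then drops the sample farthest from the center of the three-sample cluster.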
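To prompt SAM with points and boxes, each coarse pseudo-label region has to be converted into prompt coordinates. A minimal sketch, assuming a binary mask for one category given as nested lists: the box is the tight bounding box in (x0, y0, x1, y1) order, and the point is the foreground pixel nearest the mask centroid, a hypothetical choice that keeps the point on the object even for non-convex shapes. How PDSO actually places its prompts is not specified here, and `mask_to_prompts` is an illustrative name. In the official `segment_anything` package, such prompts would be passed to `SamPredictor.predict` through its `point_coords`, `point_labels`, and `box` arguments, with points in (X, Y) pixel order and the box in XYXY format.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]


def mask_to_prompts(mask: Sequence[Sequence[int]]) -> Tuple[Point, Box]:
    """Derive a point prompt (x, y) and a box prompt (x0, y0, x1, y1) from a binary mask."""
    # Collect foreground pixel coordinates as (x, y).
    fg = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not fg:
        raise ValueError("mask has no foreground pixels")
    xs = [x for x, _ in fg]
    ys = [y for _, y in fg]
    # Tight bounding box in XYXY order.
    box = (min(xs), min(ys), max(xs), max(ys))
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    # Snap the centroid to the nearest foreground pixel so the point lies on the object.
    point = min(fg, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return point, box
```

For the L-shaped mask `[[1, 1, 1], [1, 0, 0], [1, 0, 0]]`, the centroid (0.6, 0.6) is nearest the background pixel (1, 1), whereas the function returns a foreground point together with the box `(0, 0, 2, 2)`.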
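The reported results use mean intersection over union (mIoU): for each class, IoU = |pred ∩ gt| / |pred ∪ gt|, averaged over the classes present in the prediction or the ground truth. The sketch below assumes flattened label maps, a hypothetical `ignore_index` of 255 for unlabeled pixels, and predicted cluster IDs that have already been mapped to ground-truth class IDs, since unsupervised methods generally need such a matching step before evaluation. The official ImageNet-S evaluation code may differ in detail, for example by accumulating counts over the whole test set.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixels do not count
        if not 0 <= p < num_classes:
            raise ValueError(f"predicted label {p} out of range")
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            # A mismatched pixel enlarges the union of both classes involved.
            union[p] += 1
            union[g] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```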