A Tiny Target Feature Enhancement Algorithm for Semantic Segmentation

REN Sha-sha; LIU Qiong

doi:10.12263/DZXB.20211123

Research article | Updated: 2025-12-08

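- Title: A Tiny Target Feature Enhancement Algorithm for Semantic Segmentation
- Journal: Acta Electronica Sinica, 2022, vol. 50, no. 8, pp. 1894-1904
- Author affiliation:
  
  School of Software Engineering, South China University of Technology, Guangzhou, Guangdong 511436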
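- About the authors:
  
  REN Sha-sha, female, born in March 1992 in Huaibei, Anhui. She is a doctoral candidate at the School of Software Engineering, South China University of Technology. Her research interests include signal processing and image understanding and segmentation. E-mail: 201910107240@mail.scut.edu.cn
  LIU Qiong, female, born in March 1959 in Kunming, Yunnan. She is a professor and doctoral supervisor at the School of Software Engineering, South China University of Technology. She has led or taken part in more than 10 projects funded by the National Natural Science Foundation of China, the national 863 and 973 programs, and local governments, has published more than 50 papers in domestic and international journals and conferences, and holds 5 Chinese patents. Her research interests include computer networks, computer vision, and pattern recognition.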
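- Funding:
  
  National Natural Science Foundation of China (61976094); Natural Science Foundation of Guangdong Province (2021A1515011349)
- DOI: 10.12263/DZXB.20211123
  CLC number: TP751
- Received: 2021-08-18; revised: 2022-06-22; published in print: 2022-08-25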
REN Sha-sha, LIU Qiong. A Tiny Target Feature Enhancement Algorithm for Semantic Segmentation[J]. Acta Electronica Sinica, 2022, 50(08): 1894-1904. DOI: 10.12263/DZXB.20211123.

Abstract

Semantic segmentation of image scenes tends to lose small targets and to produce noisy edge contours. Existing segmentation algorithms that strengthen feature representation or refine spatial detail still lose edge and small-target features, so small targets and edges remain hard to segment accurately. This paper presents a tiny target feature enhancement algorithm for semantic segmentation. First, a pixel spatial attention module (PAM) is designed to obtain feature maps whose spatial pixels carry strong semantic information. Then the output of PAM is modeled to extract, separately, edge features and tiny target features that contain semantic category information. Finally, a dedicated loss function is applied during segmentation training and the extracted features are fused with the features learned by the model; through repeated supervised training and correction, the segmentation of edges and tiny targets improves without degrading the other categories. Experiments on the Cityscapes, VOC2012, ADE20K and CamVid benchmark datasets show that, compared with state-of-the-art segmentation algorithms, the proposed algorithm performs clearly better at tiny target segmentation, edge feature enhancement and inner contour noise reduction, and raises segmentation accuracy by 2 percentage points.
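The abstract gives no implementation detail, so the following is only an illustrative sketch of the general idea of supervising edges and small targets separately. It assumes a ground-truth label map stored as a list of rows, derives a semantic edge mask and a small-target mask from it, and turns them into per-pixel loss weights. It is not the paper's PAM or its loss function; the function names, the neighbourhood rule and the bonus values are all hypothetical.

```python
from typing import List, Set

LabelMap = List[List[int]]


def semantic_edge_mask(labels: LabelMap) -> LabelMap:
    """Mark pixels that sit next to a pixel of a different class (assumed rule)."""
    height, width = len(labels), len(labels[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Compare with the lower and right neighbours only; marking both
            # pixels of a differing pair covers all four directions.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width and labels[ny][nx] != labels[y][x]:
                    mask[y][x] = 1
                    mask[ny][nx] = 1
    return mask


def small_target_mask(labels: LabelMap, small_classes: Set[int]) -> LabelMap:
    """Mark pixels whose class is in the (hypothetical) set of small-target classes."""
    return [[1 if c in small_classes else 0 for c in row] for row in labels]


def pixel_weights(
    labels: LabelMap,
    small_classes: Set[int],
    edge_bonus: float = 1.0,
    small_bonus: float = 2.0,
) -> List[List[float]]:
    """Per-pixel loss weights: 1.0 everywhere, plus a bonus on edges and small targets."""
    edges = semantic_edge_mask(labels)
    small = small_target_mask(labels, small_classes)
    return [
        [1.0 + edge_bonus * e + small_bonus * s for e, s in zip(edge_row, small_row)]
        for edge_row, small_row in zip(edges, small)
    ]


if __name__ == "__main__":
    # A 3x4 label map with a single pixel of class 1 on a class-0 background.
    demo = [
        [0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    for row in pixel_weights(demo, small_classes={1}):
        print(row)
```

In a training loop, weights of this kind would typically multiply a per-pixel cross-entropy term so that errors on edges and small targets cost more than errors inside large regions.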

