Progressive Image Synthesis Method Based on Diffusion-Mamba and Scale-Invariant Loss

LI Hao; HAO Wen-ning; ZOU Shi-chen; XIE Xiao-yu

doi:10.12263/DZXB.20250308

Papers | Updated: 2025-12-27

- Acta Electronica Sinica, Vol. 53, Issue 9, Pages 3384-3396 (2025)
- Affiliation: College of Command and Control Engineering, Army Engineering University of PLA, Nanjing, Jiangsu 210007, China
- Funding: Defense Industrial Technology Development Program (JCKY2020601B018)
- CLC: TP391.41
- Received: 21 April 2025; Accepted: 16 September 2025; Published: 25 September 2025

LI Hao, HAO Wen-ning, ZOU Shi-chen, et al. Progressive Image Synthesis Method Based on Diffusion-Mamba and Scale-Invariant Loss[J]. Acta Electronica Sinica, 2025, 53(09): 3384-3396. DOI: 10.12263/DZXB.20250308


Abstract

Diffusion models have attracted significant attention in image generation because of their high generation quality, and their backbone networks have evolved from U-Net to Transformer architectures. However, the cost of Transformer self-attention grows quadratically with sequence length, which makes diffusion models computationally expensive when generating high-resolution images. To address this issue, we propose a progressive image synthesis method based on Diffusion-Mamba and a scale-invariant loss. The method combines the efficiency of Mamba with the modeling capability of diffusion models through a multi-directional scanning mechanism and a lightweight local structure enhancement module. It then converts low-resolution images into high-resolution images through a progressive cascaded diffusion process, significantly reducing computational complexity. Furthermore, we design a contrastive-learning-based scale-invariant loss that maximizes the mutual information between representations of the same target at different resolutions, thereby aligning and enhancing cross-scale feature representations. Experimental results on the ImageNet dataset (FID = 1.67) show that the proposed method improves overall generation accuracy, validating its effectiveness and efficiency.
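The scaling argument behind the move away from Transformer backbones can be made concrete with a short worked equation. It compares only the sequence-mixing cost of self-attention with that of a state-space scan. Several symbols are illustrative assumptions rather than values from the paper: a ViT-style tokenization with patch size p and model width d, and a Mamba-style scan with state size N. Costs shared by both backbones, such as projections and MLPs, are ignored.

```latex
\begin{aligned}
L &= \frac{HW}{p^{2}}, \\
C_{\mathrm{attn}} &\propto L^{2} d, \qquad C_{\mathrm{SSM}} \propto L\,d\,N, \\
(H, W) \to (2H, 2W) &\;\Rightarrow\; L \to 4L,\quad C_{\mathrm{attn}} \to 16\,C_{\mathrm{attn}},\quad C_{\mathrm{SSM}} \to 4\,C_{\mathrm{SSM}}.
\end{aligned}
```

Under these assumptions, each doubling of resolution multiplies the attention cost by four times as much as the scan cost.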
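The multi-directional scanning idea can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several parts are assumptions made for illustration:

- the four scan orders;
- the fixed, non-selective diagonal recurrence standing in for Mamba's selective scan;
- averaging as the rule for fusing directions.

The local structure enhancement module is not modeled.

```python
import numpy as np


def scan_orders(h, w):
    """Four flattening orders for an h x w token grid (an assumed set):
    row-major, reversed row-major, column-major, reversed column-major."""
    idx = np.arange(h * w).reshape(h, w)
    row = idx.reshape(-1)    # left-to-right, then top-to-bottom
    col = idx.T.reshape(-1)  # top-to-bottom, then left-to-right
    return [row, row[::-1], col, col[::-1]]


def diagonal_ssm_scan(x, a, b, c):
    """Toy linear state-space recurrence, one step per token:
        h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
    x has shape (L, D); a, b, c have shape (D,). This is a single pass over
    the sequence, so the cost grows linearly with L. Unlike Mamba's selective
    scan, a, b, c are fixed here rather than computed from the input."""
    h = np.zeros(x.shape[1])
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        y[t] = c * h
    return y


def multi_direction_ssm(tokens, a, b, c):
    """tokens: (H, W, D) float grid. Scan the grid in every order, map each
    output back to its original token position, and average the directions
    (averaging is a hypothetical fusion rule)."""
    H, W, D = tokens.shape
    flat = tokens.reshape(H * W, D)
    out = np.zeros_like(flat)
    orders = scan_orders(H, W)
    for order in orders:
        y = diagonal_ssm_scan(flat[order], a, b, c)
        inv = np.empty_like(order)
        inv[order] = np.arange(order.size)  # inverse permutation
        out += y[inv]
    return (out / len(orders)).reshape(H, W, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 8
    tokens = rng.standard_normal((4, 4, D))
    out = multi_direction_ssm(tokens, np.full(D, 0.9), np.ones(D), np.ones(D))
    print(out.shape)  # (4, 4, 8)
```

Each single pass is causal, so a token only sees tokens that come earlier in that order. Combining forward and reversed orders lets information reach it from both sides, while every pass stays linear in the number of tokens.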
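The progressive low-to-high-resolution process can be sketched as a cascade. Each stage doubles the resolution and conditions on the previous stage's output. The sketch relies on several assumptions:

- nearest-neighbour 2x upsampling;
- the standard linear DDPM noise schedule;
- noise augmentation of the conditioning image, as used in cascaded diffusion models.

`denoise` is a hypothetical placeholder for a full reverse-diffusion sampler, and the paper's actual stage design may differ.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products abar_t of (1 - beta_t) for the linear DDPM schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng):
    """DDPM forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) array."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def cascade_sample(base_shape, n_stages, denoise, alpha_bar, aug_t, rng):
    """Hypothetical low-to-high-resolution cascade.

    Stage 0 generates from pure noise. Every later stage doubles the
    resolution: the previous output is upsampled, noise-augmented with the
    forward process at step aug_t, and passed to the denoiser as conditioning.
    `denoise(x_T, cond, stage)` stands in for a full reverse-diffusion sampler.
    """
    x = denoise(rng.standard_normal(base_shape), None, 0)
    for stage in range(1, n_stages):
        cond = q_sample(upsample2x(x), aug_t, alpha_bar, rng)
        x = denoise(rng.standard_normal(cond.shape), cond, stage)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    abar = linear_alpha_bar()

    def placeholder_denoise(x_T, cond, stage):
        # Stand-in only: returns the conditioning image, or the noise at stage 0.
        return x_T if cond is None else cond

    img = cascade_sample((16, 16, 3), 3, placeholder_denoise, abar, aug_t=100, rng=rng)
    print(img.shape)  # (64, 64, 3)
```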
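A scale-invariant contrastive loss of the kind described can be written as InfoNCE. InfoNCE gives the lower bound I(z_low; z_high) >= log N - loss on mutual information for a batch of N positive pairs. In this sketch, row i of `z_low` and `z_high` holds features of the same target at two resolutions. The sketch assumes:

- one pooled feature vector per image;
- cosine similarity with temperature tau = 0.1;
- in-batch negatives;
- a symmetric loss.

All of these are illustrative choices, not taken from the paper.

```python
import numpy as np


def l2_normalize(z, eps=1e-8):
    """Scale each row to unit L2 norm."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def _row_cross_entropy_on_diagonal(logits):
    """Mean over rows of -log softmax(row)[i] taken at the diagonal entry i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def scale_invariant_infonce(z_low, z_high, tau=0.1):
    """Symmetric InfoNCE between features of the same N targets at two scales.

    z_low, z_high: (N, D). Row i of both arrays comes from the same target,
    so (i, i) is a positive pair and every (i, j != i) is a negative."""
    logits = l2_normalize(z_low) @ l2_normalize(z_high).T / tau  # (N, N)
    return 0.5 * (_row_cross_entropy_on_diagonal(logits)
                  + _row_cross_entropy_on_diagonal(logits.T))


def mi_lower_bound(loss, n):
    """Batch estimate of the InfoNCE bound I(z_low; z_high) >= log N - loss."""
    return np.log(n) - loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_low = rng.standard_normal((32, 64))
    z_high = z_low + 0.1 * rng.standard_normal((32, 64))  # well aligned across scales
    loss = scale_invariant_infonce(z_low, z_high)
    print(loss, mi_lower_bound(loss, 32))
```

Minimizing the loss raises the estimated bound. That estimate can never exceed log N, so the batch size caps how much mutual information it can certify.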


