Fake News Detection via Multi-Channel Feature Enhancement and Visual-Textual Similarity Awareness

ZHANG Shi-bin; CAI Song-rui; YANG Min; CHEN Shi-hang

doi:10.12263/DZXB.20250650

PAPERS | Updated: 2026-04-24

- ACTA ELECTRONICA SINICA Vol. 53, Issue 12, Pages: 4614-4629 (2025)
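- Author affiliations:
  1. School of Artificial Intelligence (Xingu Industrial College), Chengdu University of Information Technology, Chengdu, Sichuan 610225
  2. School of Cybersecurity (Xingu Industrial College), Chengdu University of Information Technology, Chengdu, Sichuan 610225
  3. Advanced Cryptography and System Security Key Laboratory of Sichuan Province, Chengdu, Sichuan 610225
  4. National Engineering Research Center for Advanced Microprocessor Technology (Industrial Control and Security Branch Center), Chengdu, Sichuan 610225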
- Funding:
  National Key Research and Development Program of China (2022YFB3103103); Open Fund of Advanced Cryptography and System Security Key Laboratory of Sichuan Province (SKLACSS-202404); Chengdu Key Research and Development Project (2023-XT00-00002-GX); Chengdu Key Research and Development Support Program Project (2024-YF05-01227-SN)
- DOI: 10.12263/DZXB.20250650
  CLC: TP391.1
- Received: 24 July 2025; Accepted: 17 December 2025; Published: 25 December 2025

ZHANG Shi-bin, CAI Song-rui, YANG Min, et al. Fake News Detection via Multi-Channel Feature Enhancement and Visual-Textual Similarity Awareness[J]. Acta Electronica Sinica, 2025, 53(12): 4614-4629. DOI: 10.12263/DZXB.20250650

Abstract

The rapid development of artificial intelligence (AI) has enriched the Internet content ecosystem while also accelerating the spread of multimodal fake news. In particular, deepfake technology makes false information highly realistic at both the visual and semantic levels, posing a severe threat to trust in the online public sphere. Although existing multimodal fake news detection techniques use cross-modal attention mechanisms and large language models (LLMs) to achieve multimodal semantic alignment and enhanced reasoning, they still face challenges in specific scenarios. On the one hand, general-purpose large models are prone to “hallucination” and are mostly limited to coarse-grained semantic fusion, which makes it difficult to accurately capture mismatches between visual and textual entities. On the other hand, existing models often overlook physical artifacts in the image frequency domain and emotional manipulation signals in the text, which limits their discriminative power on high-fidelity forgeries produced by generative AI. To address these issues, this paper proposes a multimodal similarity-aware graph attention network (MS-GAT) based on multi-channel feature enhancement and visual-textual similarity awareness. The method first designs a multi-channel feature extraction module that uses the bidirectional encoder representations from transformers (BERT) model to extract deep semantic and emotional features from the text and the vision transformer (ViT) to obtain spatial features from the image. It also applies the fast Fourier transform (FFT) to capture anomalous artifacts in the image frequency domain, and fuses the channels by weighting them through an adaptive gating unit. Building on this, the paper constructs a similarity-aware heterogeneous graph containing visual and textual entity nodes and modality hub nodes, and uses the contrastive language-image pre-training (CLIP) model to compute the similarity between nodes in a shared semantic space, thereby explicitly modeling the fine-grained associations between image and text. Finally, the model employs a graph attention network (GAT) to aggregate neighborhood information, dynamically adjusting the association strength between nodes through attention weights so as to focus on visual-textual inconsistency features, and adopts an adaptive multi-task loss function to resolve the optimization imbalance in joint learning. The proposed method achieves accuracies of 94.5% and 87.6% on the Weibo17 and CFND datasets, respectively, and outperforms existing mainstream baselines on all key performance indicators. The results indicate that, by combining multi-channel visual-textual features with a structured reasoning mechanism, the proposed method captures deep semantic conflicts between images and text, providing a new perspective and technical support for improving the interpretability and robustness of multimodal fake news detection.
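
To make the graph stage described in the abstract more concrete, the following is a minimal sketch of a similarity-derived graph followed by one single-head graph attention step in the standard GAT form. It is not the authors' implementation. The toy two-dimensional vectors stand in for CLIP embeddings, and the threshold rule for creating edges, the identity projection `W`, the attention vector `a`, and all function names are assumptions made for illustration. The modality hub nodes, the multi-channel features, and the multi-task loss are not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_neighbors(nodes, threshold):
    """Assumed edge rule: link two nodes when their similarity reaches the threshold.

    Every node also gets a self-loop, as is common for attention layers.
    """
    neighbors = {i: [i] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if cosine(nodes[i], nodes[j]) >= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(nodes, neighbors, W, a):
    """One single-head graph attention step.

    e_ij     = LeakyReLU(a . [W h_i || W h_j])
    alpha_ij = softmax of e_ij over the neighbors j of node i
    h_i'     = sum_j alpha_ij * W h_j
    """
    z = [matvec(W, h) for h in nodes]
    outputs, attention = [], []
    for i in range(len(nodes)):
        nbrs = neighbors[i]
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs
        ]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        outputs.append(
            [sum(w * z[j][k] for w, j in zip(alpha, nbrs)) for k in range(len(z[i]))]
        )
        attention.append(dict(zip(nbrs, alpha)))
    return outputs, attention


# Toy graph: node 0 is a text entity, nodes 1 and 2 are image entities.
# Node 1 points in almost the same direction as node 0; node 2 is orthogonal to it.
nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection (illustrative values)
a = [1.0, 0.0, 0.0, 1.0]       # attention vector (illustrative values)

neighbors = build_neighbors(nodes, threshold=0.5)
out, attn = gat_layer(nodes, neighbors, W, a)

for i, weights in enumerate(attn):
    print(i, weights)
```

In this sketch the mismatched image entity (node 2) ends up with only its self-loop, so its representation is not mixed with the text entity. A trained model would learn `W` and `a`, and might instead keep low-similarity edges and let the attention weights mark them as inconsistent; the abstract does not specify which.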
