School of Electronic and Information Engineering, Changchun University of Science and Technology, Changchun, Jilin 130022
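LI Xing-guang, male, was born in Nong'an County, Jilin Province, in August 1976. He is a professor, vice dean, and doctoral supervisor at the School of Electronic and Information Engineering, Changchun University of Science and Technology, and has been recognized as a Jilin Province Top Innovative Talent and an Expert with Outstanding Contributions. He has received two Second Prizes of the Jilin Province Science and Technology Progress Award. His main research interests include multimodal information processing, microwave and millimeter-wave technology, and active health systems.
CAI Yu-jian, male, was born in Changchun, Jilin Province, in April 1999. He is a doctoral student at the School of Electronic and Information Engineering, Changchun University of Science and Technology. His main research interests are computer vision and radar signal processing. E-mail: yujiantsai@mails.cust.edu.cn
CUI Wei, female, was born in Changchun, Jilin Province, in June 1978. She is a professor at the School of Electronic and Information Engineering, Changchun University of Science and Technology. Her main research interests include signal processing, indoor positioning, and robot perception. E-mail: cuiwei@cust.edu.cn
LI Jin-song, male, was born in Siping, Jilin Province, in March 1998. He is a doctoral student at the School of Electronic and Information Engineering, Changchun University of Science and Technology. His main research interest is radar signal processing. E-mail: lijinsong@mails.cust.edu.cn
ZHANG Ying-yu, female, was born in Ulanhot, Hinggan League, Inner Mongolia Autonomous Region, in July 1997. She is a doctoral student at the School of Electronic and Information Engineering, Changchun University of Science and Technology. Her main research interest is medical signal processing. E-mail: zhangyingyu@mails.cust.edu.cn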
Received: 2025-06-25,
Accepted: 2025-11-04,
Published in print: 2025-11-25
LI Xing-guang, CAI Yu-jian, CUI Wei, et al. A Hybrid Quantum-Graph Neural Network for Multimodal Sentiment Analysis[J]. Acta Electronica Sinica, 2025, 53(11): 3983-3995. DOI:10.12263/DZXB.20250554
Multimodal sentiment analysis (MSA) is one of the most promising technologies in affective computing. Visual, acoustic, and textual modalities carry most genuine human emotional cues, and fusing them yields a finer, multidimensional representation of subjective affect. However, keeping the analysis accurate under such fusion still faces significant challenges. When the sentiment feature subsets extracted from each modality differ in element count or temporal alignment, a sound strategy for selecting representative emotional features in each modality is essential: it keeps distinctive features from being overlooked or over-extracted, and it underpins the reliability of the subsequent fusion analysis. When representative features from the three modalities are fused directly, the mechanisms of information transfer and complementarity between modalities are not fully exploited, so the result comes to depend on the representative semantic features of a single modality, which leads to overfitting and misclassification. Furthermore, human emotional expression is heterogeneous and inconsistent across modalities, which often results in uneven feature distributions and polarity ambiguity between modalities. A model must therefore capture cross-modal complementary information and fine-grained correlations while suppressing redundant features that interfere with sentiment discrimination, so that a "semantic gap" in data fusion does not limit the stability of the results. To address these issues, this paper proposes a hybrid quantum-graph neural network for multimodal sentiment analysis, based on multi-scale temporal representation and the multi-state representation capacity of qubits. First, a topological graph network of representative sequences is constructed to capture the dynamic graph-structured relationships among feature nodes, and a multi-head graph attention mechanism is added to adaptively adjust node and edge weights, ensuring reliable selection of distinctive sentiment features. Then, a quantum sentiment feature computation network is designed, which maps multimodal features into a high-dimensional Hilbert space via quantum encoding. Quantum superposition and entanglement further promote deep coupling and interdependence modeling among modality features. Quantum measurement collapses the superposed states into specific eigenstates, establishing a correspondence between quantum states and sentiment features and yielding more discriminative multimodal fusion representations. Finally, unimodal and multimodal predictions are treated as multiple subtasks within a multitask collaborative optimization mechanism. Pseudo-label generation and shared representations improve the performance of each task, while a multitask loss function mitigates inconsistencies among modality representations and strengthens the model's generalization. Experiments on the CMU-MOSI, CH-SIMS, and CMU-MOSEI benchmark datasets show that, compared with commonly used baseline models, the proposed method improves binary classification accuracy by 1.5% to 8.7%, five-class accuracy by 3.3% to 10.7%, and seven-class accuracy by 1.5% to 14.5%. The F1 score increases by up to 8.5 points, the Pearson correlation coefficient improves by up to 0.146, and the mean absolute error decreases by up to 0.304.
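The quantum step described in the abstract (encode features as quantum states, entangle them, then measure) can be illustrated with a toy two-qubit simulation. The sketch below is not the authors' circuit: the RY angle encoding, the single CNOT gate, the Pauli-Z readout, and the function name `encode_and_measure` are all assumptions chosen for illustration, and the two scalar inputs stand in for features from two modalities.

```python
import numpy as np

# Hypothetical toy circuit, not the paper's architecture.
# Basis order is |q0 q1> = |00>, |01>, |10>, |11>.

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with qubit 0 as control and qubit 1 as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

def encode_and_measure(x_a, x_b):
    """Angle-encode two features in [0, 1], entangle, return (<Z0>, <Z1>)."""
    zero = np.array([1.0, 0.0])
    # Angle encoding (assumed): each feature sets an RY rotation angle.
    q0 = ry(np.pi * x_a) @ zero
    q1 = ry(np.pi * x_b) @ zero
    state = np.kron(q0, q1)   # product state of the two qubits
    state = CNOT @ state      # entangle: qubit 1 now depends on qubit 0
    # Measurement: expectation value of Pauli-Z on each qubit.
    z0 = state @ np.kron(Z, I2) @ state
    z1 = state @ np.kron(I2, Z) @ state
    return float(z0), float(z1)

if __name__ == "__main__":
    print(encode_and_measure(0.2, 0.7))
```

In this toy circuit the readout on qubit 1 equals cos(pi * x_a) * cos(pi * x_b), so it mixes both inputs, while the readout on qubit 0 depends only on x_a. That is the simplest form of the cross-feature coupling that entanglement provides; a practical model would use more qubits and trainable rotation parameters.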