Causal Intervention for Unbiased Facial Action Unit Recognition (基于因果干预的无偏面部动作单元识别)

Shao Zhiwen (邵志文); Chen Bikuan (陈必宽); Zhu Hancheng (祝汉城); Zhou Yong (周勇); Yao Rui (姚睿); Ma Lizhuang (马利庄)

doi:10.12263/DZXB.20240279

Research article | Updated: 2026-05-07

- Causal Intervention for Unbiased Facial Action Unit Recognition
- Acta Electronica Sinica, 2024, Vol. 52, No. 10, pp. 3312-3321
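- Author affiliations:
  
  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116
  2. Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou, Jiangsu 221116
  3. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240
  4. School of Computer Science and Technology, East China Normal University, Shanghai 200062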
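- Author biographies:
  
  Shao Zhiwen (邵志文), male, was born in Ma'anshan, Anhui Province, in December 1994. He is currently an associate professor and master's supervisor at the School of Computer Science and Technology, China University of Mining and Technology. His main research interests are affective computing, computer vision, and artificial intelligence. E-mail: zhiwen_shao@cumt.edu.cn
  Chen Bikuan (陈必宽), male, was born in Zhanjiang, Guangdong Province, in May 2003. He is currently an undergraduate student at the School of Computer Science and Technology, China University of Mining and Technology. His main research interest is computer vision. E-mail: bikuan_chen@cumt.edu.cn
  Zhu Hancheng (祝汉城), male, was born in Xuzhou, Jiangsu Province, in December 1989. He is currently an associate professor and master's supervisor at the School of Computer Science and Technology, China University of Mining and Technology. His main research interests are affective computing, computer vision, and artificial intelligence. E-mail: zhuhancheng@cumt.edu.cn
  Zhou Yong (周勇), male, was born in Xuzhou, Jiangsu Province, in September 1974. He is currently a professor and doctoral supervisor at the School of Computer Science and Technology, China University of Mining and Technology. His main research interests are computer vision, machine learning, and artificial intelligence. E-mail: yzhou@cumt.edu.cn
  Yao Rui (姚睿), male, was born in Nanyang, Henan Province, in July 1982. He is currently a professor and doctoral supervisor at the School of Computer Science and Technology, China University of Mining and Technology. His main research interests are computer vision and artificial intelligence. E-mail: ruiyao@cumt.edu.cn
  Ma Lizhuang (马利庄), male, was born in Ningbo, Zhejiang Province, in February 1963. He is currently a professor and doctoral supervisor at the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His main research interests are computer vision, computer graphics, and artificial intelligence. E-mail: ma-lz@cs.sjtu.edu.cn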
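- Funding:
  
  National Natural Science Foundation of China (62106268; 62101555; 62272461; 62172417; 72192821); Natural Science Foundation of Jiangsu Province (BK20210488; BK20201346); Shanghai "Science and Technology Innovation Action Plan" (21511101200); China Postdoctoral Science Foundation (2023M732223); "Hong Kong Scholars Program" (XJ2023037)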
- DOI: 10.12263/DZXB.20240279
  Chinese Library Classification (CLC) number: TP391.4
- Received: 2024-03-28; revised: 2024-07-13; published in print: 2024-10-25
SHAO Zhi-wen, CHEN Bi-kuan, ZHU Han-cheng, et al. Causal Intervention for Unbiased Facial Action Unit Recognition[J]. Acta Electronica Sinica, 2024, 52(10): 3312-3321. DOI: 10.12263/DZXB.20240279

Abstract

Facial action unit (AU) recognition is a hot topic in computer vision and affective computing. AU recognition is a multi-label binary classification task and currently faces challenges such as label imbalance. Most existing methods exploit the correlations among AUs and re-balance labels by adjusting the sampling rate and the weights of AUs. However, these methods only shift the model's prediction bias from high-frequency labels to low-frequency ones, so the bias remains unresolved. AUs can be divided into head and tail classes according to their occurrence frequency, and treating every class fairly is the key to unbiased AU recognition. By introducing causal inference theory, we propose CIU (Causal Intervention for Unbiased facial action unit recognition), a debiasing method based on causal intervention that addresses the imbalance among multiple AUs. CIU achieves model unbiasedness by adjusting the empirical risks on both the imbalanced domain and the balanced but invisible domain. Extensive experiments demonstrate that our method outperforms existing methods on the BP4D and DISFA benchmarks, exceeding the previous state-of-the-art method by 1.1% on DISFA, and that it can learn unbiased feature representations.
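
To make the imbalance setting concrete, the following minimal Python sketch illustrates the kind of weight-based label re-balancing that the abstract attributes to existing methods. It is not the CIU method, whose details the abstract does not give. The function names, the toy label matrix, the 0.5 head/tail threshold, and the particular inverse-frequency normalization are all assumptions made for illustration.

```python
import math

def occurrence_rates(labels):
    """Fraction of samples in which each AU is active (rows are 0/1 label vectors)."""
    n_samples = len(labels)
    n_aus = len(labels[0])
    return [sum(row[j] for row in labels) / n_samples for j in range(n_aus)]

def inverse_frequency_weights(rates):
    """Weights proportional to 1/rate, scaled so they sum to the number of AUs.

    Assumes every AU occurs at least once; a zero rate is rejected.
    """
    if any(r <= 0 for r in rates):
        raise ValueError("every AU must occur at least once")
    inv = [1.0 / r for r in rates]
    total = sum(inv)
    return [len(rates) * v / total for v in inv]

def split_head_tail(rates, threshold):
    """Split AU indices into head (frequent) and tail (rare) by a hypothetical threshold."""
    head = [j for j, r in enumerate(rates) if r >= threshold]
    tail = [j for j, r in enumerate(rates) if r < threshold]
    return head, tail

def weighted_bce(probs, targets, weights, eps=1e-7):
    """Weighted multi-label binary cross-entropy for one sample, averaged over AUs."""
    loss = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return loss / len(probs)

# Toy data (hypothetical): 4 samples, 3 AUs with decreasing occurrence frequency.
labels = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
rates = occurrence_rates(labels)
weights = inverse_frequency_weights(rates)
head, tail = split_head_tail(rates, threshold=0.5)
```

In this toy setup the rarest AU receives the largest weight, which is exactly the behavior the abstract criticizes: the loss now leans toward low-frequency labels instead of high-frequency ones, rather than treating head and tail classes evenly.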
