Acta Electronica Sinica ›› 2019, Vol. 47 ›› Issue (8): 1643-1653. DOI: 10.3969/j.issn.0372-2112.2019.08.006

• Research Articles •

Dynamic Facial Expression Recognition Based on Multi-Visual and Audio Descriptors

LI Hong-fei1,2, LI Qing1,2, ZHOU Li1,2   

  1. Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2018-10-22  Revised: 2018-12-21  Online: 2019-08-25  Published: 2019-08-25
    • Corresponding author:
    • LI Qing
    • About the author:
    • LI Hong-fei (female) was born in Sanhe, Hebei Province, China, in 1990. She is a Ph.D. candidate at the Institute of Microelectronics, Chinese Academy of Sciences. Her research interests include vision-based automotive active safety and intelligent driving.
    • Supported by:
    • National Natural Science Foundation of China (No.U1832217); Construction and Industrialization of an Electronic Open Platform for New-Energy Vehicles with Intelligent Driving (No.KFJ-STS-ZDTP-045)

Abstract: Communication in any form, verbal or non-verbal, is vital to completing daily tasks and plays a significant role in life. Facial expression is the most effective form of non-verbal communication, providing clues about a person's emotional state, mindset, and intentions. Facial expression recognition has already been applied successfully in fields such as safe driving, merchandise sales, and clinical medicine. This paper investigates key techniques for facial expression recognition; the main work and contributions are as follows. A dynamic facial expression recognition algorithm based on the fusion of multi-visual descriptors and audio features is proposed for unconstrained conditions: dynamic expression features are extracted as local spatial-temporal representations via multiple visual descriptors, and the fusion of video and audio features improves recognition performance. Dynamic warping based on covariance matrices and timeline segmentation effectively describes dynamic expression sequences of different durations. To improve the generalization of the recognition model, an ensemble decision strategy based on weighted voting over multiple individual recognition models is introduced. To learn the voting weights, two methods are proposed: weight learning based on random re-sampling, and weight learning based on the relative advantages of the individual classification models. The ensemble decision further improves recognition performance. Experiments on the AFEW5.0 dynamic expression dataset validate the effectiveness of the proposed algorithm.
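
To make two of the abstract's components more concrete, the Python sketch below illustrates (a) a fixed-length sequence descriptor built from timeline segmentation and per-segment covariance matrices, and (b) weighted voting over the class scores of several individual models. It is only an illustration under assumed details (uniform segmentation, upper-triangle vectorization of each covariance, voting weights taken as given); the function names segment_covariance_descriptor and weighted_vote are hypothetical and not from the paper, whose exact descriptors and weight-learning procedures may differ.

    import numpy as np

    def segment_covariance_descriptor(frame_features, n_segments=4):
        """Map a T x d sequence of per-frame features to a fixed-length
        vector: split the timeline into n_segments parts and keep one
        covariance matrix (upper triangle) per segment."""
        T, d = frame_features.shape
        bounds = np.linspace(0, T, n_segments + 1, dtype=int)
        parts = []
        for s in range(n_segments):
            seg = frame_features[bounds[s]:bounds[s + 1]]
            if len(seg) < 2:                     # guard short/empty segments
                cov = np.zeros((d, d))
            else:
                cov = np.cov(seg, rowvar=False)  # d x d segment covariance
            parts.append(cov[np.triu_indices(d)])
        return np.concatenate(parts)             # same length for any T

    def weighted_vote(prob_per_model, weights):
        """Fuse per-model class-score vectors with learned voting weights
        and return the index of the winning class."""
        stacked = np.stack(prob_per_model)        # (n_models, n_classes)
        return int(np.argmax(weights @ stacked))  # weighted score fusion

    # Clips of different lengths yield descriptors of identical size.
    clip_a = np.random.randn(37, 8)               # 37 frames, 8-dim features
    clip_b = np.random.randn(92, 8)               # 92 frames
    assert (segment_covariance_descriptor(clip_a).shape
            == segment_covariance_descriptor(clip_b).shape)

    # Two models, three expression classes, learned weights 0.7 / 0.3.
    scores = [np.array([0.2, 0.5, 0.3]), np.array([0.6, 0.3, 0.1])]
    print(weighted_vote(scores, np.array([0.7, 0.3])))

The fixed-length descriptor is what lets sequences of different durations be handled by a single classifier, and the voting weights are where the paper's re-sampling and relative-advantage learning schemes would plug in.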

Key words: dynamic expression recognition, multi-visual descriptors, ensemble classifier, weight learning