Multi-Modal Emotion Recognition: Extracting Public and Private Information


Multimodal emotion recognition is important for efficient interaction between humans and machines. To better detect emotional states from multimodal data, we need to effectively extract both the common information that captures dependencies among different modalities and the private information that characterizes variations within each modality. However, existing works mostly pursue one of these objectives but not both. In our work, we propose an end-to-end learning approach that simultaneously extracts the common and private information for multimodal emotion recognition. Specifically, we use a correlation loss based on the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation to preserve the common information, and a reconstruction loss based on autoencoders to preserve the private information. Experimental results on the eNTERFACE'05 and RML databases demonstrate the effectiveness of the proposed approach.
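The combination of losses described above can be sketched in a minimal NumPy example. This is an illustrative assumption, not the paper's implementation: the encoders are plain linear maps, the HGR-based correlation term is written in a Soft-HGR-style trace form (trace of the cross-covariance minus half the trace of the product of the per-modality covariances), and the autoencoder term is a simple linear-reconstruction MSE. All variable names (`W_a`, `W_v`, `soft_hgr_corr`, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features for two modalities (e.g. audio and visual), n samples each.
# The visual features are partly driven by the audio ones, so the two
# modalities share some "common" information. (Synthetic data, for illustration.)
n, d_a, d_v, k = 200, 8, 6, 4
x_audio = rng.normal(size=(n, d_a))
x_visual = 0.5 * x_audio[:, :d_v] + 0.5 * rng.normal(size=(n, d_v))

# Hypothetical linear "encoders" projecting each modality to k shared dimensions.
W_a = rng.normal(size=(d_a, k))
W_v = rng.normal(size=(d_v, k))

def center(z):
    """Subtract the per-dimension mean (zero-mean features)."""
    return z - z.mean(axis=0, keepdims=True)

def soft_hgr_corr(f, g):
    """Soft-HGR-style surrogate for the HGR maximal correlation:
    trace(cov(f, g)) - 0.5 * trace(cov(f) @ cov(g)).
    Maximizing this drives f and g toward maximally correlated features,
    i.e. toward the common information shared by the two modalities."""
    f, g = center(f), center(g)
    m = f.shape[0]
    cov_fg = f.T @ g / (m - 1)
    cov_f = f.T @ f / (m - 1)
    cov_g = g.T @ g / (m - 1)
    return float(np.trace(cov_fg) - 0.5 * np.trace(cov_f @ cov_g))

def reconstruction_loss(x, z, W_dec):
    """Autoencoder-style MSE between the input and its linear reconstruction,
    which pressures the encoding to also keep modality-private information."""
    return float(np.mean((x - z @ W_dec) ** 2))

f = x_audio @ W_a
g = x_visual @ W_v

corr = soft_hgr_corr(f, g)                    # common-information term
rec = reconstruction_loss(x_audio, f, W_a.T)  # private-information term (audio branch)
total = -corr + rec                           # minimize: -correlation + reconstruction
print(corr, rec, total)
```

In a full model, both terms would be computed on the outputs of trainable deep encoders and combined with the emotion-classification loss; here the point is only how a correlation term and a reconstruction term can be balanced in one objective.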


Fei Ma, Wei Zhang, Yang Li, Shao-Lun Huang, and Lin Zhang, "An End-to-End Learning Approach for Multimodal Emotion Recognition: Extracting Common and Private Information," IEEE International Conference on Multimedia and Expo (ICME), July 2019 (accepted). [pdf]