CNN-based facial expression recognition from annotated RGB-D images for human-robot interaction

Jing Li, Yang Mi, Gongfa Li, Zhaojie Ju*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed



Facial expression recognition has been widely used in human-computer interaction (HCI) systems. Over the years, researchers have proposed various feature descriptors, implemented various classification methods, and carried out numerous experiments on a range of datasets for automatic facial expression recognition. However, most of this work relies on 2D static images or 2D video sequences. The main limitation of 2D-based analysis is its sensitivity to variations in pose and illumination, which reduce recognition accuracy. An alternative is to incorporate depth information acquired by a 3D sensor, because it is invariant to both pose and illumination. In this paper, we present a two-stream convolutional neural network (CNN)-based facial expression recognition system and test it on our own RGB-D facial expression dataset, collected with a Microsoft Kinect for Xbox in non-spontaneous scenarios. We chose the Kinect because it is an inexpensive and portable device that captures both RGB and depth information. Our fully annotated dataset includes seven expressions (neutral, sadness, disgust, fear, happiness, anger, and surprise) from 15 subjects (9 males and 6 females) aged 20 to 25. The two CNNs are identical in architecture but do not share parameters. To combine the results produced by the two CNNs, we adopt a late fusion approach. The experimental results demonstrate that the proposed two-stream network using RGB-D images outperforms recognition from RGB images alone or from depth images alone.
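To make the late fusion step concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state which fusion rule is used, so the weighted average of per-stream class probabilities here is an assumption, and the names `late_fusion`, `predict`, and `rgb_weight` are hypothetical. Each stream is assumed to output one raw score per expression class.

```python
import math

# The seven expression classes listed in the abstract.
EXPRESSIONS = ["neutral", "sadness", "disgust", "fear",
               "happiness", "anger", "surprise"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, depth_logits, rgb_weight=0.5):
    """Fuse two streams by a weighted average of their class probabilities.

    Assumption: the fusion rule is a weighted average. rgb_weight is a
    hypothetical parameter; 0.5 gives a plain average of the two streams.
    """
    if len(rgb_logits) != len(depth_logits):
        raise ValueError("both streams must score the same set of classes")
    rgb_probs = softmax(rgb_logits)
    depth_probs = softmax(depth_logits)
    return [rgb_weight * r + (1.0 - rgb_weight) * d
            for r, d in zip(rgb_probs, depth_probs)]


def predict(rgb_logits, depth_logits, rgb_weight=0.5):
    """Return the most probable expression and the fused probabilities."""
    fused = late_fusion(rgb_logits, depth_logits, rgb_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EXPRESSIONS[best], fused


# Made-up scores for illustration only, not results from the paper.
rgb_scores = [0.1, 0.2, 0.0, 0.1, 2.0, 0.3, 1.9]
depth_scores = [0.0, 0.1, 0.2, 0.0, 1.0, 0.1, 2.5]
label, fused_probs = predict(rgb_scores, depth_scores)
```

Because fusion happens on the outputs, the two networks can be trained independently, which matches the abstract's statement that they share an architecture but not parameters.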

Original language: English
Journal: International Journal of Humanoid Robotics
Early online date: 17 Jul 2019
Publication status: Early online - 17 Jul 2019


  • convolutional neural network
  • depth information
  • Facial expression recognition
  • Kinect
  • RGB-D images

