A discriminative deep model with feature fusion and temporal attention for human action recognition
Research output: Contribution to journal › Article › peer-review
Activity recognition, which aims to accurately distinguish human actions in complex environments, plays a key role in human-robot/computer interaction. However, long-lasting and similar actions cause poor feature-sequence extraction and thus reduce recognition accuracy. We propose a novel discriminative deep model (D3D-LSTM) based on 3D-CNN and LSTM for both single-target and interaction action recognition, improving spatiotemporal processing performance. Our model has several notable properties: 1) a real-time feature-fusion method obtains a more representative feature sequence through composition of local mixtures, enhancing the ability to discriminate similar actions; 2) an improved attention mechanism focuses on each frame individually by assigning different weights in real time; 3) an alternating optimization strategy is proposed for our model to obtain the best-performing parameters. Because the proposed D3D-LSTM model is efficient enough to be used as a detector that recognizes various activities, a Real-set database is collected to evaluate action recognition in complex real-world scenarios. For long-term relations, we update the present memory state via the weight-controlled attention module, which enables the memory cell to store better long-term features. The densely connected bimodal model makes the local perceptrons of the 3D-Conv motion-aware and stores better short-term features. The proposed D3D-LSTM model has been evaluated through a series of experiments on Real-set and on the open-source SBU-Kinect and MSR-action-3D datasets. Experimental results show that the proposed D3D-LSTM model achieves new state-of-the-art results, pushing the average accuracy on SBU-Kinect to 92.40% and on MSR-action-3D to 95.40%.
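The per-frame weighting that the abstract's attention mechanism describes can be sketched as follows. This is a minimal illustration only, not the authors' D3D-LSTM implementation: the norm-based scoring rule and the function name `temporal_attention` are assumptions standing in for the paper's learned, weight-controlled attention module.

```python
import numpy as np

def temporal_attention(frame_features):
    """Assign each frame an attention weight and pool over time.

    frame_features: (T, D) array of per-frame feature vectors.
    Returns the attention-pooled (D,) feature and the (T,) weights.
    """
    # Score each frame; a simple norm-based score stands in here for
    # the learned scoring network (an assumption for illustration).
    scores = np.linalg.norm(frame_features, axis=1)
    # Softmax turns scores into per-frame weights that sum to 1,
    # so more informative frames contribute more to the pooled feature.
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()
    # Weighted sum over the time axis emphasises discriminative frames.
    pooled = weights @ frame_features
    return pooled, weights

# Toy usage: 16 frames, 8-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 8))
pooled, weights = temporal_attention(feats)
```

In the paper's model the weights would be produced by trained parameters and fed back into the LSTM memory update; the softmax pooling above only illustrates the general mechanism of frame-wise reweighting.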
Number of pages: 13
Publication status: Published - 2 Mar 2020
Final published version, 1.42 MB, PDF document
Licence: CC BY