Abstract
Action recognition is crucial for many computer vision applications. Recently, deep learning has brought breakthroughs in action recognition performance. However, videos contain a large number of redundant frames carrying similar information, which makes it difficult to capture discriminative spatiotemporal features for long-term actions. In this paper, we propose a novel framework for action recognition: the Deep Key Clips-Video feature fusion framework. First, we propose a key clip selection algorithm based on background subtraction, which uses the image average gradient to select key clips for training. Then, we superimpose the key frames to generate historical contour images, effectively aggregating long-term information about the actions. Key video clips and historical contour images are fed to a 3D convolutional network and a 2D convolutional network, respectively, which extract clip-level and long-term video-level features. Finally, we fuse these two sub-networks to improve recognition accuracy. We conduct experiments on two mainstream action recognition datasets, UCF-101 and HMDB-51. Compared with state-of-the-art methods, the experimental results demonstrate the effectiveness of our proposed network for action recognition.
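The abstract does not give the details of the key clip selection or of the historical contour images, so the following is only a rough sketch of how such a pipeline might look, not the authors' implementation. It assumes grayscale frames, a median-over-time background model, a commonly used definition of the average gradient, fixed-length non-overlapping clips, and a pixel-wise maximum for superimposing frames. The function names and the parameters `clip_len` and `num_clips` are hypothetical.

```python
import numpy as np


def average_gradient(img):
    """Average gradient of a 2D image: mean of sqrt((dx^2 + dy^2) / 2),
    with dx, dy taken as forward differences (a common definition, assumed here)."""
    img = np.asarray(img, dtype=np.float64)
    dx = img[:-1, 1:] - img[:-1, :-1]
    dy = img[1:, :-1] - img[:-1, :-1]
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))


def foreground(frames):
    """Background subtraction with a median-over-time background (assumption)."""
    frames = np.asarray(frames, dtype=np.float64)
    background = np.median(frames, axis=0)
    return np.abs(frames - background)


def select_key_clips(frames, clip_len=16, num_clips=3):
    """Return the start indices of the highest-scoring non-overlapping clips.

    frames: array of shape (T, H, W). Each frame is scored by the average
    gradient of its foreground; a clip's score is the mean over its frames.
    """
    fg = foreground(frames)
    scores = np.array([average_gradient(f) for f in fg])
    starts = np.arange(0, len(fg) - clip_len + 1, clip_len)
    clip_scores = np.array([scores[s:s + clip_len].mean() for s in starts])
    best = starts[np.argsort(clip_scores)[::-1][:num_clips]]
    return sorted(int(s) for s in best)


def historical_contour_image(frames, starts, clip_len=16):
    """Superimpose the foreground of all key frames into a single image
    using a pixel-wise maximum (assumption)."""
    fg = foreground(frames)
    key_frames = np.concatenate([fg[s:s + clip_len] for s in starts])
    return key_frames.max(axis=0)
```

In this sketch, the clips returned by `select_key_clips` would be the input to a 3D convolutional network and the output of `historical_contour_image` the input to a 2D convolutional network; how the two sub-networks are fused is not specified in the abstract and is left out here.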
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 156-161 |
Number of pages | 6 |
ISBN (Electronic) | 9781538692141 |
ISBN (Print) | 9781538692158 |
Publication status | Published - 15 Aug 2019 |
Event | 2019 IEEE International Conference on Multimedia and Expo Workshops - Shanghai, China; Duration: 8 Jul 2019 → 12 Jul 2019 |
Conference
Conference | 2019 IEEE International Conference on Multimedia and Expo Workshops |
---|---|
Abbreviated title | ICMEW 2019 |
Country/Territory | China |
City | Shanghai |
Period | 8/07/19 → 12/07/19 |
Keywords
- Action recognition
- Convolution networks
- Key clips
- Long term actions
- Video level