Deep key clips-video feature fusion framework for action recognition

Chao Li, Yue Ming, Yuan Shen, Hui Yu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Action recognition is crucial for many computer vision applications. Recently, deep learning has made breakthroughs in action recognition performance. However, videos contain a large number of redundant frames carrying similar information, which makes it difficult to capture discriminative spatiotemporal features for long-term actions. In this paper, we propose a novel framework for action recognition: the Deep Key Clips-Video feature fusion framework. First, we propose a key clip selection algorithm based on background subtraction, which uses the image average gradient to select key clips for training. Then, we superimpose the key frames to generate historical contour images, effectively aggregating long-term information about the actions. Key video clips and historical contour images are fed to a 3D convolutional network and a 2D convolutional network respectively, which extract clip-level and long-term video-level features. Finally, we fuse these two sub-networks to improve recognition accuracy. We conduct experiments on two mainstream action recognition datasets, UCF-101 and HMDB-51. Compared with state-of-the-art methods, the experimental results demonstrate the effectiveness of our proposed network for action recognition.
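The abstract does not give the exact formulas, so the following is only a rough sketch of how key clip selection and historical contour images might be prototyped with NumPy. The per-pixel temporal median background model, the finite-difference definition of average gradient, the non-overlapping clip layout, and the pixel-wise maximum used for superimposition are all assumptions made for illustration, as are the function names. The 3D and 2D networks and their fusion are not sketched here.

```python
import numpy as np


def average_gradient(img: np.ndarray) -> float:
    """Mean magnitude of horizontal/vertical finite differences (assumed definition)."""
    img = img.astype(np.float64)
    gx = img[:-1, 1:] - img[:-1, :-1]  # horizontal difference
    gy = img[1:, :-1] - img[:-1, :-1]  # vertical difference
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


def foreground(frames: np.ndarray) -> np.ndarray:
    """Background subtraction against the per-pixel temporal median (assumed model)."""
    frames = frames.astype(np.float64)
    background = np.median(frames, axis=0)
    return np.abs(frames - background)


def select_key_clips(frames: np.ndarray, clip_len: int, num_clips: int) -> list:
    """Start indices of the non-overlapping clips with the highest mean foreground gradient."""
    fg = foreground(frames)
    scores = np.array([average_gradient(f) for f in fg])
    starts = range(0, len(frames) - clip_len + 1, clip_len)
    ranked = sorted(starts, key=lambda s: scores[s:s + clip_len].mean(), reverse=True)
    return sorted(ranked[:num_clips])  # keep the selected clips in temporal order


def historical_contour_image(frames: np.ndarray, key_starts: list, clip_len: int) -> np.ndarray:
    """Superimpose the foreground of all key frames via a pixel-wise maximum (assumed)."""
    fg = foreground(frames)
    idx = [i for s in key_starts for i in range(s, s + clip_len)]
    return fg[idx].max(axis=0)
```

Here `frames` is assumed to be a grayscale video of shape `(T, H, W)`. In a full pipeline the selected clips would go to the 3D network and the contour image to the 2D network.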

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2019
Publisher: IEEE
Pages: 156-161
Number of pages: 6
ISBN (Electronic): 9781538692141
ISBN (Print): 9781538692158
Publication status: Published - 15 Aug 2019
Event: 2019 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2019 - Shanghai, China
Duration: 8 Jul 2019 to 12 Jul 2019

Conference

Conference: 2019 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2019
Country/Territory: China
City: Shanghai
Period: 8/07/19 to 12/07/19

Keywords

  • Action recognition
  • Convolution networks
  • Key clips
  • Long term actions
  • Video level
