Deep key clips-video feature fusion framework for action recognition

Chao Li, Yue Ming, Yuan Shen, Hui Yu

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Action recognition is crucial for many computer vision applications. Recently, deep learning has brought breakthroughs in action recognition performance. However, videos contain many redundant frames carrying similar information, which makes it difficult to capture discriminative spatiotemporal features for long-term actions. In this paper, we propose a novel framework for action recognition: the Deep Key Clips-Video feature fusion framework. First, we propose a key clip selection algorithm based on background subtraction, which uses the image average gradient to select key clips for training. Then, we superimpose the key frames to generate historical contour images, effectively aggregating long-term information about the actions. Key video clips and historical contour images are fed to a 3D convolutional network and a 2D convolutional network, respectively, which extract clip-level and long-term video-level features. Finally, we fuse these two sub-networks to improve recognition accuracy. We conduct experiments on two mainstream action recognition datasets, UCF-101 and HMDB-51. Compared with state-of-the-art methods, the experimental results demonstrate the effectiveness of the proposed network for action recognition.
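
    The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the temporal-median background model, the average-gradient definition, the fixed-length clip scoring, the pixel-wise maximum used to superimpose frames, and the weighted late fusion, along with all function and parameter names. The 3D and 2D convolutional networks themselves are not modeled; only the score fusion step is shown.

```python
import numpy as np


def average_gradient(img):
    """Average gradient of a 2-D grayscale image (one common definition, assumed here)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))


def foreground(frames):
    """Crude background subtraction: absolute difference from the temporal median.

    Assumption: the paper's background model is not specified in the abstract.
    """
    frames = frames.astype(np.float64)
    background = np.median(frames, axis=0)
    return np.abs(frames - background)


def select_key_clips(frames, clip_len=4, num_clips=2):
    """Return (start, end) index pairs of the clips with the highest mean
    foreground average gradient, in temporal order (hypothetical scoring rule)."""
    fg = foreground(frames)
    frame_scores = np.array([average_gradient(f) for f in fg])
    n_clips = len(frames) // clip_len
    clip_scores = frame_scores[: n_clips * clip_len].reshape(n_clips, clip_len).mean(axis=1)
    top = np.sort(np.argsort(clip_scores)[::-1][:num_clips])
    return [(int(i) * clip_len, (int(i) + 1) * clip_len) for i in top]


def historical_contour_image(frames, clips):
    """Superimpose the foreground of all key frames with a pixel-wise maximum
    (assumed superposition rule)."""
    fg = foreground(frames)
    key_frames = np.concatenate([fg[start:end] for start, end in clips])
    return key_frames.max(axis=0)


def fuse_scores(clip_scores, video_scores, weight=0.5):
    """Weighted late fusion of class scores from the two sub-networks
    (assumed fusion scheme; `weight` is a hypothetical parameter)."""
    return weight * clip_scores + (1.0 - weight) * video_scores
```

    With a video stored as an array of shape (frames, height, width), `select_key_clips` picks the clip ranges, `historical_contour_image` builds the single 2D input from those ranges, and `fuse_scores` combines the class scores that the two networks would produce.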

    Original language: English
    Title of host publication: Proceedings - 2019 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2019
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 156-161
    Number of pages: 6
    ISBN (Electronic): 9781538692141
    ISBN (Print): 9781538692158
    DOIs
    Publication status: Published - 15 Aug 2019
    Event: 2019 IEEE International Conference on Multimedia and Expo Workshops - Shanghai, China
    Duration: 8 Jul 2019 to 12 Jul 2019

    Conference

    Conference: 2019 IEEE International Conference on Multimedia and Expo Workshops
    Abbreviated title: ICMEW 2019
    Country/Territory: China
    City: Shanghai
    Period: 8/07/19 to 12/07/19

    Keywords

    • Action recognition
    • Convolution networks
    • Key clips
    • Long term actions
    • Video level
