View-robust neural networks for unseen human action recognition in videos

Jiahui Yu, Tianyu Ma, Zhaojie Ju, Hang Chen, Yingke Xu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data-driven deep learning has achieved excellent performance in human action recognition. However, unseen action recognition remains a challenge for most existing neural networks, because the action categories, collection viewpoints, and scenarios considered during data collection are limited. Compared with class-unseen action recognition, view-unseen action recognition in videos is under-explored. This paper proposes view-robust neural networks (VR-Net) to recognize unseen actions in videos. The VR-Net consists of a 3D pose estimation module, skeleton adaptive transformation neural networks, and classification modules. We first extract 3D skeleton models from the video sequence using existing pose estimation methods. Next, we propose a skeleton representation transformation scheme, realized with Convolutional Neural Networks (VR-CNN) and Graph Neural Networks (VR-GCN), to obtain optimal skeleton representations. Furthermore, we explore an associated optimization scheme and a fused output method. We evaluate the proposed neural networks on three challenging benchmarks: the NTU RGB-D dataset (NTU), the Kinetics-400 dataset, and the Human3.6M dataset (H3.6M). The experimental results show that the view-robust neural networks achieve top performance compared to state-of-the-art RGB-based and skeleton-based works, e.g., 93.6% on NTU (CV) and 94.6% on Kinetics-400 (Top-5). The proposed neural networks also significantly improve performance for unseen action recognition, e.g., 86.8% on H3.6M (View 2).

Original language: English
Title of host publication: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1242-1247
Number of pages: 6
ISBN (Electronic): 9781665452588, 9781665452571
ISBN (Print): 9781665452595
DOIs
Publication status: Published - 18 Nov 2022
Event: 2022 IEEE International Conference on Systems, Man, and Cybernetics - Prague, Czech Republic
Duration: 9 Oct 2022 - 12 Oct 2022

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
Volume: 2022-October
ISSN (Print): 1062-922X

Conference

Conference: 2022 IEEE International Conference on Systems, Man, and Cybernetics
Abbreviated title: SMC 2022
Country/Territory: Czech Republic
City: Prague
Period: 9/10/22 - 12/10/22

Keywords

  • CNN
  • deep learning
  • human action recognition
