Abstract
Action recognition is an active research topic in computer vision, with wide application in human-computer/robot interaction, abnormal behavior monitoring, and medical assistance. Because skeleton data are highly robust, skeleton-based action recognition has attracted considerable research attention. Most current skeleton-based methods, however, suffer from incomplete and poorly generalizing input features, inadequate feature extraction by the network model, and an imbalance between recognition accuracy and model size. To address these problems, we analyze the skeleton features that are critical for action recognition and propose a multi-features and multi-stream network (MM-Net) for real-time action recognition. First, three pairs of features are proposed: joint distance (JD) and joint distance velocity (JDV), joint angle (JA) and joint angle velocity (JAV), and fast-action joint position (FJP) and slow-action joint position (SJP). Second, a multi-features and multi-stream network built on a one-dimensional convolutional neural network (1DCNN) is proposed to reduce the number of model parameters while fully extracting the three pairs of features. As a result, MM-Net achieves the highest accuracies on both JHMDB (86.5%) and SHREC (96.4% on the coarse and 93.3% on the fine dataset). In addition, MM-Net is applied to a human-robot interaction (HRI) platform, which demonstrates its practicality.
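The abstract names the JD/JDV feature pair without defining it. The following is a minimal sketch, assuming that JD is the set of pairwise Euclidean distances between joints in a frame and that JDV is the frame-to-frame difference of those distances. Both definitions, the function names `joint_distances` and `temporal_difference`, and the toy data are illustrative assumptions, not the formulation used in the paper.

```python
import math
from itertools import combinations

def joint_distances(frame):
    """Assumed JD: Euclidean distance for every unordered pair of joints in one frame."""
    return [math.dist(a, b) for a, b in combinations(frame, 2)]

def temporal_difference(features):
    """Assumed JDV: difference of each feature between consecutive frames."""
    return [
        [cur - prev for prev, cur in zip(f_prev, f_cur)]
        for f_prev, f_cur in zip(features, features[1:])
    ]

# Toy skeleton sequence (hypothetical data): 3 frames, 3 joints, 2-D coordinates.
sequence = [
    [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)],
    [(0.0, 0.0), (6.0, 8.0), (6.0, 8.0)],
    [(0.0, 0.0), (6.0, 8.0), (12.0, 16.0)],
]

jd = [joint_distances(frame) for frame in sequence]   # one distance vector per frame
jdv = temporal_difference(jd)                         # one fewer entry than jd
```

Under these assumptions, distance features are unchanged by translating or rotating the whole skeleton, which is one plausible reason to pair them with raw joint positions.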
Original language | English |
---|---|
Pages (from-to) | 7397-7409 |
Number of pages | 13 |
Journal | IEEE Sensors Journal |
Volume | 23 |
Issue number | 7 |
Early online date | 23 Feb 2023 |
Publication status | Published - 1 Apr 2023 |
Keywords
- cameras
- face recognition
- feature extraction
- human-computer interaction
- human-robot interaction
- multi-feature
- real-time
- real-time systems
- sensors
- skeleton
- skeleton-based action recognition