TY - JOUR
T1 - HR-GCN
T2 - 2D-3D whole-body pose estimation with high-resolution graph convolutional network from a monocular camera
AU - Zhang, Mingyu
AU - Gao, Qing
AU - Lai, Yuanchuan
AU - Hu, Junjie
AU - Zhang, Xin
AU - Ju, Zhaojie
N1 - Publisher Copyright:
© 2001-2012 IEEE.
PY - 2025/4/10
Y1 - 2025/4/10
N2 - 3D human pose estimation plays a vital role in applications such as action recognition, human-robot interaction, and immersive technologies. While traditional methods focus on coarse body keypoints, 3D whole-body pose estimation localizes keypoints for the entire body, including hands, face, and feet, allowing for the capture of more detailed human motion and expression information, which enhances its applicability to downstream tasks. Although 3D whole-body pose estimation can be achieved using marker-based systems, wearable devices, or multi-view camera setups, employing a monocular camera is the most convenient and cost-effective approach. However, the problem of monocular 3D whole-body pose estimation remains inadequately addressed, with significant shortcomings in accuracy. This paper introduces a High-Resolution Graph Convolutional Network (HR-GCN) designed to address the challenges of 2D-3D whole-body pose estimation. The proposed HR-GCN leverages the structural properties of graph convolutional networks to model the human skeleton, enabling accurate 3D pose estimation from 2D keypoints. The framework consists of two key modules: the High-Resolution Module (HRM) for extracting 3D body keypoints and coarse-grained features, and the Fine-Grained Keypoints Prediction Module (FGKPM) for refining the 3D coordinates of hands and face. Extensive experiments demonstrate the effectiveness of HR-GCN on the H3WB dataset, showcasing a significant reduction in Mean Per Joint Position Error (MPJPE) compared to existing state-of-the-art (SOTA) methods. The code and model are available at https://github.com/Z-mingyu/HR-GCN.git.
KW - 3D whole-body pose estimation
KW - Graph convolutional network
KW - Human pose estimation
UR - http://www.scopus.com/inward/record.url?scp=105002811169&partnerID=8YFLogxK
DO - 10.1109/JSEN.2025.3557770
M3 - Article
AN - SCOPUS:105002811169
SN - 1530-437X
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
ER -