Image-based human pose estimation

Student thesis: Doctoral Thesis


Human pose estimation has become an active research topic in computer vision, but technical challenges remain because of the complexity of human motion. Although depth sensors such as Kinect and Xtion open up new ways of handling these issues, they also present new challenges. This thesis addresses only human pose estimation frameworks based on colour images and explores the trade-off between effective feature representations and models.
Firstly, human pose estimation can be treated as a regression problem. We therefore propose a novel regression-based method designed to estimate the upper-body joints and recognize their specific motions. We verified the proposed method on our own recorded dataset, and the experimental results show that it is effective. This provides an important clue that the accuracy of joint estimation contributes significantly to human motion estimation.
Secondly, computational cost is a persistent difficulty in computer vision. For example, pictorial structures normally model the interactions between connected joints, such as the elbow and shoulder, which leads to an inference cost that is quadratic in the number of pixels. We therefore propose a simple restricted model that measures only the quality of limb-pair possibilities. At the same time, it allows efficient inference in richer models that exploit data-dependent interactions.
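
To illustrate where the quadratic cost comes from, the following is a minimal sketch of max-sum inference on a chain of connected parts. It is not the thesis implementation: the function name `chain_map_inference`, the chain topology, and the score arrays are assumptions made purely for illustration. Each part has n candidate states (for example, pixel locations), and every pair of connected parts requires an n × n score table.

```python
import numpy as np

def chain_map_inference(unary, pairwise):
    """Max-sum (Viterbi-style) inference on a chain of parts.

    Hypothetical inputs, for illustration only:
      unary:    list of P arrays of shape (n,); unary[p][s] scores part p in state s
      pairwise: list of P-1 arrays of shape (n, n); pairwise[p][i, j] scores
                part p in state i together with part p+1 in state j
    Returns (best_states, best_score).
    """
    msg = np.asarray(unary[0], dtype=float)
    backptr = []
    for p in range(1, len(unary)):
        # n x n table of candidate scores: this step is quadratic in the
        # number of states per part.
        total = msg[:, None] + np.asarray(pairwise[p - 1], dtype=float)
        backptr.append(total.argmax(axis=0))
        msg = total.max(axis=0) + np.asarray(unary[p], dtype=float)

    # Backtrack from the best final state to recover the full assignment.
    states = [int(msg.argmax())]
    for bp in reversed(backptr):
        states.append(int(bp[states[-1]]))
    states.reverse()
    return states, float(msg.max())
```

With n candidate pixels per part, each connected pair costs on the order of n² operations, which is the cost that a restricted model over a smaller set of limb-pair candidates is intended to avoid.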
Thirdly, to improve the effectiveness of body pose estimation, we introduce an object tracking method into the pose estimation process. In addition, we introduce a structured prediction aggregate model, which spends computational effort only where it is needed. It ensures accurate output while cheaply filtering out many states. Meanwhile, our proposed decomposition method uses cyclic dependencies on a tree model when imposing model agreement, and thus allows efficient inference on a video or an image.
To sum up, we evaluate our proposed methods on public datasets and compare them with several popular methods to demonstrate both their efficiency and their effectiveness. The model's pairwise interaction potentials are provided with data-dependent features and the aggregate model. The experimental results show that our model is worthwhile and that the features used are accurate for pose estimation on popular datasets.
Date of Award: Nov 2017
Original language: English
Supervisor: Zhaojie Ju (Supervisor) & Honghai Liu (Supervisor)
