Abstract
Gaze tracking is widely used in driving safety, visual impairment detection, virtual reality, human-robot interaction, and reading process tracking. However, varying illumination, diverse head poses, different distances between the user and the camera, occlusion by hair or glasses, and low-quality images pose significant challenges to accurate gaze tracking. In this article, a novel gaze-tracking method based on binocular feature fusion and a convolutional neural network (CNN) is proposed, in which a local binocular spatial attention mechanism (LBSAM) and a global binocular spatial attention mechanism (GBSAM) are integrated into the network model to improve accuracy. The proposed method is validated on the GazeCapture database. In addition, four groups of comparative experiments are conducted: between the binocular feature fusion model and the binocular data fusion model; among the local binocular spatial attention model, the local binocular channel attention model, and the model without a local binocular attention mechanism; between the model with GBSAM and that without it; and between the proposed method and other state-of-the-art approaches. The experimental results verify the advantages of binocular feature fusion, LBSAM, and GBSAM, as well as the effectiveness of the proposed method.
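To make the binocular spatial attention idea concrete, the sketch below shows one plausible formulation in PyTorch. The abstract does not specify the exact LBSAM/GBSAM definitions, so this follows a common CBAM-style spatial attention recipe applied to fused left/right-eye feature maps; the module name, the summation-based fusion, and the kernel size are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class BinocularSpatialAttention(nn.Module):
    """Hypothetical binocular spatial attention block (not the paper's exact LBSAM/GBSAM).

    Pools the fused left/right-eye feature maps along the channel axis,
    derives a spatial weight map with a small convolution, and reweights
    both eye branches with the shared map.
    """

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # 2 input channels: channel-wise average map + channel-wise max map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, left: torch.Tensor, right: torch.Tensor):
        # Fuse the two eye feature maps (assumption: element-wise sum).
        fused = left + right
        # Channel-wise average and max pooling -> (N, 1, H, W) each.
        avg = fused.mean(dim=1, keepdim=True)
        mx, _ = fused.max(dim=1, keepdim=True)
        # Spatial attention map in (0, 1), shared by both branches.
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return left * attn, right * attn


# Usage: reweight 64-channel feature maps from each eye branch.
lbsam = BinocularSpatialAttention()
left = torch.randn(8, 64, 14, 14)
right = torch.randn(8, 64, 14, 14)
left_out, right_out = lbsam(left, right)
print(left_out.shape, right_out.shape)  # torch.Size([8, 64, 14, 14]) twice
```

A "global" variant of the same idea could operate on the full-face or concatenated-eye feature map rather than per-branch maps; under these assumptions, the attention cost is a single small convolution per block.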
| Field | Value |
|---|---|
| Original language | English |
| Article number | 2 |
| Pages (from-to) | 302-311 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Human-Machine Systems |
| Volume | 52 |
| Issue number | 2 |
| Early online date | 7 Feb 2022 |
| DOIs | |
| Publication status | Published - 1 Apr 2022 |
Keywords
- attention mechanism
- convolution
- convolutional neural network (CNN)
- databases
- faces
- feature extraction
- feature fusion
- gaze tracking
- predictive models
- solid modeling