Binocular feature fusion and spatial attention mechanism based gaze tracking

Lihong Dai, Jinguo Liu, Zhaojie Ju

Research output: Contribution to journal › Article › peer-review



Gaze tracking is widely used in driver safety monitoring, visual impairment detection, virtual reality, human-robot interaction, and reading-process tracking. However, varying illumination, diverse head poses, different distances between the user and the camera, occlusion by hair or glasses, and low-quality images pose major challenges to accurate gaze tracking. In this article, a novel gaze-tracking method based on binocular feature fusion and a convolutional neural network is proposed, in which a local binocular spatial attention mechanism (LBSAM) and a global binocular spatial attention mechanism (GBSAM) are integrated into the network model to improve accuracy. The proposed method is validated on the GazeCapture database. In addition, four groups of comparative experiments were conducted: between the binocular feature fusion model and the binocular data fusion model; among the local binocular spatial attention model, the local binocular channel attention model, and the model without a local binocular attention mechanism; between the model with GBSAM and that without GBSAM; and between the proposed method and other state-of-the-art approaches. The experimental results verify the advantages of binocular feature fusion, LBSAM, and GBSAM, and the effectiveness of the proposed method.
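To illustrate the two ideas the abstract combines, the sketch below shows (a) binocular feature fusion by concatenating per-eye feature maps along the channel axis, and (b) a generic spatial attention step that pools across channels, squashes the result to (0, 1), and reweights every spatial location. This is a hypothetical NumPy sketch of the general technique, not the authors' exact LBSAM/GBSAM modules; the function name, the additive pooling fusion, and all tensor sizes are illustrative assumptions (a trained model would use learned convolutions instead).

```python
import numpy as np

def spatial_attention(feature_map):
    """Generic spatial attention sketch (hypothetical, not the paper's
    exact LBSAM/GBSAM): pool across channels to get a 2-D attention map,
    squash it with a sigmoid, and reweight every channel by that map."""
    # feature_map has shape (C, H, W)
    avg_pool = feature_map.mean(axis=0)           # (H, W) channel-average map
    max_pool = feature_map.max(axis=0)            # (H, W) channel-max map
    combined = avg_pool + max_pool                # stand-in for a learned conv fusion
    attention = 1.0 / (1.0 + np.exp(-combined))   # sigmoid -> weights in (0, 1)
    return feature_map * attention[None, :, :]    # broadcast the map over channels

# Toy binocular example: one feature map per eye, fused by channel concatenation
rng = np.random.default_rng(0)
left_eye = rng.random((8, 6, 6))                  # assumed per-eye feature shape
right_eye = rng.random((8, 6, 6))
fused = np.concatenate([left_eye, right_eye], axis=0)  # (16, 6, 6) feature fusion
out = spatial_attention(fused)
print(out.shape)                                  # (16, 6, 6): shape is preserved
```

Because the attention weights lie strictly in (0, 1), the output is a spatially reweighted copy of the fused features with unchanged shape; a learned version of this map is what lets the network emphasize the eye regions most informative for gaze estimation.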

Original language: English
Article number: 2
Pages (from-to): 302-311
Number of pages: 10
Journal: IEEE Transactions on Human-Machine Systems
Issue number: 2
Early online date: 7 Feb 2022
Publication status: Published - 1 Apr 2022


  • attention mechanism
  • convolution
  • convolution neural network (CNN)
  • databases
  • faces
  • feature extraction
  • feature fusion
  • gaze tracking
  • predictive models
  • solid modeling


