Binocular feature fusion and spatial attention mechanism based gaze tracking

Lihong Dai, Jinguo Liu, Zhaojie Ju

Research output: Contribution to journal › Article › peer-review



Gaze tracking is widely used in driver safety monitoring, visual impairment detection, virtual reality, human-robot interaction, and reading-process tracking. However, varying illumination, diverse head poses, different distances between the user and the camera, occlusion by hair or glasses, and low-quality images pose major challenges to accurate gaze tracking. In this article, a novel gaze-tracking method based on binocular feature fusion and a convolutional neural network is proposed, in which a local binocular spatial attention mechanism (LBSAM) and a global binocular spatial attention mechanism (GBSAM) are integrated into the network model to improve accuracy. The proposed method is validated on the GazeCapture database. In addition, four groups of comparative experiments were conducted: between the binocular feature fusion model and the binocular data fusion model; among the local binocular spatial attention model, the local binocular channel attention model, and the model without a local binocular attention mechanism; between the model with GBSAM and that without GBSAM; and between the proposed method and other state-of-the-art approaches. The experimental results verify the advantages of binocular feature fusion, LBSAM, and GBSAM, and the effectiveness of the proposed method.
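To illustrate the two ideas the abstract combines, the sketch below shows (a) binocular feature fusion by concatenating per-eye feature maps along the channel axis, and (b) a generic spatial attention step that pools across channels, squashes the result to (0, 1), and reweights every spatial location. This is a hypothetical NumPy sketch of the general technique, not the authors' exact LBSAM/GBSAM modules; the function name, the additive pooling fusion, and all tensor sizes are illustrative assumptions (a trained model would use learned convolutions instead).

```python
import numpy as np

def spatial_attention(feature_map):
    """Generic spatial attention sketch (hypothetical, not the paper's
    exact LBSAM/GBSAM): pool across channels to get a 2-D attention map,
    squash it with a sigmoid, and reweight every channel by that map."""
    # feature_map has shape (C, H, W)
    avg_pool = feature_map.mean(axis=0)           # (H, W) channel-average map
    max_pool = feature_map.max(axis=0)            # (H, W) channel-max map
    combined = avg_pool + max_pool                # stand-in for a learned conv fusion
    attention = 1.0 / (1.0 + np.exp(-combined))   # sigmoid -> weights in (0, 1)
    return feature_map * attention[None, :, :]    # broadcast the map over channels

# Toy binocular example: one feature map per eye, fused by channel concatenation
rng = np.random.default_rng(0)
left_eye = rng.random((8, 6, 6))                  # assumed per-eye feature shape
right_eye = rng.random((8, 6, 6))
fused = np.concatenate([left_eye, right_eye], axis=0)  # (16, 6, 6) feature fusion
out = spatial_attention(fused)
print(out.shape)                                  # (16, 6, 6): shape is preserved
```

Because the attention weights lie strictly in (0, 1), the output is a spatially reweighted copy of the fused features with unchanged shape; a learned version of this map is what lets the network emphasize the eye regions most informative for gaze estimation.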

Original language: English
Article number: 2
Pages (from-to): 302-311
Number of pages: 10
Journal: IEEE Transactions on Human-Machine Systems
Issue number: 2
Early online date: 7 Feb 2022
Publication status: Published - 1 Apr 2022


  • attention mechanism
  • convolution
  • convolution neural network (CNN)
  • databases
  • faces
  • feature extraction
  • feature fusion
  • gaze tracking
  • predictive models
  • solid modeling


