TY - JOUR
T1 - Attention Mechanism and Bidirectional Long Short-Term Memory-Based Real-Time Gaze Tracking
AU - Dai, Lihong
AU - Liu, Jinguo
AU - Ju, Zhaojie
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/11/21
Y1 - 2024/11/21
N2 - In order to improve the accuracy of real-time gaze tracking, various attention mechanisms and long short-term memory (LSTM) networks for dynamic continuous video frames are studied in depth in this paper. A real-time gaze-tracking method (SpatiotemporalAM) based on an attention mechanism and bidirectional LSTM (Bi-LSTM) is proposed. First, convolutional neural networks (CNNs) are employed to extract the spatial features of each image. Then, Bi-LSTM is adopted to obtain the dynamic temporal features between consecutive frames, leveraging both past and future context information. After that, the extracted spatiotemporal features are fused by an output attention mechanism (OAM), which improves the accuracy of gaze tracking. Models with the OAM are compared with those with a self-attention mechanism (SAM), confirming the advantages of the former in accuracy and real-time performance. In addition, a series of measures is taken to improve accuracy, such as using cosine similarity in the loss function and adopting ResNet50 with bottleneck residual blocks as the baseline network. Extensive experiments are performed on the public gaze-tracking databases Gaze360 and GazeCapture to verify the effectiveness, real-time performance, and generalization ability of the proposed gaze-tracking approach.
AB - In order to improve the accuracy of real-time gaze tracking, various attention mechanisms and long short-term memory (LSTM) networks for dynamic continuous video frames are studied in depth in this paper. A real-time gaze-tracking method (SpatiotemporalAM) based on an attention mechanism and bidirectional LSTM (Bi-LSTM) is proposed. First, convolutional neural networks (CNNs) are employed to extract the spatial features of each image. Then, Bi-LSTM is adopted to obtain the dynamic temporal features between consecutive frames, leveraging both past and future context information. After that, the extracted spatiotemporal features are fused by an output attention mechanism (OAM), which improves the accuracy of gaze tracking. Models with the OAM are compared with those with a self-attention mechanism (SAM), confirming the advantages of the former in accuracy and real-time performance. In addition, a series of measures is taken to improve accuracy, such as using cosine similarity in the loss function and adopting ResNet50 with bottleneck residual blocks as the baseline network. Extensive experiments are performed on the public gaze-tracking databases Gaze360 and GazeCapture to verify the effectiveness, real-time performance, and generalization ability of the proposed gaze-tracking approach.
KW - Attention mechanism
KW - bidirectional LSTM
KW - CNN
KW - gaze tracking
UR - http://www.scopus.com/inward/record.url?scp=85211925717&partnerID=8YFLogxK
U2 - 10.3390/electronics13234599
DO - 10.3390/electronics13234599
M3 - Article
AN - SCOPUS:85211925717
SN - 2079-9292
VL - 13
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 23
M1 - 4599
ER -