EANTrack: an efficient attention network for visual tracking

Fengwei Gu, Jun Lu, Chengtao Cai, Qidan Zhu, Zhaojie Ju

Research output: Contribution to journal › Article › peer-review


Abstract

Although Siamese trackers have become increasingly prevalent in the visual tracking domain, they are easily interfered with by semantic distractors in complex environments, which results in the underutilization of feature information. Especially when multiple disturbances act together, the performance of many trackers often degrades severely. To address this problem, this paper presents a robust Stereoscopic Transformer network for improving tracking performance. Built on a hybrid attention mechanism, our method is composed of a channel feature awareness network (CFAN), a global channel attention network (GCAN), and a multi-level feature enhancement unit (MFEU). Concretely, CFAN focuses on specific channel information, highlighting the contained target features while weakening the semantic distractor features. As an intermediate hub, GCAN is mainly responsible for establishing the global feature dependencies between the search region and the template, while selecting the relevant channel features to improve the discriminative ability of the model. In particular, MFEU is used to enhance multi-level feature information to facilitate feature representation learning for our method. Finally, a Transformer-based Siamese tracker (named VTST) is proposed to provide an efficient tracking representation, which performs well across a variety of challenging attributes. Experiments show that our method outperforms state-of-the-art trackers on multiple benchmarks with a real-time running speed of 56.0 fps.
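The channel-reweighting idea the abstract attributes to CFAN (emphasizing target-bearing channels and suppressing distractor channels) can be illustrated with a minimal squeeze-and-excitation-style sketch. This is only an assumption about the general mechanism, written in NumPy with random stand-in weights; it is not the paper's actual CFAN, GCAN, or MFEU design.

```python
import numpy as np

def channel_attention(feat, reduction=4):
    """Illustrative SE-style channel attention over a (C, H, W) feature map.

    Hypothetical sketch of the generic channel-attention idea, not the
    authors' CFAN: squeeze each channel to a scalar descriptor, pass it
    through a small bottleneck, and gate the channels with a sigmoid.
    """
    c, h, w = feat.shape
    # Squeeze: global average pooling per channel -> (C,)
    desc = feat.mean(axis=(1, 2))
    # Excitation: two tiny projections with a C//reduction bottleneck.
    # Random weights stand in for learned parameters in this sketch.
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ desc, 0.0)          # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate in (0, 1)
    # Reweight channels: large gates preserve target features,
    # small gates down-weight distractor channels.
    return feat * gate[:, None, None]

feat = np.ones((8, 4, 4))
out = channel_attention(feat)
print(out.shape)  # (8, 4, 4)
```

In a real tracker the gating weights would be learned end-to-end, and the abstract's GCAN would additionally condition this gating on cross-attention between the template and search-region features.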
Original language: English
Number of pages: 18
Journal: IEEE Transactions on Automation Science and Engineering
Early online date: 3 Oct 2023
DOIs
Publication status: Early online - 3 Oct 2023

Keywords

  • visual tracking
  • complex environments
  • stereoscopic transformer
  • hybrid attention mechanism
