Accurate visual tracking with attention feature fusion

Shuo Hu, Linna Sun, Hui Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



A modern tracker must perform robust classification and accurate object state estimation efficiently. Feature fusion has recently come to play a vital role in the accuracy and robustness of visual tracking systems. Traditional feature fusion methods generally rely on direct summation or concatenation, which ignore the importance of assigning appropriate weights to different levels of features for a robust model. To tackle this issue, this paper proposes a novel deep-learning-based tracker with attention fusion. We propose an improved network structure based on ResNet that is more conducive to the fusion of hierarchical features. The proposed tracker adopts a nonlinear method for feature fusion in the backbone network and introduces an iterative multi-scale attention module to learn different weights for the hierarchical features. In the classification network, which is learned online, the third- and fourth-layer features extracted from the backbone network are used to obtain the coarse location. The extracted features are assigned different weights by an attention mechanism and fed into the estimation network, which performs iterative refinement for accurate bounding box estimation. Experiments show the proposed tracker’s efficiency and effectiveness.
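
The contrast the abstract draws between plain summation and attention-weighted fusion can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `attention_fuse`, the per-channel `gain` parameter (a stand-in for learned attention weights), and the use of a single global-average context instead of a multi-scale one are all assumptions made for illustration.

```python
import math
from typing import List

# A feature map is stored as [channel][flattened spatial position].
Feature = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def attention_fuse(low: Feature, high: Feature, gain: List[float],
                   iterations: int = 2) -> Feature:
    """Fuse two same-shaped feature maps with per-channel attention weights.

    Hypothetical simplification: `gain` replaces the learned layers of a real
    attention module, and only a global (per-channel average) context is used.
    """
    # Start from the traditional fusion: direct element-wise summation.
    fused = [[a + b for a, b in zip(cl, ch)] for cl, ch in zip(low, high)]
    for _ in range(iterations):
        # One weight in (0, 1) per channel, derived from the current fused estimate.
        weights = [sigmoid(g * sum(c) / len(c)) for g, c in zip(gain, fused)]
        # Weighted fusion: w * low + (1 - w) * high, channel by channel.
        fused = [
            [w * a + (1.0 - w) * b for a, b in zip(cl, ch)]
            for w, cl, ch in zip(weights, low, high)
        ]
    return fused


low = [[1.0, 1.0], [0.0, 0.0]]
high = [[0.0, 0.0], [1.0, 1.0]]
balanced = attention_fuse(low, high, gain=[0.0, 0.0])
gained = attention_fuse(low, high, gain=[10.0, 10.0])
```

With a zero gain every channel weight is 0.5, so the result is the plain average of the two inputs. A non-zero gain lets the weights depend on the features themselves, which is the property that summation and concatenation lack.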
Original language: English
Title of host publication: 2021 26th International Conference on Automation and Computing (ICAC)
Editors: Chenguang Yang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781860435577
ISBN (Print): 9781665443524
Publication status: Early online - 15 Nov 2021
Event: ICAC 2021: The 26th International Conference on Automation and Computing - University of Portsmouth, Portsmouth, United Kingdom
Duration: 2 Sept 2021 to 4 Sept 2021


Conference: ICAC 2021: The 26th International Conference on Automation and Computing
Country/Territory: United Kingdom


