Modern trackers must perform robust classification and accurate object state estimation efficiently. Feature fusion has recently come to play a vital role in the accuracy and robustness of visual tracking systems. Traditional feature fusion methods generally rely on direct summation or concatenation, which ignores the importance of assigning appropriate weights to different levels of features for a robust model. To tackle this issue, this paper proposes a novel deep-learning-based tracker with attention fusion. We propose an improved network structure based on ResNet that is more conducive to the fusion of hierarchical features. The proposed tracker adopts a nonlinear method for feature fusion in the backbone network and introduces an iterative multi-scale attention module to learn different weights for the hierarchical features. In the classification network, which is learned online, the third- and fourth-layer features extracted from the backbone network are used to obtain a coarse location. The extracted features are assigned different weights by an attention mechanism and fed into the estimation network, which performs iterative refinement for accurate bounding box estimation. Experiments demonstrate the efficiency and effectiveness of the proposed tracker.
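The contrast between direct summation and attention-weighted fusion can be illustrated with a minimal sketch. This is not the paper's module: the function names, the fixed sigmoid gate (with hypothetical `scale` and `bias` parameters standing in for learned layers), and the use of plain per-channel values instead of multi-scale feature maps are all assumptions made for illustration.

```python
import math


def sigmoid(v):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def attention_weights(features, scale=1.0, bias=0.0):
    """Hypothetical gate: one weight in (0, 1) per channel.

    In a trained network this would be a learned attention module;
    here it is a fixed sigmoid so the sketch stays self-contained.
    """
    return [sigmoid(scale * f + bias) for f in features]


def attention_fuse(shallow, deep, iterations=2):
    """Fuse two per-channel feature lists with iterated attention weights.

    Starts from direct summation (the baseline), then repeatedly
    re-weights the two inputs using attention computed on the
    current fused result.
    """
    fused = [s + d for s, d in zip(shallow, deep)]
    for _ in range(iterations):
        w = attention_weights(fused)
        # Convex combination: w weights the shallow feature, 1 - w the deep one.
        fused = [wi * s + (1.0 - wi) * d for wi, s, d in zip(w, shallow, deep)]
    return fused


if __name__ == "__main__":
    layer3 = [0.2, -1.0, 3.0]   # stand-in for third-layer channel responses
    layer4 = [1.0, 0.5, -2.0]   # stand-in for fourth-layer channel responses
    print(attention_fuse(layer3, layer4))
```

With `iterations=0` the function reduces to plain summation; with one or more iterations each output channel is a weighted mix of the two inputs, so the contribution of each level depends on the features themselves rather than being fixed.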
|Title of host publication||2021 26th International Conference on Automation and Computing (ICAC)|
|Number of pages||6|
|Publication status||Early online - 15 Nov 2021|
|Event||ICAC 2021: The 26th International Conference on Automation and Computing - University of Portsmouth, Portsmouth, United Kingdom|
|Duration||2 Sep 2021 → 4 Sep 2021|