Video saliency detection via combining temporal difference and pixel gradient

Xiangwei Lu, Muwei Jian*, Rui Wang, Xiangyu Liu, Peiguang Lin, Hui Yu*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review


Although temporal information is important to the quality of video saliency detection, existing network frameworks still suffer from problems such as poor spatio-temporal coherence and poor edge continuity. To address these problems, this paper proposes a fully convolutional neural network that integrates temporal difference and pixel gradient to refine the edges of salient targets. Because the features of neighboring frames are highly correlated owing to their proximity, a co-attention mechanism is applied after multi-scale pooling feature extraction to assign pixel-wise weights to the saliency probability map, so that attention is paid to both the edges and the central regions of the image. Changes in the pixel gradients of the original images are then used to recursively improve the continuity of target edges and the details of central areas. Residual networks are used to integrate information between modules, ensuring stable connections between the backbone network and the modules as well as the propagation of pixel gradient changes. In addition, a self-adjusting strategy for the loss function is presented to address the overfitting observed in experiments. The proposed method has been evaluated on three publicly available datasets, and its effectiveness has been demonstrated by comparison with six other representative state-of-the-art methods.
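
The abstract does not give the exact formulation of the temporal-difference and pixel-gradient cues. As a rough illustration only, assuming grayscale frames stored as NumPy arrays, absolute frame differencing and a central-difference gradient magnitude (via `np.gradient`) can stand in for these two signals. The function names below are hypothetical, and this sketch is not the authors' network or their exact definitions.

```python
import numpy as np

def temporal_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Absolute per-pixel intensity change between two consecutive grayscale frames.

    Hypothetical stand-in for the paper's temporal-difference cue.
    """
    return np.abs(curr_frame.astype(np.float64) - prev_frame.astype(np.float64))

def gradient_magnitude(frame: np.ndarray) -> np.ndarray:
    """Per-pixel gradient magnitude from central differences (one-sided at borders)."""
    # np.gradient on a 2-D array returns derivatives along axis 0 (rows) and axis 1 (columns).
    gy, gx = np.gradient(frame.astype(np.float64))
    return np.hypot(gx, gy)

def gradient_change(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Absolute change in gradient magnitude between consecutive frames.

    Hypothetical stand-in for the 'changes of pixel gradients' used for edge refinement.
    """
    return np.abs(gradient_magnitude(curr_frame) - gradient_magnitude(prev_frame))

# Usage: an empty frame followed by a frame containing a bright 3x3 square.
prev = np.zeros((5, 5))
curr = np.zeros((5, 5))
curr[1:4, 1:4] = 1.0

td = temporal_difference(prev, curr)   # nonzero only where the square appeared
gm = gradient_magnitude(curr)          # large around the square's border, zero at its center
gc = gradient_change(prev, curr)       # equals gm here because prev has no gradients
```

In this toy case the temporal difference marks the whole appearing object, while the gradient change concentrates on its boundary. This difference is the intuition behind combining the two cues for edge refinement.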

Original language: English
Journal: Multimedia Tools and Applications
Early online date: 2 Oct 2023
Publication status: Early online - 2 Oct 2023


  • Co-Attention
  • Edge refinement
  • Pixel gradient
  • Temporal difference
  • Video saliency detection
