SUM-GAN-GEA: video summarization using GAN with Gaussian distribution and external attention

Qinghao Yu, Hui Yu*, Yongxiong Wang, Tuan D. Pham

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Video summarization aims to generate a sparse subset that is more concise and less redundant than the original video while retaining its most informative parts. However, previous works ignore prior knowledge about how interestingness is distributed across video frames, which makes it hard for the network to learn the importance of different frames. Furthermore, traditional sequence models (such as RNN and LSTM) alone are not robust enough at capturing global features of the video sequence, since video frames are better described by a non-Euclidean data structure. To this end, we propose a new summarization method based on the graph model concept that learns the feature relationships between video frames and guides the summary generator toward a robust global feature representation. Specifically, we propose SUM-GAN-GEA, which uses adversarial learning to integrate a Gaussian distribution and an external attention mechanism. The Gaussian function is a prior mapping function that accounts for the distribution of interestingness of actual video frames, and the external attention reduces the inference time of the model. Experimental results on two popular video abstraction datasets (SumMe and TVSum) demonstrate the superiority and competitiveness of our method in terms of robustness and convergence speed.
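
The abstract does not spell out how the external attention layer is built. As a rough illustration only, the sketch below follows the commonly described form of external attention: frame features are compared against two small memories (random here rather than learned), with a softmax over frames followed by L1 normalization over memory slots. The function name, shapes, and values are hypothetical and are not taken from the authors' implementation.

```python
import math
import random

def external_attention(frames, mem_k, mem_v):
    """Illustrative external attention (hypothetical helper, not the paper's code).

    frames: N x d frame features; mem_k: S x d key memory; mem_v: S x d_out value memory.
    Returns an N x d_out list. Cost grows linearly with the number of frames N.
    """
    n, s = len(frames), len(mem_k)
    # Similarity between every frame and every memory slot (N x S).
    scores = [[sum(f * k for f, k in zip(frame, key)) for key in mem_k]
              for frame in frames]
    # Double normalization, step 1: softmax over frames, one memory slot at a time.
    attn = [[0.0] * s for _ in range(n)]
    for j in range(s):
        col_max = max(scores[i][j] for i in range(n))
        exps = [math.exp(scores[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            attn[i][j] = exps[i] / total
    # Double normalization, step 2: L1-normalize over memory slots, one frame at a time.
    for i in range(n):
        row_sum = sum(attn[i]) + 1e-9
        attn[i] = [a / row_sum for a in attn[i]]
    # Each output row is an attention-weighted mix of the value-memory rows.
    d_out = len(mem_v[0])
    return [[sum(attn[i][j] * mem_v[j][c] for j in range(s)) for c in range(d_out)]
            for i in range(n)]

# Toy usage: 6 frames with 4-dim features, 4 memory slots, 3-dim output (all assumed).
random.seed(0)
frames = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
mem_k = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
mem_v = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
out = external_attention(frames, mem_k, mem_v)
```

Because the memories have a fixed number of slots S, the work per frame does not depend on the number of other frames, which is the usual reason this kind of layer is cheaper than pairwise self-attention on long sequences.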

Original language: English
Article number: 3523
Number of pages: 15
Journal: Electronics (Switzerland)
Volume: 11
Issue number: 21
DOIs
Publication status: Published - 29 Oct 2022

Keywords

  • external attention mechanism
  • GAN
  • Gaussian distribution
  • graph model
  • video abstraction
  • video summarization
