TY - JOUR
T1 - SUM-GAN-GEA: video summarization using GAN with Gaussian distribution and external attention
AU - Yu, Qinghao
AU - Yu, Hui
AU - Wang, Yongxiong
AU - Pham, Tuan D.
N1 - Publisher Copyright:
© 2022 by the authors.
PY - 2022/10/29
Y1 - 2022/10/29
N2 - Video summarization aims to generate a sparse subset that is more concise and less redundant than the original video while containing the most informative parts of the video. However, previous works ignore the prior knowledge of the distribution of interestingness of video frames, making it hard for the network to learn the importance of different frames. Furthermore, traditional models alone (such as RNN and LSTM) are not robust enough in capturing global features of the video sequence since the video frames are more in line with non-Euclidean data structure. To this end, we propose a new summarization method based on the graph model concept to learn the feature relationship connections between video frames, which can guide the summary generator to generate a robust global feature representation. Specifically, we propose to use adversarial learning to integrate Gaussian distribution and external attention mechanism (SUM-GAN-GEA). The Gaussian function is a priori mapping function that considers the distribution of the interestingness of actual video frames and the external attention can reduce the inference time of the model. Experimental results on two popular video abstraction datasets (SumMe and TVSum) demonstrate the high superiority and competitiveness of our method in robustness and fast convergence.
AB - Video summarization aims to generate a sparse subset that is more concise and less redundant than the original video while containing the most informative parts of the video. However, previous works ignore the prior knowledge of the distribution of interestingness of video frames, making it hard for the network to learn the importance of different frames. Furthermore, traditional models alone (such as RNN and LSTM) are not robust enough in capturing global features of the video sequence since the video frames are more in line with non-Euclidean data structure. To this end, we propose a new summarization method based on the graph model concept to learn the feature relationship connections between video frames, which can guide the summary generator to generate a robust global feature representation. Specifically, we propose to use adversarial learning to integrate Gaussian distribution and external attention mechanism (SUM-GAN-GEA). The Gaussian function is a priori mapping function that considers the distribution of the interestingness of actual video frames and the external attention can reduce the inference time of the model. Experimental results on two popular video abstraction datasets (SumMe and TVSum) demonstrate the high superiority and competitiveness of our method in robustness and fast convergence.
KW - external attention mechanism
KW - GAN
KW - Gaussian distribution
KW - graph model
KW - video abstraction
KW - video summarization
UR - http://www.scopus.com/inward/record.url?scp=85141555704&partnerID=8YFLogxK
U2 - 10.3390/electronics11213523
DO - 10.3390/electronics11213523
M3 - Article
AN - SCOPUS:85141555704
SN - 2079-9292
VL - 11
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 21
M1 - 3523
ER -