TY - JOUR
T1 - Video summarization using knowledge distillation-based attentive network
AU - Qin, Jialin
AU - Yu, Hui
AU - Liang, Wei
AU - Ding, Derui
N1 - Publisher Copyright:
© 2024, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2024/1/11
Y1 - 2024/1/11
N2 - The vast volume of videos produced daily requires highly efficient measures to ensure that key information is captured for effective review and storage, which has driven the popularity of video summarization techniques. Deep learning has shown its advantages in video summarization, especially convolutional neural networks, which are effective at extracting features for video summarization. However, deep network layers and a limited range of temporal dependence make the network challenging to deploy and thus affect the accuracy of identifying important video frames. To tackle these issues, we present a knowledge distillation-based attentive network (KDAN) for supervised video summarization. Inspired by teaching and learning processes in biology, the proposed method separates the fully convolutional network from the attention mechanism, using the fully convolutional network as a teacher network to guide the learning of a student network built on an attention mechanism. The resulting lightweight network incorporates the knowledge learned by both networks, thus avoiding an explosion in the number of parameters and slow training. We conducted experiments on two widely used benchmarks, SumMe and TVSum. In the Canonical setting, DANtea achieves F-scores of 53.09 and 60.30, and DAN achieves F-scores of 51.26 and 61.55 on the SumMe and TVSum datasets, respectively. These results demonstrate the effectiveness and superiority of the proposed network over existing state-of-the-art methods.
AB - The vast volume of videos produced daily requires highly efficient measures to ensure that key information is captured for effective review and storage, which has driven the popularity of video summarization techniques. Deep learning has shown its advantages in video summarization, especially convolutional neural networks, which are effective at extracting features for video summarization. However, deep network layers and a limited range of temporal dependence make the network challenging to deploy and thus affect the accuracy of identifying important video frames. To tackle these issues, we present a knowledge distillation-based attentive network (KDAN) for supervised video summarization. Inspired by teaching and learning processes in biology, the proposed method separates the fully convolutional network from the attention mechanism, using the fully convolutional network as a teacher network to guide the learning of a student network built on an attention mechanism. The resulting lightweight network incorporates the knowledge learned by both networks, thus avoiding an explosion in the number of parameters and slow training. We conducted experiments on two widely used benchmarks, SumMe and TVSum. In the Canonical setting, DANtea achieves F-scores of 53.09 and 60.30, and DAN achieves F-scores of 51.26 and 61.55 on the SumMe and TVSum datasets, respectively. These results demonstrate the effectiveness and superiority of the proposed network over existing state-of-the-art methods.
KW - attentive network
KW - dilated convolution
KW - dual attention
KW - knowledge distillation
KW - video summarization
UR - http://www.scopus.com/inward/record.url?scp=85182498233&partnerID=8YFLogxK
U2 - 10.1007/s12559-023-10243-3
DO - 10.1007/s12559-023-10243-3
M3 - Article
AN - SCOPUS:85182498233
SN - 1866-9956
JO - Cognitive Computation
JF - Cognitive Computation
ER -