Video summarization is to extract effective information from videos to quickly obtain the most informative summary. Most of the existing video summarization methods use recurrent neural networks and their variants such as long and short-term memory (LSTM), to simulates the variable range time dependence between video frames. However, those methods can only process serial inputs of the video frames along with the hidden layer information from the previous time step, which affects the performance and the quality of video summarization. To tackle this issue, we present a deep non-local video summarization network (DN-VSN) for original video abstracts in this paper. Our unsupervised model treats video summarization as a sequence of decision problems. Given an input video, the probability that a video frame is selected as a part of the summary is obtained through a non-local convolutional network, and a strategy gradient algorithm of reinforcement learning is adopted for optimization in the training phase. The proposed method has been tested on four widely used datasets. The experimental results show the superiority of the proposed unsupervised model over the state-of-the-art methods.
- non-local convolutional network
- non-local video summarization network
- reinforcement learning
- video summarization