Temporal object detection is more challenging than static image detection because it must make use of rich context information across frames. Recent state-of-the-art works mine this context information with LSTM-based modules to detect objects in each frame. However, because they explore temporal information only to a limited extent, existing methods do not report significant gains in either accuracy or speed. In this paper, we propose a new one-stage temporal detector for online video object detection. A new structure with an improved spatiotemporal LSTM (STLSTM) is proposed to suppress useless background information. In addition, the SSD-based structure is improved to extract rich features and high-level semantic features. We evaluate the proposed model on the ImageNet benchmark and a space human-robot interaction database. Extensive comparisons show that the proposed detector achieves state-of-the-art performance.
Title of host publication: 27th International Conference on Mechatronics and Machine Vision in Practice (M2VIP)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Published: 7 Jan 2022
Conference: 2021 27th International Conference on Mechatronics and Machine Vision in Practice, Shanghai, China
Duration: 26 Nov 2021 → 28 Nov 2021