A one-stage temporal detector with attentional LSTM for video object detection

Jiahui Yu, Zhaojie Ju, Hongwei Gao, Dalin Zhou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution



Temporal object detection is more challenging than static image detection because of the rich context information in video. Recent state-of-the-art methods use LSTM-based modules to mine context information when detecting objects in each frame. However, because they explore temporal information only to a limited extent, existing methods do not report significant gains in accuracy or speed. In this paper, we propose a new one-stage temporal detector for online video object detection. We propose a new structure with an improved spatiotemporal LSTM (STLSTM) to suppress useless background information. We also improve the SSD-based structure to extract rich features and high-level semantic features. We evaluate the proposed model on the ImageNet benchmark and a space human-robot interaction database. Extensive comparisons show that the proposed detector achieves state-of-the-art performance.
Original language: English
Title of host publication: 27th International Conference on Mechatronics and Machine Vision in Practice (M2VIP)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Electronic): 9781665431538
ISBN (Print): 9781665431545
Publication status: Published - 7 Jan 2022
Event: 2021 27th International Conference on Mechatronics and Machine Vision in Practice - Shanghai, China
Duration: 26 Nov 2021 to 28 Nov 2021


Conference: 2021 27th International Conference on Mechatronics and Machine Vision in Practice
Abbreviated title: M2VIP

