Abstract
Temporal object detection is more challenging than static image detection because of the rich context information that video carries. Recent state-of-the-art methods mine this context information with LSTM-based modules to detect objects in each frame. However, because they under-explore temporal information, existing methods have not reported significant results in terms of accuracy and speed. In this paper, we propose a new one-stage temporal detector for online video object detection. First, a new structure with an improved spatiotemporal LSTM (STLSTM) is proposed to suppress useless background information. Next, the SSD-based structure is improved to extract rich features and high-level semantic features. We evaluate the proposed model on the ImageNet benchmark and a space human-robot interaction database. Extensive comparisons show that the proposed detector achieves state-of-the-art performance.
Original language | English |
---|---|
Title of host publication | 27th International Conference on Mechatronics and Machine Vision in Practice (M2VIP) |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 464-468 |
Number of pages | 5 |
ISBN (Electronic) | 9781665431538 |
ISBN (Print) | 9781665431545 |
Publication status | Published - 7 Jan 2022 |
Event | 2021 27th International Conference on Mechatronics and Machine Vision in Practice, Shanghai, China (Duration: 26 Nov 2021 → 28 Nov 2021) |
Conference
Conference | 2021 27th International Conference on Mechatronics and Machine Vision in Practice |
---|---|
Abbreviated title | M2VIP |
Country/Territory | China |
City | Shanghai |
Period | 26/11/21 → 28/11/21 |