Abstract
Temporal object detection is more challenging than static image detection because of the rich context information available across frames. Recent state-of-the-art methods use LSTM-based modules to mine this context when detecting objects in each frame. However, because they explore temporal information only to a limited extent, existing methods have not reported significant gains in either accuracy or speed. In this paper, we propose a new one-stage temporal detector for online video object detection. A new structure built on an improved spatiotemporal LSTM (STLSTM) is proposed to suppress useless background information. In addition, the SSD-based structure is improved to extract both rich features and high-level semantic features. We evaluate the proposed model on the ImageNet benchmark and a space human-robot interaction database. Extensive comparisons show that the proposed detector achieves state-of-the-art performance.
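The abstract does not describe the internals of the improved STLSTM. As a rough illustration of the general idea behind LSTM-based online detectors (a recurrent state is carried from frame to frame so that each frame's features can draw on earlier context), the following is a minimal sketch of a standard LSTM cell applied to per-frame feature vectors. It is not the authors' STLSTM: the names `LSTMCell` and `run_online`, the random weight initialisation, and the use of flat vectors instead of convolutional feature maps are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
State = Tuple[Vector, Vector]  # (hidden state h, cell state c)

GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate update


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs: Vector) -> Vector:
    return [sum(xs) for xs in zip(*vs)]


class LSTMCell:
    """Standard fully connected LSTM cell (generic stand-in, NOT the paper's STLSTM)."""

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand_matrix(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        # Hypothetical random weights: input weights W, recurrent weights U, bias b per gate.
        self.W: Dict[str, Matrix] = {k: rand_matrix(hid_dim, in_dim) for k in GATES}
        self.U: Dict[str, Matrix] = {k: rand_matrix(hid_dim, hid_dim) for k in GATES}
        self.b: Dict[str, Vector] = {k: [0.0] * hid_dim for k in GATES}

    def zero_state(self) -> State:
        return [0.0] * self.hid_dim, [0.0] * self.hid_dim

    def step(self, x: Vector, state: State) -> State:
        h, c = state
        # Pre-activations combine the current frame's features with the previous hidden state.
        pre = {k: vadd(matvec(self.W[k], x), matvec(self.U[k], h), self.b[k]) for k in GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # The forget gate decides how much past context to keep; the input gate admits new evidence.
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def run_online(cell: LSTMCell, frame_features: List[Vector]) -> List[Vector]:
    """Process frames strictly in order, so each output depends only on past and current frames."""
    state = cell.zero_state()
    outputs: List[Vector] = []
    for x in frame_features:
        state = cell.step(x, state)
        outputs.append(state[0])  # a detection head would consume this per-frame feature
    return outputs


# Usage with hypothetical 4-dimensional per-frame features for a 5-frame clip.
cell = LSTMCell(in_dim=4, hid_dim=3)
frames = [[0.1 * t, 0.2, -0.1, 0.05 * t] for t in range(5)]
hs = run_online(cell, frames)
print(len(hs), len(hs[0]))  # 5 3
```

Because `run_online` only ever looks at the current frame and the carried state, running it on the first three frames yields exactly the first three outputs of the full run; this causal property is what makes such a recurrent module usable for online (streaming) detection.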
| Original language | English |
|---|---|
| Title of host publication | 27th International Conference on Mechatronics and Machine Vision in Practice (M2VIP) |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 464-468 |
| Number of pages | 5 |
| ISBN (Electronic) | 9781665431538 |
| ISBN (Print) | 9781665431545 |
| DOIs | |
| Publication status | Published - 7 Jan 2022 |
| Event | 2021 27th International Conference on Mechatronics and Machine Vision in Practice, Shanghai, China. Duration: 26 Nov 2021 → 28 Nov 2021 |
Conference
| Conference | 2021 27th International Conference on Mechatronics and Machine Vision in Practice |
|---|---|
| Abbreviated title | M2VIP |
| Country/Territory | China |
| City | Shanghai |
| Period | 26/11/21 → 28/11/21 |