Deep Object Detector with Attentional Spatiotemporal LSTM for Space Human‒Robot Interaction

Jiahui Yu, Hongwei Gao, Yongquan Chen, Dalin Zhou, Jinguo Liu, Zhaojie Ju

Research output: Contribution to journal › Article › peer-review



Global temporal information and local semantic information are essential cues for high-performance online object detection in videos. However, despite their promising detection accuracy in most cases, most state-of-the-art approaches have the following two limitations: invalid background/scale suppression and inadequate temporal information mining between frames. Many works currently focus on temporal information learning based on a single frame. In this article, we propose an attentional global–local information learning network; this is one of the first attempts to fully use both types of information between frames. Attention maps are creatively utilized to transfer temporal contexts between frames. This also effectively alleviates the adverse effects of scale changes. Furthermore, empowered by a detailed framework, the proposed detector effectively uses multilevel feature extraction. Given these contributions, the proposed detector achieves state-of-the-art performance on challenging benchmarks. Finally, practical experiments are conducted on a space human–robot interaction platform.
Original language: English
Journal: IEEE Transactions on Human-Machine Systems
Early online date: 10 Feb 2022
Publication status: Early online - 10 Feb 2022


  • Video object detection
  • SSD
  • Attention model
  • Space human-robot interaction

