Video object detection considering dynamic neighborhood feature multiplexing

Jiahui Yu, Yifan Chen, Xuna Wang, Long Chen, Hang Chen, Dalin Zhou, Yingke Xu, Zhaojie Ju*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Video object detection is essential for human-interaction applications, including bimanual manipulation sensing (BMS). Detection performance in practical applications still needs improvement, as it is restricted by the analysis of long-range spatiotemporal dependencies. How do humans sense bimanual manipulation in videos, especially in deteriorated clips? We argue that humans analyze the current clip based on earlier memory, that is, on long-term spatial and temporal dependencies (LTSTD). However, most existing methods have yet to report significant results because they explore these dependencies only to a limited extent. For future applications, an easy-to-integrate module is generally preferable to a complex end-to-end framework. Therefore, in this article we propose a dynamic neighborhood feature multiplexing (DNFM) mechanism for online video object detection, which is better at learning LTSTD in flexible and robust ways and boosts existing detection results. Specifically, we develop dynamic memory enhancement neural networks for better long-term feature aggregation at negligible additional computational cost. We multiplex each frame feature to aggregate key enhanced representations under the guidance of dynamic memory recall. DNFM benefits various well-known detectors in BMS and other challenging detection tasks, with particular attention devoted to detection in “low-quality” frames. Experimental results show that DNFM achieves state-of-the-art detection performance while remaining easy to integrate for boosting video object detection results.

Original language: English
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Early online date: 5 Jun 2025
Publication status: Early online - 5 Jun 2025

Keywords

  • Attention
  • enhancement
  • interaction
  • video detection
