Spatial cognition-driven deep learning for car detection in unmanned aerial vehicle imagery

Jiahui Yu, Hongwei Gao, Sun Jian, Dalin Zhou, Zhaojie Ju

Research output: Contribution to journal › Article › peer-review



Small object detection is the main challenge in unmanned aerial vehicle (UAV) imagery, especially when objects occupy only a small fraction of the image pixels and have blurred boundaries. In this paper, a one-stage detector (SF-SSD) is proposed together with a new spatial cognition algorithm. A deconvolution operation is introduced into a feature fusion module, which enhances the representation of shallow features. These more representative features prove effective for small-scale object detection. Empowered by a spatial cognition method, the deep model can re-detect objects whose confidence scores are less reliable, which enables the detector to improve detection accuracy significantly. Both between-class similarity and within-class similarity are fully exploited to suppress useless background information. This allows the proposed model to make full use of semantic features when detecting multi-class small objects. A simplified network structure is used to increase detection speed. Experiments are conducted on a newly collected dataset (SY-UAV) and on the benchmark datasets CARPK and PUCPR+. To further demonstrate the effectiveness of the spatial cognition module, a multi-class object detection experiment is conducted on the Stanford Drone Dataset (SDD). The results show that the proposed model achieves high frame rates and higher detection accuracy than state-of-the-art methods, reaching 90.1% (CARPK), 90.8% (PUCPR+), and 91.2% (SDD).
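The abstract describes a feature fusion module in which deconvolution (transposed convolution) enlarges a deep, low-resolution feature map so it can be combined with a shallow, high-resolution one. The record does not give the SF-SSD layer configuration, so the following is only a minimal sketch of that general idea under stated assumptions: a single channel, a 2x2 kernel with stride 2 and no padding, and fusion by element-wise addition (the actual module may use concatenation, multiple channels, or learned kernels). The function names `transposed_conv2d` and `fuse` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transposed_conv2d(x: Matrix, kernel: Matrix, stride: int = 2) -> Matrix:
    """Single-channel 2-D transposed convolution with no padding.

    Each input value scatters a scaled copy of the kernel into the output,
    so an H x W input with a kh x kw kernel yields an output of size
    ((H - 1) * stride + kh) x ((W - 1) * stride + kw).
    """
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for a in range(kh):
                for b in range(kw):
                    # Overlapping contributions (when stride < kernel size) accumulate.
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


def fuse(shallow: Matrix, deep: Matrix, kernel: Matrix, stride: int = 2) -> Matrix:
    """Upsample the deep map and add it element-wise to the shallow map.

    Assumption: fusion by addition; the paper's module may combine maps differently.
    """
    up = transposed_conv2d(deep, kernel, stride)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("upsampled deep map must match the shallow map's size")
    return [[s + u for s, u in zip(row_s, row_u)] for row_s, row_u in zip(shallow, up)]


if __name__ == "__main__":
    deep = [[1.0, 2.0], [3.0, 4.0]]        # coarse 2x2 map from a deeper layer
    shallow = [[0.5] * 4 for _ in range(4)]  # finer 4x4 map from a shallower layer
    ones = [[1.0, 1.0], [1.0, 1.0]]          # 2x2 kernel, stride 2: no overlap
    for row in fuse(shallow, deep, ones):
        print(row)
```

With an all-ones 2x2 kernel and stride 2, each deep activation is copied into a non-overlapping 2x2 block, so the example prints `[1.5, 1.5, 2.5, 2.5]` twice followed by `[3.5, 3.5, 4.5, 4.5]` twice. In a trained detector the kernel weights would be learned, letting the upsampling do more than simple replication.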
Original language: English
Journal: IEEE Transactions on Cognitive and Developmental Systems
Early online date: 2 Nov 2021
Publication status: Early online - 2 Nov 2021


  • UAV imagery
  • SSD
  • Feature fusion
  • Small object detection
  • deep learning

