Abstract
Varying point-cloud density increases the difficulty of 3D detection. In this paper, we present a context-aware dynamic network (CADNet) that captures the variance of density by considering both point context and semantic context. Point-level contexts are generated from the original point clouds to enlarge the effective receptive field. They are extracted around the voxelized pillars using our extended voxelization method and processed by the context encoder in parallel with the pillar features. With a large perception range, we are able to capture the variance of features for potential objects and generate attentive spatial guidance that helps adjust the strengths for different regions. In the region proposal network, considering the limited representation ability of traditional convolution, where the same kernels are shared among different samples and positions, we propose a decomposable dynamic convolutional layer that adapts to the variance of input features by learning from the local semantic context. It adaptively generates position-dependent coefficients for multiple fixed kernels and combines them to convolve with local features. Based on our dynamic convolution, we design a dual-path convolution block to further improve the representation ability. We conduct experiments on the KITTI dataset, where the proposed CADNet achieves superior 3D detection performance, outperforming SECOND and PointPillars by a large margin while running at 30 FPS.
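The idea of weighting several fixed kernels with position-dependent coefficients can be illustrated with a small sketch. This is not the CADNet implementation. It is a minimal single-channel example that relies on the linearity of convolution: convolving with a per-position mixture of kernels gives the same result as convolving once with each fixed kernel and mixing the responses per position. The function names (`conv2d_same`, `dynamic_conv2d`), the `coeff_logits` input, and the softmax normalisation are assumptions made for illustration. In the abstract the coefficients are learned from the local semantic context, whereas here they are simply passed in.

```python
import math


def conv2d_same(x, kernel):
    """Zero-padded 'same' cross-correlation of a 2D grid with an odd-sized kernel."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    ii, jj = i + di - ph, j + dj - pw
                    if 0 <= ii < h and 0 <= jj < w:  # skip the zero padding
                        acc += kernel[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out


def softmax(values):
    """Numerically stable softmax over a short list of logits."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv2d(x, kernels, coeff_logits):
    """Mix the responses of K fixed kernels with per-position coefficients.

    out[i][j] = sum_k alpha_k(i, j) * (kernels[k] applied to x)[i][j]

    coeff_logits[i][j] holds K logits for position (i, j). Normalising them
    with a softmax is an assumption of this sketch, not a claim about CADNet.
    """
    responses = [conv2d_same(x, k) for k in kernels]  # one pass per fixed kernel
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            alpha = softmax(coeff_logits[i][j])
            out[i][j] = sum(a * r[i][j] for a, r in zip(alpha, responses))
    return out


# Toy usage: two fixed kernels (identity and 3x3 box sum) mixed equally everywhere.
x = [[1.0, 2.0], [3.0, 4.0]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
equal_logits = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
mixed = dynamic_conv2d(x, [identity, box], equal_logits)
```

Because each fixed kernel is applied once to the whole feature map and only the scalar coefficients vary per position, the cost grows with the number of kernels rather than with the number of positions, which is one plausible reason to prefer this form over building a separate kernel at every location.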
Original language | English |
---|---|
Number of pages | 13 |
Journal | IEEE Transactions on Intelligent Transportation Systems |
Early online date | 16 Jul 2021 |
DOIs | |
Publication status | Early online - 16 Jul 2021 |
Keywords
- three-dimensional displays
- feature extraction
- convolution
- proposals
- kernel
- laser radar
- semantics