Abstract
Point cloud upsampling is a critical task in 3D computer vision, aiming to generate dense and uniformly distributed point sets from sparse inputs. While current self-supervised methods show promise, they often struggle to preserve fine-grained geometric details, especially for highly sparse point clouds. To address these limitations, we propose PointUpsampleLLM (PULLM), a novel multi-modal framework that leverages large language models (LLMs) to enhance 3D point cloud upsampling. PULLM integrates a pretrained Point Cloud LLM (PointLLM) with visual features extracted from point clouds, learning a unified representation that captures both geometric and semantic information. At the core of our approach is the Feature Aware Translator (FAT) module, which bridges the modality gap between visual and textual features, enhancing the spatial understanding of the LLM. PULLM generates textual descriptions of point clouds on the fly, eliminating the need for large paired datasets. Extensive experiments on the PU1K and PUGAN benchmarks demonstrate that PULLM consistently outperforms state-of-the-art methods, achieving significant improvements in Chamfer Distance, Hausdorff Distance, and Point-to-Plane distance metrics. For instance, on the PUGAN dataset with sparse inputs, PULLM achieves a 56.15% improvement in Chamfer Distance over the best baseline. Our qualitative results further illustrate PULLM’s superior ability to preserve fine details and generate high-quality upsampled point clouds across various object types and geometries.
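For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal NumPy sketch of two of them, Chamfer Distance and Hausdorff Distance, between an upsampled point set and a reference set. It is illustrative only and is not the paper's evaluation code: the function names are hypothetical, and the conventions chosen here (squared Euclidean distances averaged per direction for Chamfer, the symmetric maximum for Hausdorff) are assumptions, since benchmarks differ in normalization and scaling.

```python
import numpy as np


def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between every point in a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]      # shape (N, M, 3)
    return np.sum(diff * diff, axis=-1)       # shape (N, M)


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance (assumed convention: mean of squared
    nearest-neighbour distances, summed over both directions)."""
    d = pairwise_sq_dists(a, b)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())


def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff Distance: the largest nearest-neighbour
    distance in either direction."""
    d = np.sqrt(pairwise_sq_dists(a, b))
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


if __name__ == "__main__":
    # Toy data: a stand-in "upsampled" set and a reference with one extra point.
    upsampled = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
    print("Chamfer:", chamfer_distance(upsampled, reference))
    print("Hausdorff:", hausdorff_distance(upsampled, reference))
```

The dense pairwise matrix keeps the sketch short but needs O(NM) memory; for point sets of realistic size, a nearest-neighbour structure such as a k-d tree would be the more practical choice.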
Original language | English |
---|---|
Title of host publication | Proceedings of the 40th ACM/SIGAPP Symposium On Applied Computing |
Publisher | Association for Computing Machinery |
Publication status | Accepted for publication - 9 Jan 2025 |
Event | 40th ACM/SIGAPP Symposium On Applied Computing - Catania, Sicily, Italy. Duration: 31 Mar 2025 → 4 Apr 2025 |
Conference
Conference | 40th ACM/SIGAPP Symposium On Applied Computing |
---|---|
Country/Territory | Italy |
City | Catania, Sicily |
Period | 31/03/25 → 4/04/25 |
Keywords
- Point Cloud Upsampling
- Large Language Models (LLMs)
- Multi-modal Learning
- Feature Aware Translator (FAT)
- 3D Computer Vision