PULLM: A multimodal framework for enhanced 3D point cloud upsampling using large language models

Zhiyong Zhang, Ruyu Liu, Xiufeng Liu, Yunrui Zhu, Linda Yang, Chaochao Wang, Jianhua Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Point cloud upsampling is a critical task in 3D computer vision, aiming to generate dense and uniformly distributed point sets from sparse inputs. While current self-supervised methods show promise, they often struggle with preserving fine-grained geometric details, especially for highly sparse point clouds. To address these limitations, we propose PointUpsampleLLM (PULLM), a novel multi-modal framework that leverages the power of large language models (LLMs) to enhance 3D point cloud upsampling. PULLM integrates a pretrained Point Cloud LLM (PointLLM) with visual features extracted from point clouds, learning a unified representation that captures both geometric and semantic information. At the core of our approach is the Feature Aware Translator (FAT) module, which effectively bridges the modality gap between visual and textual features, enhancing the spatial understanding of the LLM. PULLM generates textual descriptions of point clouds on the fly, eliminating the need for large paired datasets. Extensive experiments on the PU1K and PUGAN benchmarks demonstrate that PULLM consistently outperforms state-of-the-art methods, achieving significant improvements in Chamfer Distance, Hausdorff Distance, and Point-to-Plane distance metrics. For instance, on the PUGAN dataset with sparse inputs, PULLM achieves a 56.15% improvement in Chamfer Distance over the best baseline. Our qualitative results further illustrate PULLM’s superior ability to preserve fine details and generate high-quality upsampled point clouds across various object types and geometries.
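
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal NumPy sketch of Chamfer Distance and Hausdorff Distance between two point sets. It is an illustration only and not the authors' evaluation code: the function names are hypothetical, and the exact variant used in the paper (squared versus unsquared distances, sum versus mean, any normalization) is not stated in the abstract, so the unsquared, mean-based form below is an assumption. The last helper shows one common reading of a percentage improvement for a lower-is-better metric; whether the reported 56.15% was computed this way is likewise an assumption.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every point in a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]      # shape (N, M, 3) via broadcasting
    return np.linalg.norm(diff, axis=-1)      # shape (N, M)


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance (assumed variant: unsquared, mean-based)."""
    d = pairwise_distances(a, b)
    # Mean nearest-neighbor distance from a to b, plus the same from b to a.
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def hausdorff_distance(a, b):
    """Symmetric Hausdorff Distance: the worst-case nearest-neighbor distance."""
    d = pairwise_distances(a, b)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def relative_improvement(baseline, ours):
    """Percentage reduction of a lower-is-better metric (assumed definition)."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    # Toy point sets, chosen only to make the arithmetic easy to follow.
    sparse = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    dense = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
    print("Chamfer:", chamfer_distance(sparse, dense))
    print("Hausdorff:", hausdorff_distance(sparse, dense))
```

The brute-force pairwise matrix uses O(N·M) memory, which is fine for a toy example; evaluation on full benchmarks would normally use a spatial index or a GPU implementation instead.
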
Original language: English
Title of host publication: Proceedings of the 40th ACM/SIGAPP Symposium On Applied Computing
Publisher: Association for Computing Machinery
Publication status: Accepted for publication - 9 Jan 2025
Event: 40th ACM/SIGAPP Symposium On Applied Computing - Catania, Sicily, Italy
Duration: 31 Mar 2025 to 4 Apr 2025

Conference

Conference: 40th ACM/SIGAPP Symposium On Applied Computing
Country/Territory: Italy
City: Catania, Sicily
Period: 31/03/25 to 4/04/25

Keywords

  • Point Cloud Upsampling
  • Large Language Models (LLMs)
  • Multi-modal Learning
  • Feature Aware Translator (FAT)
  • 3D Computer Vision
