GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting

1The University of Tokyo    2RIKEN    3South China University of Technology   
4Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
*Equal contribution    Corresponding author   

Demo

3D occupancy prediction and rendered depth. Left: nuScenes. Right: DDAD.


Abstract

We introduce GaussianOcc, a systematic method that investigates two uses of Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views. First, traditional methods for self-supervised 3D occupancy estimation still require ground truth 6D poses from sensors during training. To address this limitation, we propose the Gaussian Splatting for Projection (GSP) module, which provides accurate scale information for fully self-supervised training through adjacent view projection. Second, existing methods rely on volume rendering to learn the final 3D voxel representation from 2D signals (depth maps, semantic maps), which is both time-consuming and less effective. We propose Gaussian Splatting from Voxel space (GSV) to leverage the fast rendering properties of Gaussian splatting. As a result, GaussianOcc enables fully self-supervised (no ground truth pose) 3D occupancy estimation with competitive performance and low computational cost (2.7 times faster in training and 5 times faster in rendering).


Method

Overview of the proposed GaussianOcc. Given a sequence of surround images, we employ a U-Net architecture to predict Gaussian attributes in the 2D image grid space for cross-view Gaussian splatting. This provides scale information during joint training with the 6D pose network (Stage 1). For 3D occupancy estimation, we lift the 2D features to a 3D voxel space and propose voxel grid Gaussian splatting for fast rendering (Stage 2). For clarity, we omit the line from the 6D pose network to the loss in Stage 1.
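As a rough illustration of why scale information matters for adjacent view projection, the sketch below warps a single pixel into a neighboring view with a pinhole camera model. It uses plain point reprojection rather than Gaussian splatting, so it is not the GSP module itself. `Pinhole`, `warp_pixel`, and all numeric values are hypothetical names and numbers chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pinhole:
    """Hypothetical pinhole intrinsics: focal lengths and principal point in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(cam, u, v, depth):
    """Lift pixel (u, v) with a given depth to a 3D point in the camera frame."""
    x = (u - cam.cx) / cam.fx * depth
    y = (v - cam.cy) / cam.fy * depth
    return (x, y, depth)


def transform(pose, point):
    """Apply a 4x4 rigid transform (nested lists, row-major) to a 3D point."""
    x, y, z = point
    return tuple(
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    )


def project(cam, point):
    """Project a camera-frame 3D point to pixels; None if it lies behind the camera."""
    x, y, z = point
    if z <= 0.0:
        return None
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def warp_pixel(src_cam, dst_cam, src_to_dst, u, v, depth):
    """Reproject a source pixel into the destination view using depth and relative pose."""
    return project(dst_cam, transform(src_to_dst, backproject(src_cam, u, v, depth)))


# Assumed setup: identical cameras, destination frame shifted by 1 unit along x.
cam = Pinhole(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
shift_x = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
near = warp_pixel(cam, cam, shift_x, 50.0, 50.0, depth=5.0)
far = warp_pixel(cam, cam, shift_x, 50.0, 50.0, depth=10.0)
print(near, far)
```

In this toy setup, doubling the depth changes where the pixel lands in the neighboring view. A loss computed on the warped view therefore constrains depth at metric scale only when the translation between views is itself known at metric scale.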


Main Results

3D Occupancy Estimation

We compare against existing methods using the mIoU and IoU metrics, and also report a render speed comparison.
[Tables: 3D occupancy estimation results and render speed comparison]
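For readers unfamiliar with the two metrics, the following is a minimal sketch of how IoU (occupied vs. empty) and mIoU (mean of per-class IoU) are commonly computed over voxel labels. The function names, the choice of label 0 for empty space, and the rule of skipping classes absent from both prediction and ground truth are assumptions made for this example. Benchmark protocols may differ, for instance in visibility masks or class lists.

```python
def occupancy_iou(pred, gt, empty_label=0):
    """Geometric IoU: every voxel whose label is not `empty_label` counts as occupied."""
    inter = sum(1 for p, g in zip(pred, gt) if p != empty_label and g != empty_label)
    union = sum(1 for p, g in zip(pred, gt) if p != empty_label or g != empty_label)
    return inter / union if union else float("nan")


def mean_iou(pred, gt, classes):
    """Mean of per-class IoU; classes absent from both pred and gt are skipped."""
    ious = []
    for c in classes:
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Flattened toy voxel grid with labels: 0 = empty, 1 and 2 = semantic classes.
pred = [0, 1, 1, 2, 2, 0]
gt = [0, 1, 2, 2, 2, 1]
print(occupancy_iou(pred, gt))
print(mean_iou(pred, gt, classes=[1, 2]))
```

IoU here ignores semantic labels and measures only geometry, while mIoU penalizes a voxel that is occupied but assigned the wrong class.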

Self-supervised Multi-camera Depth Estimation

We evaluate depth estimation against existing methods using scale-aware metrics.
[Table: self-supervised multi-camera depth estimation results]
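To make "scale-aware" concrete, the sketch below computes several widely used depth errors with and without median scaling. It assumes that scale-aware evaluation means predictions are scored in metric units without per-image rescaling. The function name, the validity rule (both depths positive), and the selection of metrics are choices made for this example, not the exact evaluation protocol behind the table.

```python
import math
from statistics import median


def depth_metrics(pred, gt, median_scaling=False):
    """Return common depth errors over pixels where both depths are positive.

    median_scaling=False is the scale-aware setting: predictions are scored
    as-is. median_scaling=True first rescales predictions by
    median(gt) / median(pred), which hides any global scale error.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0.0 and g > 0.0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    if median_scaling:
        scale = median(g for _, g in pairs) / median(p for p, _ in pairs)
        pairs = [(p * scale, g) for p, g in pairs]
    n = len(pairs)
    return {
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "sq_rel": sum((p - g) ** 2 / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "delta1": sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n,
    }


# Toy prediction with the right structure but twice the true scale.
pred = [2.0, 4.0, 8.0]
gt = [1.0, 2.0, 4.0]
print(depth_metrics(pred, gt))
print(depth_metrics(pred, gt, median_scaling=True))
```

In this toy case the prediction is off by a global factor of two. The scale-aware scores expose the error, while the median-scaled scores report a perfect match, which is why scale-aware evaluation is the stricter test for a method that claims to recover metric scale.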

Qualitative Results

The figure below shows examples from the nuScenes dataset.
[Figure: qualitative results on nuScenes]
The figure below shows examples from the DDAD dataset.
[Figure: qualitative results on DDAD]

BibTeX


@article{gan2024gaussianocc,
title={GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting},
author={Gan, Wanshui and Liu, Fang and Xu, Hongbin and Mo, Ningkai and Yokoya, Naoto},
journal={arXiv preprint arXiv:2408.11447},
year={2024}
}