MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views


ECCV 2024


Wangze Xu1, Huachen Gao1, Shihe Shen1, Rui Peng1, Jianbo Jiao2, Ronggang Wang1,3
1School of Electronic and Computer Engineering, Peking University, 2School of Computer Science, University of Birmingham, 3Peng Cheng Laboratory

Abstract

Recently, the advancement of the Neural Radiance Field (NeRF) has facilitated few-shot Novel View Synthesis (NVS), which presents a significant challenge in 3D vision applications. Despite numerous attempts to reduce the dense input requirement in NeRF, it still suffers from time-consuming training and rendering processes. More recently, 3D Gaussian Splatting (3DGS) has achieved real-time, high-quality rendering with an explicit point-based representation. However, similar to NeRF, it tends to overfit the training views owing to a lack of constraints. In this paper, we propose MVPGS, a few-shot NVS method that excavates multi-view priors based on 3D Gaussian Splatting. We leverage recent learning-based Multi-view Stereo (MVS) to enhance the quality of geometric initialization for 3DGS. To mitigate overfitting, we propose a forward-warping method that provides additional appearance constraints conforming to the scene, based on the computed geometry. Furthermore, to facilitate proper convergence of optimization, we introduce a view-consistent geometry constraint for Gaussian parameters and utilize monocular depth regularization as compensation. Experiments show that the proposed method achieves state-of-the-art performance with real-time rendering speed.

Method

The proposed method leverages learning-based MVS to estimate dense view-consistent depth and construct a point cloud for the initialization of 3D Gaussians. We excavate the computed geometry from MVS through forward warping to generate appearance priors for the supervision of unseen views. To regularize the geometry update during optimization, we introduce geometric losses from MVS depth and monocular priors to guide 3D Gaussians to converge to proper positions.
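The two geometric steps described above, lifting a depth map to a point cloud and forward-warping a source view into an unseen view, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `unproject_depth` and `forward_warp` are hypothetical, and the sketch assumes a pinhole camera with intrinsics `K` shared by both views, 4x4 camera-to-world poses, and a simple nearest-pixel splat with a z-buffer to resolve occlusions.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift a depth map (H, W) to world-space points (H*W, 3). Hypothetical helper."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # u: column index, v: row index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T                 # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)           # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]

def forward_warp(image, depth, K, src_c2w, tgt_c2w):
    """Splat source pixels into a target view; returns (warped image, validity mask)."""
    h, w = depth.shape
    channels = image.shape[-1]
    pts_world = unproject_depth(depth, K, src_c2w)

    # Transform world points into the target camera frame.
    pts_h = np.concatenate([pts_world, np.ones((pts_world.shape[0], 1))], axis=1)
    pts_tgt = (pts_h @ np.linalg.inv(tgt_c2w).T)[:, :3]
    z = pts_tgt[:, 2]
    valid = z > 1e-6                                # keep points in front of the camera

    # Perspective projection to the nearest target pixel.
    proj = pts_tgt @ K.T
    safe_z = np.where(valid, z, 1.0)
    u = np.rint(proj[:, 0] / safe_z).astype(np.int64)
    v = np.rint(proj[:, 1] / safe_z).astype(np.int64)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)

    flat = v[valid] * w + u[valid]                  # flattened target pixel index
    z_valid = z[valid]
    colors = image.reshape(-1, channels)[valid]

    # Z-buffer: for every target pixel keep only the nearest projected point.
    zbuf = np.full(h * w, np.inf)
    np.minimum.at(zbuf, flat, z_valid)
    nearest = z_valid <= zbuf[flat]

    warped = np.zeros((h * w, channels), dtype=image.dtype)
    warped[flat[nearest]] = colors[nearest]
    mask = np.zeros(h * w, dtype=bool)
    mask[flat[nearest]] = True                      # pixels that received a color
    return warped.reshape(h, w, channels), mask.reshape(h, w)
```

In a pipeline like the one described, the warped image and its mask could serve as pseudo ground truth for an unseen view, with the loss evaluated only where the mask is set. Pixels left empty correspond to regions occluded or outside the field of view of the source image.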

Comparison

Compared with NeRF, the proposed method preserves high-fidelity detail in high-frequency regions while achieving competitive training and rendering speed for few-shot NVS.

DTU (3 input views)

LLFF (3 input views)

NVS-RGBD (3 input views)

T&T (3 input views)

BibTeX

      
@inproceedings{xwz2024eccv,
  title={MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views},
  author={Wangze Xu and Huachen Gao and Shihe Shen and Rui Peng and Jianbo Jiao and Ronggang Wang},
  booktitle={The 18th European Conference on Computer Vision (ECCV)},
  year={2024}
}