Recent works in volume rendering, \textit{e.g.} NeRF and 3D Gaussian Splatting (3DGS), have significantly advanced rendering quality and efficiency with the help of learned implicit neural radiance fields or 3D Gaussians. Rendering on top of an explicit representation, the vanilla 3DGS and its variants deliver real-time efficiency by optimizing the parametric model with single-view supervision per iteration during training, a scheme adopted from NeRF. Consequently, certain views are overfitted, leading to unsatisfactory appearance in novel-view synthesis and imprecise 3D geometry. To address these problems, we propose a new 3DGS optimization method embodying four key novel contributions: 1) We transform the conventional single-view training paradigm into a multi-view training strategy. With our proposed multi-view regulation, 3D Gaussian attributes are further optimized without overfitting certain training views. As a general solution, we improve overall accuracy across a variety of scenarios and different Gaussian variants. 2) Inspired by the benefit brought by additional views, we further propose a cross-intrinsic guidance scheme, leading to a coarse-to-fine training procedure across different resolutions. 3) Built on top of our multi-view regulated training, we further propose a cross-ray densification strategy, densifying more Gaussian kernels in the regions where rays from a selection of views intersect. 4) By further investigating the densification strategy, we find that densification should be strengthened when certain views differ dramatically. As a solution, we propose a novel multi-view augmented densification strategy, where 3D Gaussians are encouraged to densify to a sufficient number accordingly, resulting in improved reconstruction accuracy. We conduct extensive experiments demonstrating that our proposed method improves the novel-view synthesis of Gaussian-based explicit representation methods by about 1 dB PSNR across various tasks.
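To make the multi-view regulation concrete, below is a minimal PyTorch-style sketch of one training step. This is a sketch under assumptions, not the released MVGS implementation: render_fn, ssim_fn, cam.gt_image, and the number of sampled views M are illustrative stand-ins. The key idea it shows is that the photometric loss is accumulated over M sampled training views before a single optimizer step, so the gradients constrain the 3D Gaussians to all M views jointly instead of one view at a time.

    import random
    import torch

    def multiview_regulated_step(gaussians, optimizer, train_cameras,
                                 render_fn, ssim_fn, M=4, lambda_ssim=0.2):
        """One optimizer step regulated by M training views jointly.

        Sketch under assumptions: render_fn(gaussians, camera) is any
        differentiable Gaussian rasterizer returning a (C, H, W) image, and
        camera.gt_image holds the matching ground-truth tensor. These names
        are hypothetical, not the authors' released API.
        """
        optimizer.zero_grad()
        views = random.sample(train_cameras, M)  # M views per iteration, not one
        loss = 0.0
        for cam in views:
            pred = render_fn(gaussians, cam)
            gt = cam.gt_image
            l1 = torch.abs(pred - gt).mean()
            # Standard 3DGS photometric loss, summed across the sampled views
            loss = loss + (1 - lambda_ssim) * l1 + lambda_ssim * (1 - ssim_fn(pred, gt))
        loss = loss / M
        loss.backward()  # gradients now reflect all M views jointly
        optimizer.step()
        return float(loss.detach())

In contrast, vanilla 3DGS would call backward() and step() once per single view, which is what allows individual views to dominate the optimization.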
Pipeline of MVGS. We propose to incorporate multiple training views per iteration via multi-view regulated learning. This forces the whole set of 3D Gaussians to learn the structure and appearance of multiple views jointly, constraining the optimization to the entire scene and avoiding the overfitting to certain views that arises when learning from a single view. To incorporate more multi-view information, we propose a cross-intrinsic guidance strategy that optimizes 3DGS from low resolution to high resolution. Low-resolution training admits abundant multi-view information as a powerful constraint to build more compact 3D Gaussians, and it conveys the learned scene structure to high-resolution training for sculpting finer detail. To foster the learning of multi-view information, we further propose a cross-ray densification strategy, which uses ray marching, guided by 2D loss maps, to decide where to densify. The 3D Gaussians in the overlapping 3D regions where rays intersect are densified to improve reconstruction performance for these views, since these Gaussians jointly play an important role in rendering them. In addition, we propose a multi-view augmented densification strategy for cases where discrepancies between perspectives are significant. This approach encourages the 3D Gaussians to densify into more primitives, enabling a better fit across various perspectives and improving overall NVS performance.
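As a rough illustration of the cross-intrinsic guidance, the sketch below scales the pinhole intrinsics together with the ground-truth images so that early iterations train at low resolution, where many views act as a strong joint constraint, before moving to full resolution. The schedule thresholds and downsampling factors are placeholders for illustration, not the paper's exact settings, and downscale_camera is a hypothetical helper.

    import torch.nn.functional as F

    # Illustrative coarse-to-fine schedule: (start_iteration, resolution_scale).
    # Placeholder values, not the paper's exact settings.
    SCHEDULE = [(0, 0.125), (3000, 0.25), (7000, 0.5), (15000, 1.0)]

    def current_scale(iteration):
        # Return the resolution scale active at this training iteration.
        scale = SCHEDULE[0][1]
        for start, s in SCHEDULE:
            if iteration >= start:
                scale = s
        return scale

    def downscale_camera(K, image, scale):
        """Scale pinhole intrinsics K (3x3 tensor) and a (C, H, W) ground-truth
        image together, keeping the projection geometrically consistent."""
        K_s = K.clone()
        K_s[0, 0] *= scale  # fx
        K_s[1, 1] *= scale  # fy
        K_s[0, 2] *= scale  # cx
        K_s[1, 2] *= scale  # cy
        h, w = image.shape[-2:]
        image_s = F.interpolate(image[None],
                                size=(max(1, int(h * scale)), max(1, int(w * scale))),
                                mode="bilinear", align_corners=False)[0]
        return K_s, image_s

In a training loop, one would render each sampled view with the scaled intrinsics at current_scale(iteration) and compare against the correspondingly downscaled ground truth, so the learned coarse structure is carried into the later full-resolution stages.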
Here, we present qualitative comparisons of 3DGS, MVGS (Ours), and the Ground Truth.
@misc{du2024mvgsmultiviewregulatedgaussiansplatting,
      title={MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis},
      author={Xiaobiao Du and Yida Wang and Xin Yu},
      year={2024},
      eprint={2410.02103},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.02103},
}