Project Report

Reproducing Mip-3DGS: An Improved 3D Gaussian Splatting Framework

Abstract

In this project, our goal is to reproduce the results of Mip-3DGS (published as Mip-Splatting), an improved version of the original 3D Gaussian Splatting (3DGS) framework. While 3DGS has proven to be a breakthrough in balancing high-quality rendering and real-time performance, Mip-3DGS introduces several key enhancements that further improve rendering accuracy and robustness across scales.

Our work validates the proposed improvements, namely a 3D smoothing filter that constrains each Gaussian's maximum frequency and a 2D Mip filter that replaces the fixed screen-space dilation, which together allow the model to preserve fine-grained detail across scales while maintaining computational efficiency. Additionally, we analyze the effects of these modifications on rendering quality and performance through experiments on benchmark datasets.

The results of our reproduction confirm that Mip-3DGS outperforms the original 3DGS in visual fidelity, especially when rendering at scales that differ from the training resolution. This report details our reproduction process, experimental findings, and insights gained from implementing the Mip-3DGS framework.


Problem Description and Importance

Rendering and representing 3D scenes efficiently and accurately has long been a fundamental challenge in computer vision, especially for real-time applications such as gaming and virtual reality. Traditional methods, such as voxel grids or mesh-based approaches, often fail to balance computational efficiency and visual fidelity. Although Neural Radiance Fields (NeRF) represented a significant breakthrough in 3D representation, its high computational cost and slow rendering speed make it unsuitable for real-time applications.

3D Gaussian Splatting (3DGS) emerged as a solution to these challenges, introducing a novel way to represent 3D scenes using Gaussian distributions. However, even with its advantages, 3DGS still produces artifacts when scenes are rendered at scales or resolutions that differ from those observed during training. Mip-3DGS addresses these issues by extending the original framework with scale-aware 3D and 2D filtering of the Gaussian primitives.


Previous Work and Advancements

Limitations of NeRF:

  • Heavy Computational Cost: Training requires substantial compute; optimizing a single scene can take many hours on a modern GPU.
  • Slow Rendering Speed: Not suitable for real-time applications.
  • Limited Scalability: Performs well on isolated objects or small-scale scenes, but struggles with large, unbounded scenes.

Limitations of 3D Gaussian Splatting:

3D Gaussian Splatting was introduced to address these challenges, replacing dense volumetric sampling with explicit anisotropic Gaussian primitives and thereby enabling more efficient computation and real-time rendering. However, it lacks a principled mechanism for handling changes in sampling rate, so rendering at scales different from the training views produces artifacts.

Improvements in Mip-3DGS (Mip-Splatting):

  • 3D Smoothing Filter: Constrains the maximum frequency of each 3D Gaussian according to the training views, preventing high-frequency artifacts when zooming in.
  • 2D Mip Filter: Replaces the fixed screen-space dilation with a pixel-sized filter, preventing aliasing and brightness artifacts when zooming out.

Goals of the Project

The objective of this project is to reproduce the results of the Mip-3DGS framework and validate its claimed improvements over the original 3DGS. Specifically, we aim to:

  1. Verify the accuracy and reliability of the scale-aware filtering (3D smoothing and 2D Mip filters) and its implementation.
  2. Evaluate the performance improvements in rendering speed and visual fidelity across different scene complexities.
  3. Compare the results of Mip-3DGS against the original 3DGS on benchmark datasets.
  4. Assess how adaptive density control affects resource efficiency and rendering quality.

Methodology

Hypothesis

Mip-Splatting enhances 3D Gaussian Splatting (3DGS) by addressing:

  1. High-frequency artifacts during zoom-in, using a 3D Smoothing Filter to constrain the frequency of Gaussian primitives.
  2. Aliasing and brightness artifacts during zoom-out, with a 2D Mip Filter simulating physical imaging processes.

The hypothesis is that these improvements lead to higher rendering fidelity, consistent performance across scales, and better generalization to unseen resolutions compared to the original 3DGS.


Dataset

  • Source:
    • Blender Dataset: A synthetic dataset containing scenes with known ground-truth camera poses and high-quality rendered images, commonly used for evaluating novel view synthesis methods.
    • Mip-NeRF 360 Dataset: A more challenging dataset featuring unbounded 360° camera movement, complex scene geometry, and lighting conditions.
  • Structure:
    • Blender Dataset: Includes multiple scenes rendered at resolutions from 512×512 to 1024×1024.
    • Mip-NeRF 360: Includes indoor and outdoor scenes with high dynamic range.
  • Preprocessing:
    • Normalized camera parameters (focal length, pose).
    • Downsampled images to simulate different resolutions (e.g., full resolution, 1/2, 1/4, 1/8); a short sketch of this step follows the list below.
    • Adjusted Gaussian primitive parameters (e.g., positions, covariance matrices) to reflect the input scene scale.
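
To make the resolution simulation concrete, the sketch below generates the downsampled copies of each ground-truth image with area (box) filtering. The directory layout, file glob, and function name are illustrative assumptions, not the preprocessing script of the released Mip-Splatting code.

```python
# Sketch: building the 1x, 1/2, 1/4, 1/8 evaluation images by area-downsampling
# the full-resolution renders. Paths and naming are illustrative only.
from pathlib import Path
from PIL import Image

SCALES = [1, 2, 4, 8]  # 1x, 1/2, 1/4, 1/8

def build_multiscale_set(src_dir: str, dst_dir: str) -> None:
    for img_path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(img_path)
        for s in SCALES:
            # Box (area) resampling avoids adding aliasing to the references.
            small = img.resize((img.width // s, img.height // s), Image.BOX)
            out = Path(dst_dir) / f"scale_{s}" / img_path.name
            out.parent.mkdir(parents=True, exist_ok=True)
            small.save(out)

# Example (hypothetical paths):
# build_multiscale_set("nerf_synthetic/lego/test", "eval/lego")
```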

Method

The implementation builds on the 3D Gaussian Splatting framework and introduces two key enhancements: the 3D Smoothing Filter and the 2D Mip Filter.

1. Baseline Model (3DGS) Reproduction

  • Reproduced the original 3D Gaussian Splatting pipeline using publicly available implementations.
  • Verified its performance on Blender and Mip-NeRF 360 datasets to ensure reproducibility of results.

2. 3D Smoothing Filter

To constrain high-frequency artifacts during zoom-in, the 3D smoothing filter modifies each Gaussian's representation in 3D space by convolving it with a low-pass filter:

    \mathcal{G}_k^{reg}(x) = (\mathcal{G}_k \otimes \mathcal{G}_{low})(x)

Where:

  • \mathcal{G}_k(x): The original 3D Gaussian with center p_k and covariance \Sigma_k.
  • \mathcal{G}_{low}: A low-pass Gaussian filter applied in 3D space, with isotropic covariance (s / \hat{\nu}_k) \cdot I.

Because the convolution of two Gaussians is again a Gaussian, the filtered primitive has a closed form:

    \mathcal{G}_k^{reg}(x) = \sqrt{\frac{|\Sigma_k|}{|\Sigma_k + (s/\hat{\nu}_k)\,I|}} \; e^{-\frac{1}{2}(x - p_k)^{\top} (\Sigma_k + (s/\hat{\nu}_k)\,I)^{-1} (x - p_k)}, \qquad \hat{\nu}_k = \max_{n} \; \mathbb{1}(p_k \in V_n) \cdot \frac{f_n}{d_{k,n}}

Where:

  • \hat{\nu}_k: The maximum sampling frequency of Gaussian k over the training views V_n, derived from Nyquist-Shannon sampling theory.
  • f_n: Focal length of camera n.
  • d_{k,n}: Depth of Gaussian k from camera n.
  • s: Scalar hyperparameter to control the filter size.
  • I: Identity matrix for isotropic filtering.

This filter ensures that the maximum frequency of each Gaussian does not exceed the Nyquist limit, preventing high-frequency artifacts.
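
As a concrete illustration of the two steps above, the following NumPy sketch computes each Gaussian's maximum sampling rate over the training cameras and fuses the isotropic low-pass term into its covariance, returning the normalization factor from the closed-form convolution. The visibility test, data layout, and the value of s are simplifying assumptions, not the released implementation.

```python
# Illustrative sketch of the 3D smoothing filter (not the official code).
# centers: (N, 3) Gaussian means in world space; covariances: (N, 3, 3).
# Each camera is a dict with rotation R (3, 3), translation t (3,), and focal length in pixels.
import numpy as np

def max_sampling_rate(centers, cameras):
    """nu_hat_k = max over cameras that see Gaussian k of focal / depth."""
    nu_hat = np.full(len(centers), 1e-8)
    for cam in cameras:
        pts_cam = centers @ cam["R"].T + cam["t"]        # world -> camera frame
        depth = pts_cam[:, 2]
        visible = depth > 0.01                           # crude frustum test (assumption)
        rate = np.where(visible, cam["focal"] / np.maximum(depth, 1e-8), 0.0)
        nu_hat = np.maximum(nu_hat, rate)
    return nu_hat

def apply_3d_smoothing(covariances, nu_hat, s=0.2):
    """Fuse the low-pass term (s / nu_hat) * I into each covariance and return
    the per-Gaussian normalization sqrt(|Sigma| / |Sigma + (s/nu_hat) I|)."""
    filt = (s / nu_hat)[:, None, None] * np.eye(3)       # isotropic filter covariance
    cov_reg = covariances + filt
    norm = np.sqrt(np.linalg.det(covariances) / np.linalg.det(cov_reg))
    return cov_reg, norm
```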

3. 2D Mip Filter

To mitigate aliasing and brightness/dilation artifacts during zoom-out, the 2D Mip Filter replaces the 2D dilation operation used in 3DGS. It simulates the physical imaging process by approximating a single-pixel box filter with a 2D Gaussian in screen space:

    \mathcal{G}_k^{2D}(x)_{mip} = \sqrt{\frac{|\Sigma_k^{2D}|}{|\Sigma_k^{2D} + s\,I|}} \; e^{-\frac{1}{2}(x - p_k)^{\top} (\Sigma_k^{2D} + s\,I)^{-1} (x - p_k)}

Where:

  • \Sigma_k^{2D}: The covariance matrix of the Gaussian projected into screen space.
  • s: Filter size, chosen to match the screen-space pixel dimensions.

This operation ensures proper anti-aliasing by adapting the Gaussian spread to the screen-space pixel size.
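
A corresponding screen-space sketch is shown below: given the projected 2D covariances, the mip filter adds a pixel-sized isotropic term and compensates it with the same determinant-ratio normalization, instead of the fixed, uncompensated dilation of 3DGS. The constant s here is an assumed placeholder for the pixel-matched filter size.

```python
# Illustrative sketch of the 2D Mip filter on projected Gaussians.
# sigma2d: (N, 2, 2) screen-space covariance matrices.
import numpy as np

def apply_2d_mip_filter(sigma2d, s=0.1):
    cov_mip = sigma2d + s * np.eye(2)                    # pixel-sized isotropic spread
    # The added spread is compensated per Gaussian, so overall brightness is
    # preserved (unlike the fixed dilation in 3DGS).
    norm = np.sqrt(np.linalg.det(sigma2d) / np.linalg.det(cov_mip))
    return cov_mip, norm
```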

4. Optimization and Training

  • Loss Function:
    A multi-view photometric loss is used to optimize the Gaussian parameters (positions p_k, covariances \Sigma_k, opacities, and colors):

        \mathcal{L} = \frac{1}{N} \sum_{n=1}^{N} \lVert \hat{I}_n - I_n \rVert_1

    Where \hat{I}_n and I_n are the rendered and target images for view n.
  • Training Strategy:
    • Gaussian primitives are dynamically added and removed based on their contribution to the scene representation.
    • Sampling rates (\hat{\nu}_k) for each Gaussian are recomputed every 100 iterations to reflect changes in the Gaussians' positions relative to the training cameras (see the sketch below).
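
The sketch below captures the two bookkeeping details from this list: the L1 photometric term between a rendered view and its target, and the fixed 100-iteration refresh schedule for the sampling rates. It is a schematic helper under our own naming, not the training script of the released code.

```python
# Schematic helpers for the training strategy described above.
import numpy as np

REFRESH_INTERVAL = 100  # iterations between sampling-rate updates

def photometric_l1(rendered: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error between a rendered view and its ground-truth image."""
    return float(np.abs(rendered - target).mean())

def should_refresh_sampling_rates(iteration: int) -> bool:
    """True on iterations where nu_hat_k is recomputed (every 100 steps)."""
    return iteration % REFRESH_INTERVAL == 0
```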

Experiment: Single-Scale Training and Multi-Scale Testing

The goal of this experiment was to evaluate the generalization capability of Mip-Splatting when trained on a single resolution and tested on multiple scales, simulating zoom-in and zoom-out scenarios. This setting highlights the ability of Mip-Splatting to handle out-of-distribution resolutions without explicit multi-scale supervision.

Setup

  • Dataset: Blender Dataset, with high-quality synthetic scenes.
  • Training Resolution: Fixed single resolution (full resolution, 1×).
  • Testing Resolutions: Evaluated at multiple downsampled scales: 1×, 1/2, 1/4, and 1/8.
  • Metrics (a short computation sketch follows this list):
    • PSNR (Peak Signal-to-Noise Ratio): Measures pixel-level accuracy.
    • SSIM (Structural Similarity Index): Evaluates perceptual quality.
    • LPIPS (Learned Perceptual Image Patch Similarity): Quantifies perceptual similarity.
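
The three metrics can be computed per rendered/ground-truth pair with off-the-shelf libraries, as sketched below (scikit-image for PSNR and SSIM, the lpips package for LPIPS). Exact evaluation settings of the original papers may differ; this is only a minimal sketch.

```python
# Sketch: PSNR / SSIM / LPIPS for one rendered-vs-ground-truth image pair.
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_net = lpips.LPIPS(net="alex")  # perceptual metric backbone

def evaluate_pair(rendered: np.ndarray, target: np.ndarray) -> dict:
    """rendered, target: float arrays of shape (H, W, 3) with values in [0, 1]."""
    psnr = peak_signal_noise_ratio(target, rendered, data_range=1.0)
    ssim = structural_similarity(target, rendered, channel_axis=-1, data_range=1.0)
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2 - 1
    lp = lpips_net(to_t(rendered), to_t(target)).item()  # expects NCHW in [-1, 1]
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": lp}
```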

Methods compared

  1. Mip-Splatting: Incorporating the 3D Smoothing Filter and 2D Mip Filter.
  2. 3DGS: Baseline Gaussian Splatting without multi-resolution mechanisms.
  3. 3DGS + EWA: Gaussian Splatting with Elliptical Weighted Average filtering.
  4. Mip-NeRF: A state-of-the-art neural radiance field model.

Results

  • Quantitative Metrics:
    Mip-Splatting outperformed all baselines, demonstrating better generalization to unseen resolutions.
    • PSNR: Consistently higher across all test scales, showcasing its ability to preserve pixel-level accuracy during scale changes.
    • SSIM: Maintained perceptual image quality, avoiding artifacts common in other methods.
    • LPIPS: Lower values indicated better perceptual similarity with ground truth.

Method          PSNR (1×)   PSNR (1/2)   PSNR (1/4)   PSNR (1/8)
Mip-Splatting   34.5        34.2         33.8         33.2
3DGS            32.7        31.5         30.2         28.5
3DGS + EWA      33.1        32.8         32.0         30.5
Mip-NeRF        34.0        33.6         32.5         30.8

Key Findings

  1. Zoom-In (Higher Scales):

    • Mip-Splatting eliminated high-frequency artifacts like edge thinning observed in 3DGS.
    • It achieved sharper renderings compared to EWA and Mip-NeRF, which exhibited slight oversmoothing.
  2. Zoom-Out (Lower Scales):

    • Avoided aliasing and brightness artifacts seen in 3DGS and EWA.
    • Mip-Splatting's 2D Mip Filter ensured proper anti-aliasing for smaller screen-space projections.
  3. Generalization:

    • Mip-Splatting retained fidelity across all tested resolutions, proving its robustness in single-scale training scenarios.

Figure Results

[Figure omitted.]

Reproduced Results

[Figure omitted.]

Limitations of Mip-Splatting

Approximation Errors in 2D Mip Filtering:

The 2D Mip Filter uses a Gaussian approximation for box filtering to maintain computational efficiency. While effective in most cases, this approximation can introduce inaccuracies, particularly for very small or large screen-space projections of Gaussians.

  • Impact: In extreme zoom-out scenarios, minor visual artifacts may occur due to imperfect pixel integration.

Increased Training Complexity:

The inclusion of the 3D Smoothing Filter adds computational overhead during training, as it requires calculating maximum sampling frequencies for each Gaussian based on camera parameters.

  • Impact: This increases training time and resource requirements, particularly for large-scale datasets or scenes with high complexity.

Scalability Challenges for Large-Scale Scenes:

The number of Gaussian primitives can grow significantly in highly complex or expansive scenes, leading to higher memory usage and reduced efficiency. This limits Mip-Splatting’s scalability for unbounded environments like large landscapes or dense urban areas.

  • Impact: GPU memory and computational demands can become a bottleneck in real-time applications involving large-scale scenes.

Future Work

To address the limitations of Mip-Splatting and further enhance its capabilities, the following future directions are proposed:

  1. Improved Filtering Approximations:

    • Develop adaptive or neural-based filtering techniques to replace the Gaussian approximation used in the 2D Mip Filter. This could reduce approximation errors and improve visual fidelity, particularly in extreme zoom-out scenarios.
    • Explore learning-based solutions that dynamically adjust filter parameters based on scene content or resolution.
  2. Scalability Enhancements:

    • Investigate methods to compress Gaussian primitives, such as clustering similar Gaussians or using hierarchical representations to reduce the number of primitives required in large-scale scenes.
    • Incorporate memory-efficient optimization techniques to better handle expansive environments, ensuring consistent rendering quality without exceeding GPU memory constraints.
  3. Dynamic Scene Support:

    • Extend Mip-Splatting to support dynamic and real-time applications, such as scenes with moving objects or changing lighting conditions. Temporal consistency mechanisms, like motion-aware Gaussian updates, could help adapt to dynamic environments.
    • Integrate temporal filtering or real-time optimization to ensure smooth transitions between frames in video or VR applications.
  4. Efficient Training Mechanisms:

    • Develop more efficient algorithms to calculate sampling frequencies (\hat{\nu}_k) and optimize Gaussian parameters during training, potentially leveraging parallelism or GPU-specific acceleration.
    • Investigate pre-training or transfer learning approaches to reduce the time required for scene-specific optimization.
  5. Integration with Neural Representations:

    • Combine Mip-Splatting with neural scene representations, such as Neural Radiance Fields (NeRF), to take advantage of the best of both approaches. For instance, use Mip-Splatting for real-time rendering and NeRF for high-quality offline rendering.
  6. Real-Time Optimization for Large Scenes:

    • Focus on real-time adaptive techniques that can dynamically balance rendering quality and computational cost based on hardware capabilities and scene complexity.
    • Explore GPU-accelerated rendering pipelines tailored for Mip-Splatting to meet latency requirements in VR or AR scenarios.

Conclusion

Mip-Splatting introduces significant advancements over traditional 3D Gaussian Splatting (3DGS) by addressing critical issues such as high-frequency artifacts during zoom-in and aliasing during zoom-out. By integrating a 3D Smoothing Filter and 2D Mip Filter, the framework enables higher rendering fidelity and consistent performance across multiple resolutions. These improvements make Mip-Splatting more robust and adaptable for real-time rendering and novel view synthesis tasks.

Through the reproduction of the Mip-Splatting framework in this project, we verified its claimed enhancements in visual fidelity across rendering scales. Experimental results demonstrated its ability to handle multi-scale rendering effectively while maintaining high quality. However, challenges remain, including scalability for large scenes, training complexity, and dynamic scene adaptability.

Future work can build upon this foundation by addressing scalability and real-time performance limitations, exploring dynamic scene support, and integrating neural representations for further improvements. Overall, Mip-Splatting represents a promising direction for efficient, real-time 3D scene reconstruction and rendering, bridging the gap between quality and speed in novel view synthesis applications.