This work was done while Ashkan Taghipour was a research intern at Dolby Laboratories, Sydney, Australia.
Accepted at ICRA 2026
3D Gaussian Splatting (3DGS) enables fast, high-quality novel view synthesis but relies on densification followed by pruning to optimize the number of Gaussians. Existing mask-based pruning, such as MaskGS, regularizes the global mean of the mask, which is misaligned with the local per-pixel (per-ray) reconstruction loss that determines image quality along individual camera rays. This paper introduces SVR-GS, a spatially variant regularizer that renders a per-pixel spatial mask from each Gaussian’s effective contribution along the ray, thereby applying sparsity pressure where it matters: on low-importance Gaussians. We explore three spatial-mask aggregation strategies, implement them in CUDA, and conduct a gradient analysis to motivate our final design. Extensive experiments on the Tanks&Temples, Deep Blending, and Mip-NeRF360 datasets demonstrate that, on average across the three datasets, the proposed SVR-GS reduces the number of Gaussians by 1.79× compared to MaskGS and 5.63× compared to 3DGS, while incurring only 0.50 dB and 0.40 dB PSNR drops, respectively. These gains translate into significantly smaller, faster, and more memory-efficient models, making them well-suited for real-time applications such as robotics, AR/VR, and mobile perception.
Designing the forward mask aggregation is critical because it determines the gradient signal propagated by the loss. Below we present the backward derivations for two alternative designs that were implemented in CUDA and compared against the proposed forward (Eq. 5 in the paper).
We consider a front-to-back ray-ordered list of Gaussians indexed by $i = 0, \dots, N{-}1$, with masks $M_i \in [0,1]$ and opacities $\alpha_i \in [0,1]$. Transmittance is defined as:
$$T_0=1,\quad T_{i+1}=(1-\alpha_i M_i)\,T_i.$$For any $j > i$, the partial derivative of transmittance w.r.t. $M_i$ is:
$$\frac{\partial T_j}{\partial M_i} = -\frac{\alpha_i\, T_j}{1-\alpha_i M_i},\qquad \frac{\partial T_j}{\partial M_i}=0 \;\text{ for } j \le i.$$In Scenario A, each Gaussian is weighted by the inverse of its importance $\alpha_i T_i$, so Gaussians with small contributions receive larger weight.
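Both designs below reuse this recurrence and its derivative. As a sanity check, the closed form can be verified against central finite differences; the following is a minimal NumPy sketch with toy values (function names and values are ours for illustration; the paper's implementation is in CUDA):

```python
import numpy as np

def transmittance(alpha, M):
    """Front-to-back transmittance: T_0 = 1, T_{i+1} = (1 - alpha_i M_i) T_i."""
    T = np.ones(len(alpha))
    for i in range(1, len(alpha)):
        T[i] = T[i - 1] * (1.0 - alpha[i - 1] * M[i - 1])
    return T

def dT_dM(alpha, M, T, i, j):
    """Closed form: -alpha_i T_j / (1 - alpha_i M_i) for j > i, else 0."""
    if j <= i:
        return 0.0
    return -alpha[i] * T[j] / (1.0 - alpha[i] * M[i])

# toy values (illustrative only, not from the paper)
rng = np.random.default_rng(0)
N = 6
alpha = rng.uniform(0.2, 0.6, N)
M = rng.uniform(0.2, 0.6, N)
T = transmittance(alpha, M)

# central finite difference in M_i, compared against the closed form
h = 1e-6
i, j = 1, 4
Mp, Mm = M.copy(), M.copy()
Mp[i] += h
Mm[i] -= h
fd = (transmittance(alpha, Mp)[j] - transmittance(alpha, Mm)[j]) / (2 * h)
assert abs(fd - dT_dM(alpha, M, T, i, j)) < 1e-8
```

Because $T_j$ is linear in each $M_i$, the central difference matches the closed form up to floating-point roundoff.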
Scenario A forward:
$$w_i = \frac{1}{\alpha_i T_i + \varepsilon},\qquad F_A = \frac{\sum_{k} w_k M_k}{\sum_{k} w_k}.$$We use the shorthands $\mathrm{Num} := \sum_k w_k M_k$ for the numerator and $S := \sum_k w_k$ for the denominator.
Derivative of the weights:
For $j = i$: $w_i$ does not depend on $M_i$, so $\partial w_i / \partial M_i = 0$. For $j > i$:
$$\frac{\partial w_j}{\partial M_i} = \frac{\alpha_i \,\alpha_j\, T_j}{(1-\alpha_i M_i)\,(\alpha_j T_j+\varepsilon)^2}.$$Final gradient (quotient rule on $F_A = \mathrm{Num}/S$):
$$\frac{\partial F_A}{\partial M_i} = \frac{1}{S}\left[w_i + \sum_{j>i}(M_j - F_A)\,\frac{\partial w_j}{\partial M_i}\right].$$Scenario A falls short because the inverse-importance weights $w_i = 1/(\alpha_i T_i + \varepsilon)$ explode when $\alpha_i T_i$ is tiny, saturating $F_A$ and tagging broad regions as low-importance instead of isolating truly redundant Gaussians.
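The Scenario A forward and the gradient above can be checked numerically. Below is a hedged NumPy sketch (our function names and toy values; the paper's implementation is a CUDA kernel) that implements $F_A$ and $\partial F_A/\partial M_i$ and verifies them against central finite differences:

```python
import numpy as np

def transmittance(alpha, M):
    # T_0 = 1, T_{i+1} = (1 - alpha_i M_i) T_i
    T = np.ones(len(alpha))
    for i in range(1, len(alpha)):
        T[i] = T[i - 1] * (1.0 - alpha[i - 1] * M[i - 1])
    return T

def forward_A(alpha, M, eps=1e-6):
    # F_A = sum_k w_k M_k / sum_k w_k, with w_k = 1 / (alpha_k T_k + eps)
    T = transmittance(alpha, M)
    w = 1.0 / (alpha * T + eps)
    return np.sum(w * M) / np.sum(w)

def grad_A(alpha, M, eps=1e-6):
    # dF_A/dM_i = (1/S) [ w_i + sum_{j>i} (M_j - F_A) dw_j/dM_i ]
    N = len(alpha)
    T = transmittance(alpha, M)
    w = 1.0 / (alpha * T + eps)
    S = np.sum(w)
    F = np.sum(w * M) / S
    g = np.zeros(N)
    for i in range(N):
        acc = w[i]
        for j in range(i + 1, N):
            # dw_j/dM_i = alpha_i alpha_j T_j / ((1 - alpha_i M_i)(alpha_j T_j + eps)^2)
            dw_j = alpha[i] * alpha[j] * T[j] / (
                (1.0 - alpha[i] * M[i]) * (alpha[j] * T[j] + eps) ** 2
            )
            acc += (M[j] - F) * dw_j
        g[i] = acc / S
    return g

# finite-difference check on toy values (illustrative only)
rng = np.random.default_rng(1)
alpha = rng.uniform(0.2, 0.6, 5)
M = rng.uniform(0.2, 0.6, 5)
g = grad_A(alpha, M)
h = 1e-6
for i in range(len(M)):
    Mp, Mm = M.copy(), M.copy()
    Mp[i] += h
    Mm[i] -= h
    fd = (forward_A(alpha, Mp) - forward_A(alpha, Mm)) / (2 * h)
    assert abs(fd - g[i]) < 1e-5
```

The sketch also makes the failure mode visible: shrinking $\alpha_i T_i$ toward zero drives $w_i$ toward $1/\varepsilon$, so a few near-invisible Gaussians dominate both $S$ and $F_A$.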
In Scenario B, each Gaussian is suppressed by the cumulative occlusion in front of it.
Scenario B forward:
$$f_i := M_i(1 - T_i),\qquad F_B = \frac{1}{\log(1+N)}\sum_{k=0}^{N-1} f_k.$$Term-wise derivatives:
For $j = i$: $T_i$ is independent of $M_i$, so
$$\frac{\partial f_i}{\partial M_i} = 1 \cdot (1 - T_i).$$For $j > i$: $M_j$ is independent of $M_i$, but $T_j$ is not:
$$\frac{\partial f_j}{\partial M_i} = \frac{\alpha_i}{1-\alpha_i M_i}\,M_j\,T_j.$$Final gradient:
$$\frac{\partial F_B}{\partial M_i} = \frac{1}{\log(1+N)}\left[(1 - T_i) + \frac{\alpha_i}{1-\alpha_i M_i}\sum_{j>i} M_j\,T_j\right].$$Scenario B falls short because the cumulative term $(1 - T_i)$ depends only on occlusion, so the mask primarily tracks ray depth rather than per-Gaussian importance, over-penalizing long rays and occluded regions.
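The same kind of numerical check applies to Scenario B. The following minimal NumPy sketch (our naming and toy values, standing in for the CUDA kernel) implements $F_B$ and the gradient above and verifies them against central finite differences:

```python
import numpy as np

def transmittance(alpha, M):
    # T_0 = 1, T_{i+1} = (1 - alpha_i M_i) T_i
    T = np.ones(len(alpha))
    for i in range(1, len(alpha)):
        T[i] = T[i - 1] * (1.0 - alpha[i - 1] * M[i - 1])
    return T

def forward_B(alpha, M):
    # F_B = (1/log(1+N)) * sum_k M_k (1 - T_k)
    T = transmittance(alpha, M)
    return np.sum(M * (1.0 - T)) / np.log(1 + len(alpha))

def grad_B(alpha, M):
    # dF_B/dM_i = (1/log(1+N)) [ (1 - T_i)
    #              + alpha_i / (1 - alpha_i M_i) * sum_{j>i} M_j T_j ]
    N = len(alpha)
    T = transmittance(alpha, M)
    g = np.zeros(N)
    for i in range(N):
        tail = np.sum(M[i + 1:] * T[i + 1:])
        g[i] = ((1.0 - T[i]) + alpha[i] / (1.0 - alpha[i] * M[i]) * tail) \
               / np.log(1 + N)
    return g

# finite-difference check on toy values (illustrative only)
rng = np.random.default_rng(2)
alpha = rng.uniform(0.2, 0.6, 5)
M = rng.uniform(0.2, 0.6, 5)
g = grad_B(alpha, M)
h = 1e-6
for i in range(len(M)):
    Mp, Mm = M.copy(), M.copy()
    Mp[i] += h
    Mm[i] -= h
    fd = (forward_B(alpha, Mp) - forward_B(alpha, Mm)) / (2 * h)
    assert abs(fd - g[i]) < 1e-8
```

The $(1 - T_i)$ term in `grad_B` is the one discussed above: it grows monotonically with depth along the ray regardless of how much the $i$-th Gaussian itself contributes, which is why this design tracks occlusion rather than per-Gaussian importance.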