This work was done while Ashkan Taghipour was a research intern at Dolby Laboratories, Sydney, Australia.
Accepted at ICRA 2026
3D Gaussian Splatting (3DGS) enables fast, high-quality novel view synthesis but relies on densification followed by pruning to optimize the number of Gaussians. Existing mask-based pruning, such as MaskGS, regularizes the global mean of the mask, which is misaligned with the local per-pixel (per-ray) reconstruction loss that determines image quality along individual camera rays. This paper introduces SVR-GS, a spatially variant regularizer that renders a per-pixel spatial mask from each Gaussian’s effective contribution along the ray, thereby applying sparsity pressure where it matters: on low-importance Gaussians. We explore three spatial-mask aggregation strategies, implement them in CUDA, and conduct a gradient analysis to motivate our final design. Extensive experiments on the Tanks&Temples, Deep Blending, and Mip-NeRF360 datasets demonstrate that, on average across the three datasets, the proposed SVR-GS reduces the number of Gaussians by 1.79× compared to MaskGS and 5.63× compared to 3DGS, while incurring only 0.50 dB and 0.40 dB PSNR drops, respectively. These gains translate into significantly smaller, faster, and more memory-efficient models, making them well-suited for real-time applications such as robotics, AR/VR, and mobile perception.
Designing the forward mask aggregation is critical because it determines the gradient signal propagated by the loss. Below we present the backward derivations for two alternative designs that were implemented in CUDA and compared against the proposed forward (Eq. 5 in the paper).
We consider a front-to-back ray-ordered list of Gaussians indexed by $i = 0, \dots, N{-}1$, with masks $M_i \in [0,1]$ and opacities $\alpha_i \in [0,1]$. Transmittance is defined as:
$$T_0=1,\quad T_{i+1}=(1-\alpha_i M_i)\,T_i.$$For any $j > i$, the partial derivative of transmittance w.r.t. $M_i$ is:
$$\frac{\partial T_j}{\partial M_i} = -\frac{\alpha_i\, T_j}{1-\alpha_i M_i},\qquad \frac{\partial T_j}{\partial M_i}=0 \;\text{ for } j \le i.$$In Scenario A, each Gaussian is weighted by the inverse of its importance $\alpha_i T_i$, so Gaussians with small contributions receive larger weight.
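Both designs below reuse this recurrence and its derivative. As a sanity check, the closed form can be verified against central finite differences; the following is a minimal NumPy sketch with toy values (function names and values are ours for illustration; the paper's implementation is in CUDA):

```python
import numpy as np

def transmittance(alpha, M):
    """Front-to-back transmittance: T_0 = 1, T_{i+1} = (1 - alpha_i M_i) T_i."""
    T = np.ones(len(alpha))
    for i in range(1, len(alpha)):
        T[i] = T[i - 1] * (1.0 - alpha[i - 1] * M[i - 1])
    return T

def dT_dM(alpha, M, T, i, j):
    """Closed form: -alpha_i T_j / (1 - alpha_i M_i) for j > i, else 0."""
    if j <= i:
        return 0.0
    return -alpha[i] * T[j] / (1.0 - alpha[i] * M[i])

# toy values (illustrative only, not from the paper)
rng = np.random.default_rng(0)
N = 6
alpha = rng.uniform(0.2, 0.6, N)
M = rng.uniform(0.2, 0.6, N)
T = transmittance(alpha, M)

# central finite difference in M_i, compared against the closed form
h = 1e-6
i, j = 1, 4
Mp, Mm = M.copy(), M.copy()
Mp[i] += h
Mm[i] -= h
fd = (transmittance(alpha, Mp)[j] - transmittance(alpha, Mm)[j]) / (2 * h)
assert abs(fd - dT_dM(alpha, M, T, i, j)) < 1e-8
```

Because $T_j$ is linear in each $M_i$, the central difference matches the closed form up to floating-point roundoff.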
Scenario A forward:
$$w_i = \frac{1}{\alpha_i T_i + \varepsilon},\qquad F_A = \frac{\sum_{k} w_k M_k}{\sum_{k} w_k}.$$We use the shorthands $\mathrm{Num} := \sum_k w_k M_k$ for the numerator and $S := \sum_k w_k$ for the denominator.
Derivative of the weights:
For $j = i$: $w_i$ does not depend on $M_i$, so $\partial w_i / \partial M_i = 0$. For $j > i$:
$$\frac{\partial w_j}{\partial M_i} = \frac{\alpha_i \,\alpha_j\, T_j}{(1-\alpha_i M_i)\,(\alpha_j T_j+\varepsilon)^2}.$$Final gradient (quotient rule on $F_A = \mathrm{Num}/S$):
$$\frac{\partial F_A}{\partial M_i} = \frac{1}{S}\left[w_i + \sum_{j>i}(M_j - F_A)\,\frac{\partial w_j}{\partial M_i}\right].$$Scenario A falls short because the inverse-importance weights $w_i = 1/(\alpha_i T_i + \varepsilon)$ explode when $\alpha_i T_i$ is tiny, saturating $F_A$ and tagging broad regions as low-importance instead of isolating truly redundant Gaussians.
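The Scenario A forward and the gradient above can be checked numerically. Below is a hedged NumPy sketch (our function names and toy values; the paper's implementation is a CUDA kernel) that implements $F_A$ and $\partial F_A/\partial M_i$ and verifies them against central finite differences:

```python
import numpy as np

def transmittance(alpha, M):
    # T_0 = 1, T_{i+1} = (1 - alpha_i M_i) T_i
    T = np.ones(len(alpha))
    for i in range(1, len(alpha)):
        T[i] = T[i - 1] * (1.0 - alpha[i - 1] * M[i - 1])
    return T

def forward_A(alpha, M, eps=1e-6):
    # F_A = sum_k w_k M_k / sum_k w_k, with w_k = 1 / (alpha_k T_k + eps)
    T = transmittance(alpha, M)
    w = 1.0 / (alpha * T + eps)
    return np.sum(w * M) / np.sum(w)

def grad_A(alpha, M, eps=1e-6):
    # dF_A/dM_i = (1/S) [ w_i + sum_{j>i} (M_j - F_A) dw_j/dM_i ]
    N = len(alpha)
    T = transmittance(alpha, M)
    w = 1.0 / (alpha * T + eps)
    S = np.sum(w)
    F = np.sum(w * M) / S
    g = np.zeros(N)
    for i in range(N):
        acc = w[i]
        for j in range(i + 1, N):
            # dw_j/dM_i = alpha_i alpha_j T_j / ((1 - alpha_i M_i)(alpha_j T_j + eps)^2)
            dw_j = alpha[i] * alpha[j] * T[j] / (
                (1.0 - alpha[i] * M[i]) * (alpha[j] * T[j] + eps) ** 2
            )
            acc += (M[j] - F) * dw_j
        g[i] = acc / S
    return g

# finite-difference check on toy values (illustrative only)
rng = np.random.default_rng(1)
alpha = rng.uniform(0.2, 0.6, 5)
M = rng.uniform(0.2, 0.6, 5)
g = grad_A(alpha, M)
h = 1e-6
for i in range(len(M)):
    Mp, Mm = M.copy(), M.copy()
    Mp[i] += h
    Mm[i] -= h
    fd = (forward_A(alpha, Mp) - forward_A(alpha, Mm)) / (2 * h)
    assert abs(fd - g[i]) < 1e-5
```

The sketch also makes the failure mode visible: shrinking $\alpha_i T_i$ toward zero drives $w_i$ toward $1/\varepsilon$, so a few near-invisible Gaussians dominate both $S$ and $F_A$.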
In Scenario B, each Gaussian is suppressed by the cumulative occlusion in front of it.
Scenario B forward:
$$f_i := M_i(1 - T_i),\qquad F_B = \frac{1}{\log(1+N)}\sum_{k=0}^{N-1} f_k.$$Term-wise derivatives:
For $j = i$: $T_i$ is independent of $M_i$, so
$$\frac{\partial f_i}{\partial M_i} = 1 \cdot (1 - T_i).$$For $j > i$: $M_j$ is independent of $M_i$, but $T_j$ is not:
$$\frac{\partial f_j}{\partial M_i} = \frac{\alpha_i}{1-\alpha_i M_i}\,M_j\,T_j.$$Final gradient:
$$\frac{\partial F_B}{\partial M_i} = \frac{1}{\log(1+N)}\left[(1 - T_i) + \frac{\alpha_i}{1-\alpha_i M_i}\sum_{j>i} M_j\,T_j\right].$$Scenario B falls short because the cumulative term $(1 - T_i)$ depends only on occlusion, so the mask primarily tracks ray depth rather than per-Gaussian importance, over-penalizing long rays and occluded regions.
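The same kind of numerical check applies to Scenario B. The following minimal NumPy sketch (our naming and toy values, standing in for the CUDA kernel) implements $F_B$ and the gradient above and verifies them against central finite differences:

```python
import numpy as np

def transmittance(alpha, M):
    # T_0 = 1, T_{i+1} = (1 - alpha_i M_i) T_i
    T = np.ones(len(alpha))
    for i in range(1, len(alpha)):
        T[i] = T[i - 1] * (1.0 - alpha[i - 1] * M[i - 1])
    return T

def forward_B(alpha, M):
    # F_B = (1/log(1+N)) * sum_k M_k (1 - T_k)
    T = transmittance(alpha, M)
    return np.sum(M * (1.0 - T)) / np.log(1 + len(alpha))

def grad_B(alpha, M):
    # dF_B/dM_i = (1/log(1+N)) [ (1 - T_i)
    #              + alpha_i / (1 - alpha_i M_i) * sum_{j>i} M_j T_j ]
    N = len(alpha)
    T = transmittance(alpha, M)
    g = np.zeros(N)
    for i in range(N):
        tail = np.sum(M[i + 1:] * T[i + 1:])
        g[i] = ((1.0 - T[i]) + alpha[i] / (1.0 - alpha[i] * M[i]) * tail) \
               / np.log(1 + N)
    return g

# finite-difference check on toy values (illustrative only)
rng = np.random.default_rng(2)
alpha = rng.uniform(0.2, 0.6, 5)
M = rng.uniform(0.2, 0.6, 5)
g = grad_B(alpha, M)
h = 1e-6
for i in range(len(M)):
    Mp, Mm = M.copy(), M.copy()
    Mp[i] += h
    Mm[i] -= h
    fd = (forward_B(alpha, Mp) - forward_B(alpha, Mm)) / (2 * h)
    assert abs(fd - g[i]) < 1e-8
```

The $(1 - T_i)$ term in `grad_B` is the one discussed above: it grows monotonically with depth along the ray regardless of how much the $i$-th Gaussian itself contributes, which is why this design tracks occlusion rather than per-Gaussian importance.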