V3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians


SIGGRAPH Asia 2024 (TOG)

Penghao Wang*, Zhirui Zhang*, Liao Wang*, Kaixin Yao, Siyuan Xie, Jingyi Yu†, Minye Wu†, Lan Xu†


Code Released


We present V^3, a novel method that streams 2D gridded Gaussians to mobile devices for high-quality rendering with low storage requirements, providing users with a unique volumetric video viewing experience across multiple devices.

Overview Video


Representation



We model dynamic 3DGS as a multi-dimensional 2D video, where each frame stores the 3DGS attributes of its time step. During rendering, we read the Gaussian properties from each pixel to recover the Gaussian splat structure.
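As a rough illustration of this layout (a minimal sketch, not the released implementation), the snippet below places one Gaussian per pixel on a 2D grid and reads the attributes back. The function names, the 14-channel attribute layout (position, rotation, scale, opacity, color), and the zero padding of unused pixels are assumptions made for the example.

```python
import numpy as np

# Hypothetical per-Gaussian layout: 3 position + 4 rotation + 3 scale
# + 1 opacity + 3 color = 14 channels. The real channel layout may differ.
NUM_CHANNELS = 14


def pack_to_grid(attrs: np.ndarray, width: int) -> tuple[np.ndarray, int]:
    """Lay out (N, C) per-Gaussian attributes on an (H, W, C) grid, one Gaussian per pixel."""
    count, channels = attrs.shape
    height = -(-count // width)  # ceiling division
    flat = np.zeros((height * width, channels), dtype=attrs.dtype)
    flat[:count] = attrs  # unused trailing pixels stay zero
    return flat.reshape(height, width, channels), count


def unpack_from_grid(grid: np.ndarray, count: int) -> np.ndarray:
    """Read the first `count` pixels back as an (N, C) attribute array."""
    height, width, channels = grid.shape
    return grid.reshape(height * width, channels)[:count]


if __name__ == "__main__":
    gaussians = np.random.rand(1000, NUM_CHANNELS).astype(np.float32)
    frame, n = pack_to_grid(gaussians, width=64)
    recovered = unpack_from_grid(frame, n)
    print(frame.shape, np.array_equal(recovered, gaussians))
```

Stacking one such grid per time step gives the sequence of frames that plays the role of the 2D video; in practice the channels would also need to be quantized to a pixel format that a video codec accepts.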

Method



First, we divide long sequences into groups for training. In the first stage, we use hash encoding followed by a shallow MLP, with position as input, to estimate the motion of the human subjects. In the second stage, we fine-tune the attributes of the warped Gaussians from stage 1 with a residual entropy loss and a temporal loss. This yields a 2D Gaussian video with high temporal consistency, which can therefore be compressed efficiently with a video codec.
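To make the grouping and the two second-stage terms more concrete, here is a simplified sketch. It is not the paper's loss formulation: the group size, the L1 form of the temporal term, and the histogram-based entropy estimate (a non-differentiable diagnostic rather than a training loss) are all assumptions chosen for illustration.

```python
import numpy as np


def split_into_groups(num_frames: int, group_size: int) -> list[range]:
    """Split frame indices 0..num_frames-1 into consecutive groups."""
    return [
        range(start, min(start + group_size, num_frames))
        for start in range(0, num_frames, group_size)
    ]


def temporal_l1(curr: np.ndarray, prev: np.ndarray) -> float:
    """Mean absolute change of attributes between adjacent frames."""
    return float(np.mean(np.abs(curr - prev)))


def residual_entropy_bits(curr: np.ndarray, prev: np.ndarray, step: float = 1.0) -> float:
    """Empirical entropy (bits per value) of the quantized frame-to-frame residual."""
    residual = np.round((curr - prev) / step).astype(np.int64)
    _, counts = np.unique(residual, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


if __name__ == "__main__":
    groups = split_into_groups(num_frames=10, group_size=4)
    print([len(g) for g in groups])

    prev_frame = np.zeros((2, 2))
    curr_frame = np.array([[0.0, 0.0], [1.0, 1.0]])
    print(temporal_l1(curr_frame, prev_frame))
    print(residual_entropy_bits(curr_frame, prev_frame))
```

Both quantities shrink as adjacent frames become more alike, which is the property that makes the resulting attribute video cheap to encode with a standard video codec.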

Comparison



Qualitative comparison against recent SOTA methods, including VideoRF [Wang et al. 2024], 3DGStream [Sun et al. 2024], HumanRF [Işık et al. 2023], and NeuS2 [Wang et al. 2023a]. Our method achieves high-quality rendering with clear details.

Result Gallery



Gallery of our results. Our method can achieve high-quality novel view synthesis in scenes with challenging motion and flexible topology changes.

Bibtex


@article{wang2024v,
  title={V\^{}3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians},
  author={Wang, Penghao and Zhang, Zhirui and Wang, Liao and Yao, Kaixin and Xie, Siyuan and Yu, Jingyi and Wu, Minye and Xu, Lan},
  journal={ACM Transactions on Graphics (TOG)},
  volume={43},
  number={6},
  pages={1--13},
  year={2024},
  publisher={ACM New York, NY, USA}
}