HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting

We have recently seen tremendous progress in photo-real human modeling and rendering. Yet, efficiently rendering realistic human performance and integrating it into the rasterization pipeline remains challenging. In this paper, we present HiFi4G, an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking, achieving a compact and compression-friendly representation. We first propose a dual-graph mechanism to obtain motion priors, with a coarse deformation graph for effective initialization and a fine-grained Gaussian graph to enforce subsequent constraints. Then, we utilize a 4D Gaussian optimization scheme with adaptive spatial-temporal regularizers to effectively balance the non-rigid prior and Gaussian updating. We also present a companion compression scheme with residual compensation for immersive experiences on various platforms. This scheme achieves a compression ratio of approximately 25x, with less than 2 MB of storage per frame. Extensive experiments demonstrate the effectiveness of our approach, which significantly outperforms existing approaches in terms of optimization speed, rendering quality, and storage overhead.

Pipeline

Keyframe-based non-rigid tracking first establishes a coarse deformation graph and tracks the motion used for Gaussian optimization. HiFi4G then initializes the first-frame Gaussians from NeuS2 and constructs a fine-grained Gaussian graph to enhance temporal coherence. Next, the coarse embedded deformation (ED) graph warps the 4D Gaussians, while the E_smooth and E_temp constraints are applied on the Gaussian graph. The result is a set of spatially and temporally compact, compression-friendly 4D Gaussians that can be compressed efficiently.
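To make the warping step more concrete, here is a minimal NumPy sketch of how an ED graph can move Gaussian centers. It uses the standard embedded deformation blend, in which each point is transformed by its nearest graph nodes and the results are averaged with distance-based weights. The function name `ed_warp`, the parameters `k` and `sigma`, and the Gaussian weighting are assumptions made for illustration; this is not the HiFi4G implementation, and it does not cover the E_smooth or E_temp terms.

```python
import numpy as np

def ed_warp(points, nodes, rotations, translations, k=4, sigma=0.1):
    """Warp points with an embedded deformation (ED) graph (illustrative sketch).

    points:       (P, 3) positions to warp, e.g. Gaussian centers
    nodes:        (N, 3) ED node positions g_i
    rotations:    (N, 3, 3) per-node rotation matrices R_i
    translations: (N, 3) per-node translations t_i
    k, sigma:     hypothetical neighbor count and weight bandwidth
    """
    k = min(k, len(nodes))
    # Squared distance from every point to every node: (P, N)
    d2 = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=-1)
    # Indices and squared distances of the k nearest nodes per point: (P, k)
    idx = np.argsort(d2, axis=1)[:, :k]
    near_d2 = np.take_along_axis(d2, idx, axis=1)
    # Gaussian weights; subtracting the row minimum avoids underflow
    # and does not change the normalized result.
    w = np.exp(-(near_d2 - near_d2.min(axis=1, keepdims=True)) / (2.0 * sigma**2))
    w /= w.sum(axis=1, keepdims=True)
    # Per-node transform: R_i (x - g_i) + g_i + t_i
    local = points[:, None, :] - nodes[idx]                       # (P, k, 3)
    rotated = np.einsum("pkij,pkj->pki", rotations[idx], local)   # (P, k, 3)
    candidates = rotated + nodes[idx] + translations[idx]
    # Blend the candidate positions with the normalized weights
    return (w[..., None] * candidates).sum(axis=1)
```

With identity rotations and a shared translation, every point simply shifts by that translation, which is a quick sanity check for the weights summing to one. A full pipeline would also need to update each Gaussian's orientation, which this sketch leaves out.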

Result

Here are our rendering results. HiFi4G delivers real-time high-fidelity rendering of human performance across challenging motions, such as playing instruments, dancing and changing clothes.

Acknowledgements

The authors would like to thank Zitong Hu, Shang Zhang, and Xi Chen from ShanghaiTech University for processing the dataset. We are grateful to Hao Liu for insightful discussions. We also thank the reviewers for their feedback. This work was supported by the National Key R&D Program of China (2022YFF0902301) and the Shanghai Local college capacity building program (22010502800). We also acknowledge support from the Shanghai Frontiers Science Center of Human-centered Artificial Intelligence (ShangHAI).

Bibtex

@InProceedings{Jiang_2024_CVPR,
    author    = {Jiang, Yuheng and Shen, Zhehao and Wang, Penghao and Su, Zhuo and Hong, Yu and Zhang, Yingliang and Yu, Jingyi and Xu, Lan},
    title     = {HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {19734-19745}
}