A Fast Volumetric Capture and Reconstruction Pipeline for Dynamic Point Clouds and Gaussian Splats

1 École Polytechnique Fédérale de Lausanne (EPFL)
2 Lucerne University of Applied Sciences and Arts (HSLU)

CVMP 2025
Our pipeline captures human performances with 6–12 RGB-D or RGB-only cameras and reconstructs dynamic point clouds and Gaussian splats in real time for on-location preview and standards-compliant export.

Abstract

We present a fast and efficient volumetric capture and reconstruction system that processes either RGB-D or RGB-only input to generate 3D representations in the form of point clouds and Gaussian splats. For Gaussian splat reconstructions, we build on and improve the GPS-Gaussian regressor, enabling high-quality reconstructions with minimal overhead. The system is designed for easy setup and deployment: it supports in-the-wild operation under uncontrolled illumination and arbitrary backgrounds, as well as flexible camera configurations, including sparse setups and arbitrary camera counts and baselines. Captured data can be exported in standard formats such as PLY, MPEG V-PCC, and SPLAT, and visualized through a web-based viewer or Unity/Unreal plugins. A live on-location preview of both input and reconstruction is available at 5–10 FPS. We present qualitative findings focused on deployability and targeted ablations. The complete framework is open-source, facilitating reproducibility and further research.

Pipeline

Left: capture in action. Right: pipeline overview (data flows from left to right). Per-camera RGB-D/RGB inputs are processed in parallel; vertical bars indicate synchronization points. Segmentation and depth processing run in parallel, and the point-cloud and Gaussian-splat reconstructions are then computed in parallel. A live monitor shows reconstruction previews at 5–10 FPS, cycling across input cameras; after processing, teaser clips and reconstructions are served in the web viewer.
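
The data flow in the overview can be illustrated with a minimal sketch. It assumes a thread pool and uses placeholder stage functions: `segment`, `process_depth`, `build_point_cloud`, and `build_splats` are hypothetical names, not the released framework's API. Waiting on the futures plays the role of the synchronization points (the vertical bars in the figure).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder stages; a real pipeline would run networks and filters here.
def segment(frame):
    return {"camera": frame["camera"], "mask": "mask"}

def process_depth(frame):
    return {"camera": frame["camera"], "depth": "depth"}

def build_point_cloud(masks, depths):
    return {"type": "point_cloud", "views": len(masks)}

def build_splats(masks, depths):
    return {"type": "gaussian_splats", "views": len(masks)}

def reconstruct(frames, max_workers=8):
    """Run one time step of the two-stage, parallel data flow."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stage 1: segmentation and depth processing run in parallel, per camera.
        mask_futures = [pool.submit(segment, f) for f in frames]
        depth_futures = [pool.submit(process_depth, f) for f in frames]
        # Synchronization point: block until every per-camera result is ready.
        masks = [f.result() for f in mask_futures]
        depths = [f.result() for f in depth_futures]
        # Stage 2: both reconstructions run in parallel on the synchronized data.
        pc_future = pool.submit(build_point_cloud, masks, depths)
        gs_future = pool.submit(build_splats, masks, depths)
        return pc_future.result(), gs_future.result()

# Six cameras, matching the lower end of the 6-12 camera range.
frames = [{"camera": i} for i in range(6)]
point_cloud, splats = reconstruct(frames)
```

Threads are used here only to keep the sketch short; GPU-bound stages would more likely be scheduled on separate streams or processes.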

RGB Processing

Color cues for two sample image pairs. From left to right: color, mask, optical flow, and disparity from RAFT-Stereo [Lipson et al. 2021] and from FoundationStereo [Wen et al. 2025]. RAFT-Stereo, trained on human data only, predicts more accurate disparity ranges than FoundationStereo on these samples.
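
For reference, a disparity map from a rectified stereo pair converts to metric depth with the standard relation depth = focal length × baseline / disparity. The sketch below is a minimal pure-Python illustration; the function names and the example focal length and baseline are assumptions chosen for readability, not values from the pipeline.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disparity=1e-6):
    """Convert one disparity value (pixels) to depth (meters).

    Returns None for missing or near-zero disparities, which carry no depth.
    """
    if disparity_px is None or disparity_px <= min_disparity:
        return None
    return focal_px * baseline_m / disparity_px

def disparity_map_to_depth(disparity_rows, focal_px, baseline_m):
    """Apply the conversion to a disparity map stored as a list of rows."""
    return [
        [disparity_to_depth(d, focal_px, baseline_m) for d in row]
        for row in disparity_rows
    ]

# Illustrative values only: 1000 px focal length, 10 cm baseline.
depth = disparity_map_to_depth(
    [[50.0, 100.0], [0.0, 25.0]], focal_px=1000.0, baseline_m=0.1
)
```

Because depth is inversely proportional to disparity, a small disparity error on a distant surface causes a large depth error, which is one reason an accurate disparity range matters.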

Depth Processing

Sensor depth (left block): raw, spatially filtered (BS), and spatio-temporally filtered (BS+T). Stereo-estimated depth (right block) is computed from rectified pairs and shown without bilateral filtering.
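
The temporal part of such filtering can be sketched as a per-pixel exponential moving average. This is an assumed stand-in for illustration, not the filter used in the pipeline: `temporal_filter`, `alpha`, and `max_jump_m` are hypothetical names and values, and frames are flat lists of depths in meters with 0.0 marking an invalid pixel.

```python
def temporal_filter(prev_filtered, current, alpha=0.5, max_jump_m=0.05):
    """Blend the current depth frame with the previous filtered frame.

    Frames are flat lists of depths in meters; 0.0 marks an invalid pixel.
    """
    if prev_filtered is None:
        return list(current)  # first frame: nothing to blend with
    out = []
    for prev, cur in zip(prev_filtered, current):
        if cur == 0.0:
            # Hole in the current frame: hold the previous value (may also be 0.0).
            out.append(prev)
        elif prev == 0.0 or abs(cur - prev) > max_jump_m:
            # No history, or a genuine depth change: reset to the measurement.
            out.append(cur)
        else:
            # Small fluctuation: smooth it with an exponential moving average.
            out.append(alpha * cur + (1.0 - alpha) * prev)
    return out

frames = [
    [1.00, 2.00, 0.0],
    [1.02, 0.0, 1.5],
    [1.00, 2.50, 1.5],
]
state = None
for frame in frames:
    state = temporal_filter(state, frame)
```

The jump threshold keeps moving depth edges from being averaged across frames. Holding the previous value in holes can cause ghosting on fast motion, so a practical filter would likely limit how long a value is held.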

Video Reconstruction Samples

Interactive Reconstruction Viewer

BibTeX

@inproceedings{10.1145/3756863.3769713,
    author = {Charisoudis, Athanasios and Croci, Simone and Lam, Kit Yung and Frossard, Pascal and Smolic, Aljosa},
    title = {A Fast Volumetric Capture and Reconstruction Pipeline for Dynamic Point Clouds and Gaussian Splats},
    year = {2025},
    isbn = {9798400721175},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3756863.3769713},
    doi = {10.1145/3756863.3769713},
    abstract = {We present a fast and efficient volumetric capture and reconstruction system that processes either RGB-D or RGB-only input to generate 3D representations in the form of point clouds and Gaussian splats. For Gaussian splat reconstructions, we took the GPS-Gaussian regressor and improved it, enabling high-quality reconstructions with minimal overhead. The system is designed for easy setup and deployment, supporting in-the-wild operation under uncontrolled illumination and arbitrary backgrounds, as well as flexible camera configurations, including sparse setups, arbitrary camera numbers and baselines. Captured data can be exported in standard formats such as PLY, MPEG V-PCC, and SPLAT, and visualized through a web-based viewer or Unity/Unreal plugins. A live on-location preview of both input and reconstruction is available at 5–10 FPS. We present qualitative findings focused on deployability and targeted ablations. The complete framework is open-source, facilitating reproducibility and further research.},
    booktitle = {Proceedings of the 22nd ACM SIGGRAPH European Conference on Visual Media Production},
    articleno = {9},
    numpages = {11},
    keywords = {Volumetric video capture, point clouds, Gaussian splats, dynamic reconstruction},
    series = {CVMP '25}
}