HRM2Avatar

`HRM2Avatar`: High-Fidelity Real-Time Mobile Avatars from Monocular Phone Scans

Chao Shi, Shenghao Jia, Jinhui Liu, Yong Zhang, Liangchao Zhu, Zhonglei Yang, Jinze Ma, Chaoyue Niu, Chengfei Lv

Splats

SMPL-X

Clothing

Texture

Monocular

SIGGRAPH Asia 2025


Abstract

We present HRM2Avatar, a novel framework for creating high-fidelity avatars from monocular phone scans, which can be rendered and animated in real-time on mobile devices. Monocular capture with commodity smartphones provides a low-cost, pervasive alternative to studio-grade multi-camera rigs, making avatar digitization accessible to non-expert users. Reconstructing high-fidelity avatars from single-view video sequences poses significant challenges due to deficient visual and geometric data relative to multi-camera setups. To address these limitations, at the data level, our method leverages two types of data captured with smartphones: static pose sequences for detailed texture reconstruction and dynamic motion sequences for learning pose-dependent deformations and lighting changes. At the representation level, we employ a lightweight yet expressive representation to reconstruct high-fidelity digital humans from sparse monocular data. First, we extract explicit garment meshes from monocular data to model clothing deformations more effectively. Second, we attach illumination-aware Gaussians to the mesh surface, enabling high-fidelity rendering and capturing pose-dependent lighting changes. This representation efficiently learns high-resolution and dynamic information from our tailored monocular data, enabling the creation of detailed avatars. At the rendering level, real-time performance is critical for rendering and animating high-fidelity avatars in AR/VR, social gaming, and on-device creation, demanding sub-frame responsiveness. Our fully GPU-driven rendering pipeline delivers 120 FPS on mobile devices and 90 FPS on standalone VR devices at 2K resolution, over 2.7× faster than representative mobile-engine baselines. Experiments show that HRM2Avatar delivers superior visual realism and real-time interactivity at high resolutions, outperforming state-of-the-art monocular methods.
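The idea of attaching Gaussians to a mesh surface can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each Gaussian is bound to one triangle through fixed barycentric weights plus a signed offset along the triangle normal, so that its center follows the mesh when the triangle moves. The function names (`barycentric_point`, `triangle_normal`, `gaussian_center`) and the binding scheme are hypothetical, and the illumination-aware appearance terms are not modeled here.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Sequence[Vec3]


def barycentric_point(tri: Triangle, bary: Vec3) -> Vec3:
    """Point on the triangle at barycentric weights (u, v, w), with u + v + w = 1."""
    a, b, c = tri
    u, v, w = bary
    return tuple(u * a[i] + v * b[i] + w * c[i] for i in range(3))


def triangle_normal(tri: Triangle) -> Vec3:
    """Unit normal of the triangle, following the winding order a -> b -> c."""
    a, b, c = tri
    e1 = [b[i] - a[i] for i in range(3)]
    e2 = [c[i] - a[i] for i in range(3)]
    # Cross product e1 x e2.
    n = (
        e1[1] * e2[2] - e1[2] * e2[1],
        e1[2] * e2[0] - e1[0] * e2[2],
        e1[0] * e2[1] - e1[1] * e2[0],
    )
    length = math.sqrt(sum(x * x for x in n))
    if length == 0.0:
        raise ValueError("degenerate triangle has no normal")
    return tuple(x / length for x in n)


def gaussian_center(tri: Triangle, bary: Vec3, offset: float) -> Vec3:
    """World-space center of a Gaussian bound to a triangle (assumed scheme).

    The binding (bary, offset) stays fixed; only the triangle vertices
    change when the mesh is posed.
    """
    p = barycentric_point(tri, bary)
    n = triangle_normal(tri)
    return tuple(p[i] + offset * n[i] for i in range(3))


# Usage: the same binding evaluated on a rest-pose and a moved triangle.
rest_tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
posed_tri = tuple((x, y, z + 2.0) for x, y, z in rest_tri)
binding = ((0.25, 0.25, 0.5), 0.1)  # (barycentric weights, normal offset)

print(gaussian_center(rest_tri, *binding))
print(gaussian_center(posed_tri, *binding))
```

Because the binding is stored relative to the triangle, animating the mesh repositions every attached Gaussian without storing per-frame Gaussian positions. That is one plausible reason a mesh-anchored representation suits sparse monocular data and mobile rendering.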
