Despite the recent advances in so-called "cold start" generation from text prompts, the data and computing resources such methods require, together with ambiguities around intellectual property and privacy, argue against their practical utility. A relatively unexplored alternative is unconditional synthesis from a single sample, which has enabled interesting generative applications.
In this paper we focus on single-shot motion generation and, more specifically, on accelerating the training time of a Generative Adversarial Network (GAN). In particular, we tackle the GAN's equilibrium collapse when using mini-batch training by carefully annealing the weights of the loss functions that prevent mode collapse. Additionally, we perform a statistical analysis on the generator and discriminator models to identify correlations between training stages and enable transfer learning. Our improved GAN achieves competitive quality and diversity on the Mixamo benchmark when compared to the original GAN architecture and a single-shot diffusion model, while being up to \(\times 6.8\) faster to train than the former and \(\times 1.75\) faster than the latter.
Finally, we demonstrate the ability of our improved GAN to mix and compose motion with a single forward pass.
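To make the annealing idea concrete, below is a minimal PyTorch-style sketch of decaying the weight of an anti-collapse (e.g. reconstruction) loss over the course of mini-batch training. The schedule shape, default weights, and function names are illustrative assumptions, not the exact configuration used in the paper.

```python
import torch

def annealed_weight(step: int, total_steps: int,
                    w_start: float = 50.0, w_end: float = 1.0) -> float:
    """Linearly decay the anti-collapse loss weight as training progresses (assumed schedule)."""
    t = min(step / max(total_steps, 1), 1.0)
    return (1.0 - t) * w_start + t * w_end

def generator_objective(adv_loss: torch.Tensor, rec_loss: torch.Tensor,
                        step: int, total_steps: int) -> torch.Tensor:
    """Adversarial term plus the annealed reconstruction (anti-collapse) term."""
    return adv_loss + annealed_weight(step, total_steps) * rec_loss
```

Starting with a large weight keeps the generator anchored to the single training sample early on, while the decay gradually hands control back to the adversarial term as mini-batch training stabilizes.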
We provide some indicative applications of our single-shot GAN that do not need any re-training.
We use our single-shot GAN trained on the "breakdance freezes" sequence from the Mixamo dataset to generate variations of it by sampling different noise codes from a Gaussian distribution. For visualization purposes we use the "Michelle" character provided by Mixamo.
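A minimal sketch of this sampling step is shown below, assuming the trained generator maps a Gaussian noise code to a full motion sequence in a single forward pass; the `generator` interface and the code shape are assumptions for illustration.

```python
import torch

@torch.no_grad()
def sample_variations(generator: torch.nn.Module, num_variations: int,
                      code_shape: tuple) -> torch.Tensor:
    """Draw independent Gaussian codes and decode one motion variation per code."""
    codes = torch.randn(num_variations, *code_shape)  # z ~ N(0, I)
    return generator(codes)
```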
We use the Mixamo motion "swing dancing" as an example input sequence, keeping the lower body unaltered (fixed) and generating 7 alternative - but natural - versions of the upper body. The displayed result is rendered using the "Jackie" character provided by Mixamo.
Changing to the "salsa dancing" input sequence, we now keep the upper body fixed and sample partial codes for 7 alternative versions of the lower body. The "Michelle" character is used for visualization.
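The upper- and lower-body edits above can be sketched as partial resampling of the latent code: the channels associated with the fixed body part are copied from the input's code, while the remaining channels are redrawn from the Gaussian prior. The per-part channel split and the `generator` interface below are hypothetical and only meant to illustrate the idea.

```python
import torch

@torch.no_grad()
def edit_body_part(generator: torch.nn.Module, fixed_code: torch.Tensor,
                   sampled_channels: slice, num_variations: int) -> torch.Tensor:
    """Keep one body part's code fixed and resample the other part's channels."""
    variations = []
    for _ in range(num_variations):
        code = fixed_code.clone()
        # Redraw only the channels of the body part being edited.
        code[:, sampled_channels] = torch.randn_like(code[:, sampled_channels])
        variations.append(generator(code))
    return torch.cat(variations, dim=0)
```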
@inproceedings{roditakis2024singleshot,
author = {Roditakis, Konstantinos and Thermos, Spyridon and Zioulis, Nikolaos},
title = {Towards Practical Single-Shot Motion Synthesis},
booktitle = {IEEE/CVF Computer Vision and Pattern Recognition (CVPR) AI for 3D Generation Workshop},
url = {https://moverseai.github.io/single-shot},
month = {June},
year = {2024}
}
We consider the task of controllable and diverse motion synthesis from a single sequence as an alternative to data-dependent text-to-motion methods, which raise ambiguities around data ownership and privacy. Recent works in hierarchical single-shot synthesis have paved the way for unconditional generation and editing tools; however, the methods that focus on 3D animation have failed to control the diversity of the generated motions.
In this paper we propose the integration of variational inference into single-shot GANs, aiming to encode and control the low-frequency generating factors of the single motion sample. Our experiments showcase the ability of our VAE-GAN model to control the diversity of its generations while preserving their plausibility and quality.
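As a rough illustration of the variational component, the sketch below shows an encoder that predicts a Gaussian posterior over the low-frequency generating factors, draws a code with the reparameterization trick, and returns the KL penalty that regularizes the latent space. Module names, layer choices, and tensor layout are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    def __init__(self, in_channels: int, latent_channels: int):
        super().__init__()
        # Predicts mean and log-variance of the latent posterior per frame.
        self.net = nn.Conv1d(in_channels, 2 * latent_channels, kernel_size=3, padding=1)

    def forward(self, motion: torch.Tensor):
        # motion: (batch, joints * channels, frames)
        mu, logvar = self.net(motion).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        kl = 0.5 * torch.mean(mu.pow(2) + logvar.exp() - logvar - 1.0)
        return z, kl
```

At sampling time, scaling the noise drawn around the posterior mean offers a simple knob for trading diversity against fidelity to the single training sample.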
[1] Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample.
[2] Skeleton-Aware Networks for Deep Motion Retargeting.
@inproceedings{tselepi2024singleshot,
author = {Tselepi, Eleni and Thermos, Spyridon and Albanis, Georgios and Chatzitofis, Anargyros},
title = {Controlling Diversity in Single-shot Motion Synthesis},
booktitle = {SIGGRAPH Asia (Posters)},
url = {https://moverseai.github.io/single-shot},
month = {December},
year = {2024}
}
Both projects have received financial support from the European Union's Horizon Europe Research and Innovation Programme under Grant Agreement No. 101070533 (EMIL-XR).