Despite recent advances in so-called "cold start" generation from text prompts, the data and computing resources such models require, as well as ambiguities around intellectual property and privacy, raise counterarguments to their utility. An interesting and relatively unexplored alternative is unconditional synthesis from a single sample, which has led to interesting generative applications.
In this paper we focus on single-shot motion generation and, more specifically, on accelerating the training time of a Generative Adversarial Network (GAN). In particular, we tackle the GAN's equilibrium collapse under mini-batch training by carefully annealing the weights of the loss functions that prevent mode collapse. Additionally, we perform a statistical analysis of the generator and discriminator models to identify correlations between training stages and enable transfer learning. Our improved GAN achieves competitive quality and diversity on the Mixamo benchmark compared to the original GAN architecture and a single-shot diffusion model, while training up to \(\times 6.8\) faster than the former and \(\times 1.75\) faster than the latter.
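The annealing idea above can be illustrated with a minimal sketch: the weights of the auxiliary losses that guard against mode collapse are gradually decayed over training so that mini-batch updates do not destabilize the adversarial equilibrium. The linear schedule, function names, and weight values below are illustrative assumptions, not the paper's exact implementation.

```python
def annealed_weight(step: int, total_steps: int,
                    w_start: float = 1.0, w_end: float = 0.1) -> float:
    """Linearly anneal a loss weight from w_start down to w_end."""
    t = min(step / max(total_steps, 1), 1.0)  # training progress in [0, 1]
    return w_start + (w_end - w_start) * t

def total_loss(adv_loss: float, aux_loss: float,
               step: int, total_steps: int) -> float:
    """Adversarial loss plus an annealed auxiliary (anti-mode-collapse) term."""
    w = annealed_weight(step, total_steps)
    return adv_loss + w * aux_loss
```

In practice the schedule (linear, cosine, step-wise) and the start/end weights would be tuned per loss term; the key point is that the auxiliary terms dominate early training and fade as the adversarial game stabilizes.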
Finally, we demonstrate the ability of our improved GAN to mix and compose motion with a single forward pass.
We provide some indicative applications of our single-shot GAN that do not need any re-training.
We use our single-shot GAN trained on the "breakdance freezes" sequence from the Mixamo dataset to generate variations of the "breakdance freezes" sample by sampling different codes from a Gaussian distribution. For visualization purposes we use the "Michelle" character provided by Mixamo.
We use the Mixamo motion "swing dancing" as an example input sequence, keeping the lower-body unaltered (fixed) and generating 7 alternative - but natural - versions of the upper-body. The displayed result is rendered using the "Jackie" character provided by Mixamo.
Changing to the "salsa dancing" input sequence, we now keep the upper-body fixed and sample partial codes for 7 alternative versions of the lower-body. The "Michelle" character is used for visualization.
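The body-part composition in the two examples above can be sketched as partial latent-code sampling: the sub-code of one body part is kept fixed while the rest is resampled from a Gaussian, and the composed code is decoded in a single forward pass. The split into "upper" and "lower" channels, the code dimensionality, and all names below are illustrative assumptions, not the actual model interface.

```python
import numpy as np

rng = np.random.default_rng(0)

def compose_code(base_code: np.ndarray, keep: str, split: int) -> np.ndarray:
    """Keep one body part's sub-code fixed, resample the other from N(0, I).

    Assumes (hypothetically) that the first `split` dimensions drive the
    lower body and the remaining dimensions drive the upper body.
    """
    new_code = rng.standard_normal(base_code.shape)
    if keep == "lower":
        new_code[:split] = base_code[:split]   # fixed lower body
    else:
        new_code[split:] = base_code[split:]   # fixed upper body
    return new_code

# Example: 7 upper-body variations over a fixed lower body.
base = rng.standard_normal(64)
variants = [compose_code(base, keep="lower", split=32) for _ in range(7)]
```

Each composed code would then be passed once through the trained generator, which is what allows mixing and composing motion without any re-training.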
@inproceedings{roditakis2024singleshot,
author = {Roditakis, Konstantinos and Thermos, Spyridon and Zioulis, Nikolaos},
title = {Towards Practical Single-Shot Motion Synthesis},
booktitle = {IEEE/CVF Computer Vision and Pattern Recognition (CVPR) AI for 3D Generation Workshop},
url = {https://moverseai.github.io/single-shot},
month = {June},
year = {2024}
}
This project has received funding from the European Union’s Horizon Europe Research and Innovation Programme under Grant Agreement No 101070533, EMIL-XR.