Towards Practical Single-shot Motion Synthesis

Moverse

CVPR 2024, AI for 3D Generation Workshop

Abstract

Despite recent advances in so-called "cold start" generation from text prompts, the data and computing requirements of such models, together with ambiguities around intellectual property and privacy, pose counterarguments to their utility. An interesting and relatively unexplored alternative is unconditional synthesis from a single sample, which has already led to interesting generative applications.

In this paper we focus on single-shot motion generation, and more specifically on accelerating the training time of a Generative Adversarial Network (GAN). In particular, we tackle the challenge of the GAN's equilibrium collapse under mini-batch training by carefully annealing the weights of the loss functions that prevent mode collapse. Additionally, we perform statistical analysis on the generator and discriminator models to identify correlations between training stages and enable transfer learning. Our improved GAN achieves competitive quality and diversity on the Mixamo benchmark compared to the original GAN architecture and a single-shot diffusion model, while training up to \(\times 6.8\) faster than the former and \(\times 1.75\) faster than the latter.

Finally, we demonstrate the ability of our improved GAN to mix and compose motion with a single forward pass.
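To make the loss annealing mentioned above concrete, below is a minimal sketch of one plausible schedule. The weight names (w_rec for the reconstruction term that prevents mode collapse, w_adv for the adversarial term), the linear ramp, and the step counts are illustrative assumptions, not the exact recipe of the paper.

def annealed_weight(step: int, total_steps: int,
                    w_start: float, w_end: float) -> float:
    """Linearly interpolate a loss weight between w_start and w_end."""
    t = min(step / max(total_steps, 1), 1.0)
    return w_start + t * (w_end - w_start)

# Hypothetical 2000-step stage: relax the reconstruction weight while
# keeping the adversarial weight fixed, so mini-batch training stays
# stable early on without collapsing to the single training sample.
for step in range(2000):
    w_rec = annealed_weight(step, 2000, w_start=50.0, w_end=5.0)
    w_adv = 1.0
    # total_loss = w_adv * adv_loss + w_rec * rec_loss  # inside the real loop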

Mixamo-based Generation

We showcase some indicative applications of our single-shot GAN, none of which requires any re-training.



Single-Motion Variations

We use our single-shot GAN trained on the "breakdance freezes" sequence from the Mixamo dataset to generate variations of that sample by sampling different latent codes from a Gaussian distribution. For visualization purposes we use the "Michelle" character provided by Mixamo.
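A minimal sketch of this sampling interface, assuming a PyTorch-style generator; the stand-in single-convolution generator, the latent width z_dim, and the pose feature width feat_dim are placeholders for the actual hierarchical single-shot architecture.

import torch
import torch.nn as nn

# Stand-in generator: maps latent noise (batch, z_dim, frames) to
# per-frame pose features (batch, feat_dim, frames). The real model
# is hierarchical; this stub only illustrates the sampling interface.
z_dim, feat_dim, frames = 128, 111, 64
generator = nn.Conv1d(z_dim, feat_dim, kernel_size=5, padding=2)

with torch.no_grad():
    z = torch.randn(7, z_dim, frames)   # 7 variation codes ~ N(0, I)
    motions = generator(z)              # (7, feat_dim, frames)
print(motions.shape)                    # torch.Size([7, 111, 64])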

Upper-body Motion Composition

We use the Mixamo motion "swing dancing" as an example input sequence, keeping the lower-body unaltered (fixed) and generating 7 alternative, yet natural, versions of the upper-body. The displayed result is rendered using the "Jackie" character provided by Mixamo.
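One way to realize such part-based composition is to resample only the latent channels that drive the free body part, as sketched below. The assumption that the upper and lower body are controlled by disjoint channel groups, and the half-half split, are hypothetical; the paper's exact partial-code layout may differ.

import torch

z_dim, frames = 128, 64
upper = torch.zeros(z_dim, dtype=torch.bool)
upper[: z_dim // 2] = True  # hypothetical: first half of channels -> upper body

z_ref = torch.randn(1, z_dim, frames)  # reference code for the input motion
variants = []
for _ in range(7):
    z_new = torch.randn(1, z_dim, frames)
    # Keep the lower-body channels from z_ref, resample the upper-body ones.
    z_mix = torch.where(upper[None, :, None], z_new, z_ref)
    variants.append(z_mix)
# Each z_mix feeds a single generator forward pass; inverting the mask
# instead fixes the upper body and varies the lower body.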

Lower-body Motion Composition

Changing to the "salsa dancing" input sequence, we now keep the upper-body fixed and sample partial codes for 7 alternative versions of the lower-body, i.e., the inverted mask of the sketch above. The "Michelle" character is used for visualization.

Crowd Animation

Following [1] and [2], we provide an example of a "crowd" dancing using generated variations of the "dancing" sequence from the Mixamo dataset. Again, the "Jackie" character is used for the rendered result.
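Since each crowd member only needs its own Gaussian code, the whole crowd can be synthesized in one batched forward pass. A short sketch, reusing the hypothetical generator interface from above; the crowd size is arbitrary.

import torch

num_dancers, z_dim, frames = 25, 128, 64
z_crowd = torch.randn(num_dancers, z_dim, frames)  # one code per dancer
# motions = generator(z_crowd)  # (num_dancers, feat_dim, frames)
# Each motion is then retargeted to a character (e.g., "Jackie") and
# placed in the scene to form the crowd.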

References

[1] Li et al., "GANimator: Neural Motion Synthesis from a Single Sequence," ACM Transactions on Graphics (SIGGRAPH), 2022.

[2] Raab et al., "SinMDM: Single Motion Diffusion," ICLR 2024.

BibTeX

@inproceedings{roditakis2024singleshot,
  author    = {Roditakis, Konstantinos and Thermos, Spyridon and Zioulis, Nikolaos},
  title     = {Towards Practical Single-Shot Motion Synthesis},
  booktitle = {IEEE/CVF Computer Vision and Pattern Recognition (CVPR) AI for 3D Generation Workshop},
  url       = {https://moverseai.github.io/single-shot},
  month     = {June},
  year      = {2024}  
}

Acknowledgement

This project has received funding from the European Union’s Horizon Europe Research and Innovation Programme under Grant Agreement No 101070533, EMIL-XR.