Researchers from the University of Edinburgh presented at SIGGRAPH 2016, the leading international conference on computer graphics. Today I share one of the presentations from that conference, in the form of a video and the corresponding paper. It is a must-read for anyone interested in deep convolutional neural networks.
The abstract of the paper, reproduced in the YouTube video description, reads:
We present a framework to synthesize character movements based on high level parameters, such that the produced movements respect the manifold of human motion, trained on a large motion capture dataset. The learned motion manifold, which is represented by the hidden units of a convolutional autoencoder, represents motion data in sparse components which can be combined to produce a wide range of complex movements. To map from high level parameters to the motion manifold, we stack a deep feedforward neural network on top of the trained autoencoder. This network is trained to produce realistic motion sequences from parameters such as a curve over the terrain that the character should follow, or a target location for punching and kicking. The feedforward control network and the motion manifold are trained independently, allowing the user to easily switch between feedforward networks according to the desired interface, without re-training the motion manifold. Once motion is generated it can be edited by performing optimization in the space of the motion manifold. This allows for imposing kinematic constraints, or transforming the style of the motion, while ensuring the edited motion remains natural. As a result, the system can produce smooth, high quality motion sequences without any manual pre-processing of the training data.
This line of research in deep convolutional neural networks joins other efforts to fully automate the data-driven process, removing the need for manual data preprocessing:
Most data-driven approaches currently available require a significant amount of manual data preprocessing, including motion segmentation, alignment, and labeling. A mistake at any stage can easily result in a failure of the final animation. Such preprocessing is therefore usually carefully performed through a significant amount of human intervention, making sure the output movements appear smooth and natural. This makes full automation difficult and so often these systems require dedicated technical developers to maintain.
Reliance on an unsupervised, non-linear manifold-learning process appears to be becoming a staple of current deep learning frameworks. The results suggest that fully simulating the details of human motion may be a productive path in the quest for Artificial General Intelligence (AGI), at least from a human-motion perspective:
This unsupervised non-linear manifold learning process does not require any motion segmentation or alignment which makes the process significantly easier than previous approaches. On top of this autoencoder we stack another feedforward neural network that maps high level parameters to low-level human motion, as represented by the hidden units of the autoencoder. With this, users can easily produce realistic human motion sequences from intuitive inputs such as a curve over some terrain that the character should follow, or the trajectory of the end effectors for punching and kicking. As the feedforward control network and the motion manifold are trained independently, users can easily swap and re-train the feedforward network according to the desired interface. Our approach is also inherently parallel, which makes it very fast to compute and a good fit for mainstream animation packages.
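The two-stage structure described above can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: it uses random stand-in weights, fully connected maps instead of real convolutions, and made-up sizes, but it shows the pipeline shape — an autoencoder defines the motion manifold, and a separate feedforward control network maps high-level parameters into that same hidden space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not the paper's): a 240-frame clip with
# 70 joint channels, 256 hidden (manifold) units, 7 control parameters.
FRAMES, CHANNELS, HIDDEN, PARAMS = 240, 70, 256, 7

# Stand-in weights; in the real system these are learned from mocap data.
W_enc = rng.normal(0, 0.01, (FRAMES * CHANNELS, HIDDEN))
W_dec = rng.normal(0, 0.01, (HIDDEN, FRAMES * CHANNELS))
W_ctrl = rng.normal(0, 0.01, (PARAMS, HIDDEN))

def encode(motion):
    """Map a motion clip onto the hidden (manifold) units."""
    return np.maximum(W_enc.T @ motion.ravel(), 0.0)  # ReLU -> sparse codes

def decode(hidden):
    """Map hidden units back to a motion clip."""
    return (W_dec.T @ hidden).reshape(FRAMES, CHANNELS)

def control(params):
    """Feedforward network from high-level parameters to hidden units.

    Trained separately from the autoencoder, so it can be swapped out
    per interface without re-training the motion manifold.
    """
    return np.maximum(W_ctrl.T @ params, 0.0)

# Synthesis path: high-level parameters -> manifold -> motion.
params = rng.normal(size=PARAMS)
motion = decode(control(params))
```

Because `control` and the autoencoder meet only at the hidden layer, swapping in a different control network (say, one driven by end-effector trajectories instead of a terrain curve) leaves `encode`/`decode` untouched — which is the modularity the quote emphasizes.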
The smoothness of the simulated motion appears to rely on the black-box nature of these convolutional autoencoders, with the hidden units responsible for the realism. Crucially, the data can then be edited and adjusted in a sparse and continuous fashion, while complex movements of the body can still be reproduced in the manifold space:
We also propose techniques to edit the motion data in the space of the motion manifold. The hidden units of the convolutional autoencoder represent the motion in a sparse and continuous fashion, such that adjusting the data in this space preserves the naturalness and smoothness of the motion, while still allowing complex movements of the body to be reproduced.
In summary, our contribution is the following:
• A fast, parallel deep learning framework for synthesizing character animation from high level parameters.
• A method of motion editing on the motion manifold for satisfying user constraints and transforming motion style.
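The second contribution — editing by optimization in manifold space — can be sketched as gradient descent on the hidden units: push the decoded motion toward a kinematic constraint while a regularizer keeps the code near the original manifold point. Everything below is a toy stand-in (a linear "decoder", random weights, an invented constraint), not the paper's method, but the structure of the objective is the same idea.

```python
import numpy as np

rng = np.random.default_rng(1)

HIDDEN, OUT = 32, 12                   # toy sizes, illustrative only
W = rng.normal(0, 0.3, (HIDDEN, OUT))  # stand-in linear "decoder"

def decode(h):
    return h @ W                       # hidden units -> pose values

h0 = rng.normal(size=HIDDEN)           # hidden code of the original motion
target, idx = 1.5, 4                   # constraint: output channel 4 == 1.5

h = h0.copy()
for _ in range(500):
    y = decode(h)
    # Objective: squared constraint violation, plus a small penalty for
    # drifting from the original code (keeps the edited motion natural).
    grad = 2.0 * (y[idx] - target) * W[:, idx] + 0.02 * (h - h0)
    h -= 0.05 * grad

edited = decode(h)
```

After optimization the constrained channel sits near the target, while the rest of the code stays close to `h0` — the regularizer is what stands in for "remaining on the motion manifold" in this sketch.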
From the same researchers I also found a more recent effort, this time on a character control mechanism built on a novel neural network architecture called the Phase-Functioned Neural Network. This architecture promises to be much more resource- and energy-efficient than other commonly used methods for control mechanisms, such as Long Short-Term Memory networks, and it has proven excellent for simulation in virtual environments and computer games:
We present a real-time character control mechanism using a novel neural network architecture called a Phase-Functioned Neural Network. In this network structure, the weights are computed via a cyclic function which uses the phase as an input. Along with the phase, our system takes as input user controls, the previous state of the character, the geometry of the scene, and automatically produces high quality motions that achieve the desired user control. The entire network is trained in an end-to-end fashion on a large dataset composed of locomotion such as walking, running, jumping, and climbing movements fitted into virtual environments. Our system can therefore automatically produce motions where the character adapts to different geometric environments such as walking and running over rough terrain, climbing over large rocks, jumping over obstacles, and crouching under low ceilings. Our network architecture produces higher quality results than time-series autoregressive models such as LSTMs as it deals explicitly with the latent variable of motion relating to the phase. Once trained, our system is also extremely fast and compact, requiring only milliseconds of execution time and a few megabytes of memory, even when trained on gigabytes of motion data. Our work is most appropriate for controlling characters in interactive scenes such as computer games and virtual reality systems.
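The defining trick — network weights computed by a cyclic function of the phase — can be illustrated with a small sketch. The version below blends four candidate weight sets with a cyclic Catmull-Rom cubic; the layer sizes and weights are invented for illustration, and this is an assumption about the general shape of the idea rather than a reproduction of the paper's network.

```python
import numpy as np

rng = np.random.default_rng(2)

IN, OUT = 5, 3                         # toy layer sizes (illustrative)
# Four control weight sets, blended cyclically as the phase advances.
W = rng.normal(size=(4, OUT, IN))

def phase_weights(phase):
    """Weights as a smooth, periodic function of phase in [0, 2*pi).

    Cyclic Catmull-Rom cubic through the 4 control weight sets, so the
    layer's weights vary continuously and wrap around with the gait cycle.
    """
    t = 4.0 * phase / (2.0 * np.pi)    # position along the 4 segments
    k = int(t) % 4
    u = t - int(t)                     # local interpolation parameter
    w0, w1, w2, w3 = (W[(k + i - 1) % 4] for i in range(4))
    # Standard Catmull-Rom cubic between control points w1 and w2.
    return 0.5 * ((2.0 * w1)
                  + (-w0 + w2) * u
                  + (2.0 * w0 - 5.0 * w1 + 4.0 * w2 - w3) * u ** 2
                  + (-w0 + 3.0 * w1 - 3.0 * w2 + w3) * u ** 3)

def layer(x, phase):
    """One phase-functioned layer: weights depend on the phase input."""
    return np.maximum(phase_weights(phase) @ x, 0.0)
```

Only the blended weights need to be materialized at each frame, which is consistent with the abstract's claim of a compact, millisecond-scale runtime: the per-frame cost is one small interpolation plus an ordinary feedforward pass.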
Featured image: Phase-Functioned Neural Networks for Character Control – Fig. 1. A selection of results using our method of character control to traverse rough terrain: the character automatically produces appropriate and expressive locomotion according to the real-time user control and the geometry of the environment.