
MIMO from Alibaba Can Mimic Humans in Complex Motions and Interactive Scenes

Character animation has come a long way, but producing realistic character videos remains a challenge. Recent 2D methods excel at image-guided synthesis, but they offer limited control and struggle with complex 3D motions and object interactions. Alibaba aims to change this with its new model, MIMO.

What is MIMO?

Alibaba’s Institute for Intelligent Computing has recently introduced MIMO, a generalizable model for controllable character video synthesis based on spatial decomposed modelling. MIMO stands for Mimicking Motion Object. Users simply provide a character image, a motion sequence, and a scene, and the model synthesizes a realistic video from them.

Demo Video of Alibaba MIMO

Key Features of Alibaba MIMO

Some key highlights of MIMO include:

1. Flexible User Control

MIMO allows users to control character, motion, and scene attributes by simply providing inputs like a character image, pose sequence, and scene video/image.

2. Scalability

The model can synthesize videos of arbitrary characters from just a single reference image.

3. Motion Generality

It generalizes well to novel 3D motions, including those extracted from in-the-wild videos.

4. Scene Applicability

MIMO is effective at producing animations within complex, real-world scenes featuring object interactions.

Example Videos Generated by Alibaba MIMO

Core Concept Behind MIMO

The core idea behind MIMO is spatial decomposed modelling. Unlike previous 2D techniques, it encodes video inputs in a 3D-aware manner by decomposing them into spatial components. Specifically, each frame is separated into three layers based on depth: the main human, the underlying scene, and floating occlusions.
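To make the depth-based split more concrete, here is a minimal sketch in Python with NumPy. It assumes that a per-pixel depth map (for example from an off-the-shelf monocular depth estimator) and a binary human mask are already available, and it uses a simplified rule chosen for illustration: non-human pixels closer to the camera than the human's median depth count as occlusions, and everything else counts as scene. The function name decompose_frame is hypothetical; this is not MIMO's code.

```python
import numpy as np

def decompose_frame(depth, human_mask):
    """Split a frame's pixels into human, occlusion, and scene masks.

    Illustrative assumption (not MIMO's actual rule): a pixel outside the
    human mask is an occlusion if it is closer to the camera than the
    human's median depth; otherwise it belongs to the scene.
    """
    depth = np.asarray(depth, dtype=float)
    human = np.asarray(human_mask, dtype=bool)
    if not human.any():
        # No person in the frame: every pixel belongs to the scene layer.
        return human, np.zeros_like(human), ~human
    # Single reference depth for the human, taken over the human's pixels.
    human_depth = np.median(depth[human])
    occlusion = ~human & (depth < human_depth)
    scene = ~human & ~occlusion
    return human, occlusion, scene
```

On a 3×3 toy frame where the human pixels sit at depth 2, the two pixels at depth 1 land in the occlusion layer and the five pixels at depth 5 land in the scene layer, so the three masks are disjoint and together cover the whole frame.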

The human layer is further disentangled into identity and motion codes using canonical appearance transfer and structured body codes, respectively. A shared VAE encoder embeds the scene and occlusion layers into a full scene code. These latent codes then control synthesis via a diffusion-based decoder. This 3D-aware approach enables flexible control and handling of challenging scenarios.
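The idea behind structured body codes can be sketched in a similar way. The example below attaches a latent vector to each vertex of a posed body mesh, such as one a body model like SMPL would provide. It then projects those vertices through a pinhole camera into a 2D feature map and keeps the nearest vertex per pixel. Because the codes travel with the vertices, the same set of codes describes the body in any pose, which is the intuition behind encoding motion separately from identity. This is a simplified point-splatting stand-in for a real differentiable mesh rasterizer. The function project_body_codes and its parameters are hypothetical, and the codes are fixed arrays rather than learned ones.

```python
import numpy as np

def project_body_codes(vertices, codes, focal, image_size):
    """Splat per-vertex latent codes onto a 2D feature map.

    vertices:   (V, 3) posed body vertices in camera coordinates, z > 0.
    codes:      (V, C) latent vector attached to each vertex (hypothetical;
                in a trained model these would be learned parameters).
    focal:      pinhole focal length in pixels; principal point at the centre.
    image_size: (H, W) of the output feature map.
    Returns an (H, W, C) map where each pixel holds the code of the nearest
    vertex projected onto it (a simple z-buffer), and zeros elsewhere.
    """
    vertices = np.asarray(vertices, dtype=float)
    codes = np.asarray(codes, dtype=float)
    H, W = image_size
    feat = np.zeros((H, W, codes.shape[1]))
    zbuf = np.full((H, W), np.inf)
    x, y, z = vertices[:, 0], vertices[:, 1], vertices[:, 2]
    # Pinhole projection to integer pixel coordinates.
    u = np.round(focal * x / z + W / 2).astype(int)
    v = np.round(focal * y / z + H / 2).astype(int)
    for i in range(len(vertices)):
        inside = 0 <= u[i] < W and 0 <= v[i] < H
        # Keep only the vertex closest to the camera at each pixel.
        if inside and z[i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = z[i]
            feat[v[i], u[i]] = codes[i]
    return feat
```

When two vertices project onto the same pixel, the z-buffer keeps the nearer one. A real implementation would rasterize mesh faces and interpolate codes across them instead of splatting isolated points.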

Alibaba MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

How Alibaba MIMO Works

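During training, MIMO takes a driving video, spatially decomposes each frame into the human, scene, and occlusion layers, and embeds these into latent codes. The codes are inserted into a diffusion decoder as conditions to reconstruct the video, and the whole network is trained jointly to minimize the noise-prediction error. At inference time, the user’s inputs (character image, pose sequence, and scene) are embedded into the same kinds of codes to drive synthesis.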
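In its standard DDPM form, a noise-prediction objective is the mean-squared error between the noise added to a clean latent and the noise predicted by the conditioned decoder. The forward process mixes them as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The sketch below shows that generic form. It assumes a denoiser callable that receives the noisy latent, the timestep, and the conditioning codes. It is not MIMO's training code, and noise_prediction_loss, denoiser, and alpha_bar are hypothetical names. A real model would run a large neural network over video latents with a noise schedule that this article does not specify.

```python
import numpy as np

def noise_prediction_loss(x0, cond, denoiser, alpha_bar, rng):
    """Generic conditional DDPM noise-prediction loss for one sample.

    x0:        clean latent (any shape).
    cond:      conditioning codes, passed through to the denoiser unchanged.
    denoiser:  callable denoiser(x_t, t, cond) returning predicted noise.
    alpha_bar: 1-D array of cumulative products of (1 - beta_t).
    rng:       numpy.random.Generator used to sample t and the noise.
    """
    t = rng.integers(len(alpha_bar))      # random timestep in [0, T)
    eps = rng.standard_normal(x0.shape)   # true Gaussian noise
    # Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = denoiser(x_t, t, cond)
    # Noise-prediction error: mean squared difference between true and predicted noise.
    return np.mean((eps - eps_hat) ** 2)
```

With a linear schedule such as betas = np.linspace(1e-4, 0.02, 1000) and alpha_bar = np.cumprod(1 - betas), an oracle denoiser that recovers the true noise exactly scores zero up to floating-point error. A denoiser that always predicts zeros scores roughly 1, the variance of the Gaussian noise.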


Performance Evaluation of Alibaba MIMO

Alibaba researchers demonstrate MIMO’s abilities through various character video synthesis examples controlled by different attributes:

1. Arbitrary Character Control

The model animates diverse human and cartoon characters given just a single reference image.

2. Novel 3D Motion Control

The model faithfully mimics complex motions taken from large motion databases and from in-the-wild videos.

3. Interactive Scene Control

It places characters into complex real-world scenes with natural object interactions.

The researchers also report that MIMO outperforms prior 2D and 3D methods on tasks such as character replacement in videos. They present these results as evidence that a single unified framework can handle character, motion, and scene control together.

Potential Applications of Alibaba MIMO

MIMO opens the door to a wide range of applications in film, VR, content creation, e-commerce personalization, graphics, and character animation. It could significantly lower video production costs and make animation accessible to far more creators.

Future Work

Looking ahead, the researchers plan to improve MIMO’s realism and dynamism. Collecting larger and more diverse training datasets could help improve the model, and adding further controllable attributes, such as facial expressions, is another promising direction. For more technical details, see the arXiv paper.

Faizan Ali Naqvi

Research is my hobby and I love to learn new skills. I make sure that every piece of content that you read on this blog is easy to understand and fact checked!
