
LayerDiffusion Lets You Create Transparent Images Layer-By-Layer With AI Models Like Stable Diffusion

When it comes to generating images, large-scale AI models have become essential in computer vision and graphics. However, there has been little research on layered content generation or transparent image generation. This is surprising given the high market demand for such capabilities: most visual content editing software relies heavily on transparent or layered elements to compose and create content. The lack of training data and the complexity of manipulating existing large-scale image generators have contributed to this research gap. A new technique called LayerDiffusion addresses this challenge with a latent transparency approach, and it could be a game changer for creative workflows.

Introducing LayerDiffusion and Latent Transparency

Researchers at Stanford University have introduced a novel approach called LayerDiffusion. This approach allows large-scale pretrained latent diffusion models to generate transparent images and multiple transparent layers. 

LayerDiffusion introduces “latent transparency”, which encodes image transparency as a latent offset while preserving the original latent distribution. It trains an encoder to convert the pixel-level RGBA channels into this offset, and a decoder to reconstruct the transparent image from the adjusted latent.
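
To make this concrete, here is a minimal PyTorch sketch of the latent-transparency idea. The module names and the tiny architectures are illustrative assumptions rather than the paper’s actual networks; only the structure, a learned offset added to a frozen Stable Diffusion VAE latent, mirrors the description above.

```python
import torch
import torch.nn as nn


class TransparencyEncoder(nn.Module):
    """Maps a pixel-level RGBA image to an offset in the VAE latent space."""

    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            # Downsample by 8x to match Stable Diffusion's latent resolution.
            nn.Conv2d(4, 64, kernel_size=8, stride=8),
            nn.SiLU(),
            nn.Conv2d(64, latent_channels, kernel_size=3, padding=1),
        )

    def forward(self, rgba: torch.Tensor) -> torch.Tensor:
        return self.net(rgba)


class TransparencyDecoder(nn.Module):
    """Reconstructs the RGBA image from the offset-adjusted latent."""

    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=8, mode="nearest"),
            nn.Conv2d(latent_channels, 64, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, 4, kernel_size=3, padding=1),  # RGBA out
        )

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.net(latent)


# Usage with a frozen Stable Diffusion VAE (a diffusers AutoencoderKL `vae`):
#   rgb = rgba[:, :3]
#   base_latent = vae.encode(rgb).latent_dist.sample()  # unchanged SD latent
#   adjusted = base_latent + encoder(rgba)              # latent transparency offset
#   rgba_rec = decoder(adjusted)                        # recover color + alpha
```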

Attention sharing and LoRA adapters allow multiple transparent layers to be generated jointly with harmonious compositions. The pretrained diffusion model, such as Stable Diffusion, is then fine-tuned with these transparency-adjusted latents, teaching it to render transparent images.
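
Building on the sketch above, a single fine-tuning step might look like the following, assuming a standard diffusers setup (`unet` as a UNet2DConditionModel, `scheduler` as a DDPMScheduler, and precomputed text embeddings `text_emb`). This is the generic epsilon-prediction objective applied to transparency-adjusted latents, not the authors’ exact training code.

```python
import torch
import torch.nn.functional as F

# One training step on transparency-adjusted latents (sketch).
# `encoder`, `base_latent`, and `rgba` follow the previous snippet;
# `unet`, `scheduler`, and `text_emb` come from a standard SD setup.
adjusted = base_latent + encoder(rgba)  # transparency folded into the latent
noise = torch.randn_like(adjusted)
timesteps = torch.randint(
    0, scheduler.config.num_train_timesteps,
    (adjusted.shape[0],), device=adjusted.device,
)
noisy = scheduler.add_noise(adjusted, noise, timesteps)

# Standard epsilon-prediction loss, now teaching the UNet to denoise
# latents that carry transparency information.
pred = unet(noisy, timesteps, encoder_hidden_states=text_emb).sample
loss = F.mse_loss(pred, noise)
loss.backward()
```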

Performance and Quality of LayerDiffusion

In user studies, participants preferred the transparent content generated natively by LayerDiffusion over previous ad-hoc solutions, such as generating-then-matting, in 97% of cases. The quality of the generated transparent images was judged comparable to real commercial transparent assets, such as those found on Adobe Stock.

Transparent Image Generation Capabilities with LayerDiffusion

The new LayerDiffusion technique brings powerful transparent image generation abilities to AI systems like Stable Diffusion. These capabilities are as follows:

1. Single Image Generation

For single-image generation, you just need to provide a text prompt like “man” or “animal.” LayerDiffusion then generates a transparent PNG in which fine details such as hair are cleanly separated from the empty background through the alpha channel.
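
As an illustration of what the output looks like in code, the snippet below saves a generated RGBA array as a transparent PNG. The `generate_transparent` call is a hypothetical stand-in for the LayerDiffusion sampling step, not a real API.

```python
import numpy as np
from PIL import Image

# `generate_transparent` is a hypothetical stand-in for LayerDiffusion
# sampling; assume it returns an H x W x 4 float array in [0, 1] whose
# last channel is alpha.
rgba = generate_transparent(prompt="man")

img = Image.fromarray((rgba * 255).astype(np.uint8), mode="RGBA")
img.save("man.png")  # the alpha channel keeps the background fully transparent
```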


2. Multi-Layer Generation

For this, you provide prompts describing the foreground, the background, and the complete scene. LayerDiffusion then outputs separate transparent layers that composite seamlessly, keeping lighting and geometry coherent.

Prompts: “plant on the table”, “woman in the room”, “dog on the floor”, and “man walking on the street”.
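
Once the layers exist as transparent PNGs, compositing them is plain alpha blending. Here is a minimal sketch with Pillow, using illustrative file names:

```python
from PIL import Image

# Illustrative file names; both layers are assumed to share the same size.
background = Image.open("room_background.png").convert("RGBA")
foreground = Image.open("woman_foreground.png").convert("RGBA")  # transparent layer

# Standard "over" compositing; LayerDiffusion's layers are generated so
# that this simple blend already yields coherent lighting and geometry.
scene = Image.alpha_composite(background, foreground)
scene.save("woman_in_the_room.png")
```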

3. Conditional Layering

Foreground-Conditioned Background

LayerDiffusion can fix the foreground transparent image, generate a background image matched to it, and adapt lighting, colour, and geometry as needed.

Prompts: “Man sitting on chair”, “man sitting in forest”, “pots on wood table”, “parrot in room”, “parrot in forest”. 
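
A hedged sketch of this workflow, where `generate_background` is a hypothetical wrapper around the foreground-conditioned model:

```python
from PIL import Image

# `generate_background` is a hypothetical wrapper around LayerDiffusion's
# foreground-conditioned background model: the transparent foreground stays
# fixed while a matching backdrop is synthesized from the prompt.
foreground = Image.open("parrot.png").convert("RGBA")  # fixed transparent layer
background = generate_background(prompt="parrot in forest", foreground=foreground)

scene = Image.alpha_composite(background.convert("RGBA"), foreground)
scene.save("parrot_in_forest.png")
```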

Background-Conditioned Foreground

Plus, it can fix the background and generate a matching foreground.

Prompts: “robot in sofa waving hand”, “man in sofa” and “bird on hand”, “apple on hand”.

4. Iterative Layering

The model can compose multiple layers iteratively: by repeatedly applying the background-conditioned foreground model, it incrementally builds up compositions with any number of transparent layers, which is useful for iterative content creation.
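
In code, the loop could look like this; `generate_foreground` is a hypothetical wrapper around the background-conditioned foreground model:

```python
from PIL import Image

# Start from any base image; the file name is illustrative.
canvas = Image.open("empty_room.png").convert("RGBA")

# `generate_foreground` is a hypothetical wrapper around the
# background-conditioned foreground model: each call returns a transparent
# layer that matches the current canvas.
for prompt in ["sofa in the room", "cat on the sofa", "lamp beside the sofa"]:
    layer = generate_foreground(prompt=prompt, background=canvas)
    canvas = Image.alpha_composite(canvas, layer)  # stack the new layer on top

canvas.save("composed_scene.png")
```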

5. Compatibility with Control Methods

Users can combine LayerDiffusion with existing control frameworks like ControlNet to guide layer generation, specifying desired layouts, object shapes, and so on.

Prompts: “human in street”, “human in forest”, “big reflective ball in street”, and “big reflective ball in forest”
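
For reference, this is what plain ControlNet guidance looks like with the diffusers library; LayerDiffusion’s transparency weights would be applied on top of a pipeline like this and are not shown here.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Plain ControlNet guidance with diffusers; combining it with
# LayerDiffusion's transparency weights is left out of this sketch.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("layout_edges.png")  # illustrative control image (Canny edges)
image = pipe("big reflective ball in street", image=edge_map).images[0]
image.save("controlled_layer.png")
```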

Training Details

To train the LayerDiffusion framework, the team employed a human-in-the-loop scheme, collecting data and training simultaneously. With human assistance and GPT-powered prompting, they assembled a dataset of 1 million layered transparent images covering a diverse range of content topics and styles, enabling the training of transparent image generators.

Conclusion

The LayerDiffusion approach’s latent transparency brings AI image synthesis into the realm of layered image construction and compositing for the first time, while retaining the impressive artistic capabilities of models like Stable Diffusion. This combination of power and precision promises to significantly enhance creative workflows spanning digital art, VFX, and graphic design as capabilities continue to advance. Moreover, it paves the way for advanced visual content creation with transparency effects.
