
SDXL-Lightning by ByteDance: Blazingly Fast Text-to-Image Generation, Far Faster than SDXL Models

ByteDance, the makers of TikTok, have unveiled their latest text-to-image generative AI model, SDXL-Lightning. As the name suggests, the new model delivers incredibly fast, high-quality text-to-image generation using just 1-2 inference steps. This is a major leap over the original SDXL models, which required over 25 steps to reach comparable quality.

Introduction to SDXL-Lightning

While diffusion models have achieved outstanding results in generative tasks, their iterative sampling process is both slow and computationally expensive. For practical applications, reducing the required number of steps is crucial. Prior works attempted better ODE solvers, straightened flows, and model distillation, but quality remained subpar under eight steps.

SDXL scaled latent diffusion to high-resolution text-to-image generation, supporting 1024px outputs. However, its multi-step sampling required dozens of inference passes. Clearly, faster generation was needed to unlock the full potential of diffusion models. This is where SDXL-Lightning comes in: it pushes the boundaries by enabling 1024px generation in a single step.

Example Generated Images by SDXL-Lightning

Images generated by ByteDance's new super-fast text-to-image model, SDXL-Lightning

The Success Factor Behind SDXL-Lightning: Progressive Adversarial Distillation Method

The SDXL-Lightning text-to-image model combines progressive and adversarial distillation. Progressive distillation teaches the student network to predict locations further ahead along the probability flow, while an adversarial loss ensures the student's predictions match the distribution of the teacher network's outputs.

Additionally, the distillation proceeds in stages, from 128 steps down to 32 and ultimately to a single step. The adversarial loss is first used to distill full mode coverage; in later stages the requirement is relaxed to prioritize quality over coverage while still retaining the overall flow.
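To make the idea concrete, here is a toy sketch of progressive adversarial distillation in PyTorch. This is purely illustrative and is not ByteDance's implementation: the teacher and student are tiny stand-in networks, the teacher takes two small steps along its predicted flow, the student learns to cover the same distance in one step, and a small discriminator provides the adversarial signal.

```python
# Toy sketch of progressive adversarial distillation (illustrative only).
import torch
import torch.nn as nn

dim = 16
teacher = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
student = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
student.load_state_dict(teacher.state_dict())   # student starts as a copy of the teacher
disc = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, 1))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def step(model, x, t, dt):
    """One Euler step of size dt along the model's predicted flow at time t."""
    t_col = torch.full((x.shape[0], 1), t)
    return x + dt * model(torch.cat([x, t_col], dim=1))

for it in range(100):
    x = torch.randn(32, dim)                     # toy "noisy latents"
    with torch.no_grad():                        # teacher: two half-size steps
        target = step(teacher, step(teacher, x, 1.0, 0.5), 0.5, 0.5)
    pred = step(student, x, 1.0, 1.0)            # student: one full-size step

    # Discriminator update: teacher outputs are "real", student outputs are "fake".
    d_loss = bce(disc(target), torch.ones(32, 1)) + bce(disc(pred.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Student update: fool the discriminator, plus a small distance term
    # (the weighting between the two is what gets relaxed in later stages).
    s_loss = bce(disc(pred), torch.ones(32, 1)) + 0.1 * (pred - target).pow(2).mean()
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
```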

This balanced approach is why SDXL-Lightning bridges the quality-versus-mode-coverage tradeoff that plagues other few-step methods.

Distilled from StabilityAI’s Stable Diffusion XL Base

The models used in SDXL-Lightning are distilled from StabilityAI's Stable Diffusion XL Base, so they inherit the base model's prompt understanding and overall image coherence. ByteDance has provided checkpoints for 1-step, 2-step, 4-step, and 8-step distilled models, each offering its own balance of speed and generation quality.

Performance Evaluation

Comprehensive evaluations demonstrate that SDXL-Lightning sets a new state-of-the-art for few-step text-to-image generation. Both qualitative assessments and CLIP score metrics show SDXL-Lightning produces better quality images compared to LCM, SDXL-Turbo and the original SDXL model.

Quantitative Fréchet Inception Distance (FID) scores that measure quality and diversity are on par with other methods. However, FID computed on 299px patches – assessing high-resolution details – is substantially better, with over 2x lower scores compared to the next best model. This verifies that SDXL-Lightning generates far superior details in the 1024px images. 

SDXL-Lightning vs SDXL and SDXL Turbo
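As a rough illustration of the patch-based evaluation described above, the sketch below crops random 299px patches from folders of generated and reference images and computes a standard FID between the two patch sets. It is not the paper's exact protocol; the folder names and crop counts are made up, and it assumes the clean-fid package is installed.

```python
# Illustrative "patch FID": compare random 299px crops so fine detail is not
# washed out by resizing full 1024px images.
import os
import random
from PIL import Image
from cleanfid import fid  # pip install clean-fid

def make_patches(src_dir, dst_dir, patch=299, per_image=4, seed=0):
    """Save random patch x patch crops of every image in src_dir into dst_dir."""
    rng = random.Random(seed)
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = Image.open(os.path.join(src_dir, name)).convert("RGB")
        w, h = img.size
        for i in range(per_image):
            x = rng.randint(0, w - patch)
            y = rng.randint(0, h - patch)
            img.crop((x, y, x + patch, y + patch)).save(os.path.join(dst_dir, f"{i}_{name}"))

make_patches("generated_1024px", "generated_patches")   # hypothetical folders
make_patches("reference_1024px", "reference_patches")

# Standard FID between the two patch folders (lower is better).
print(fid.compute_fid("generated_patches", "reference_patches"))
```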

Configuration Options ByteDance Provides with SDXL-Lightning 

Checkpoints are available for 1, 2, 4, and 8 inference steps, allowing users to balance speed vs quality as needed. The 1-step model generates images in a single pass, but quality can be inconsistent, so two steps or more are generally recommended. 

Two architectural options are supported – UNet and LoRA. 

1. UNet Checkpoints: 2-Step, 4-Step, 8-Step

The UNet checkpoints replace the base model's full UNet with the distilled weights. They deliver the highest image quality but require the most memory and storage. ByteDance provides 2-step, 4-step, and 8-step UNet variants of SDXL-Lightning; a loading sketch follows below.
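The following sketch shows one way to load a UNet checkpoint with Hugging Face's diffusers library. It closely follows ByteDance's published usage example; the repository name, checkpoint filename, and the 4-step setting come from that release and should be adjusted to the variant you choose.

```python
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_unet.safetensors"  # pick the file matching your step count

# Load the distilled UNet weights in place of the base model's UNet.
unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
pipe = StableDiffusionXLPipeline.from_pretrained(
    base, unet=unet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Lightning checkpoints expect "trailing" timestep spacing and no classifier-free guidance.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
pipe("A girl smiling", num_inference_steps=4, guidance_scale=0).images[0].save("output.png")
```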

2. LoRA Checkpoints: 2-Step, 4-Step, 8-Step

The models have also shown they can handle different aspect ratios and are compatible with existing LoRA modules, which makes them easy to transfer between base models. The LoRA checkpoints store only lightweight low-rank weight updates, so image quality can be slightly lower than with the full UNet checkpoints.

ByteDance now distributes the LoRA weights as .safetensors files. The .safetensors format loads quickly and avoids the security risks of pickle-based checkpoints, and because the LoRA files contain only the low-rank updates, they take up far less storage than the full UNet checkpoints. A loading sketch follows below.
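Here is a minimal sketch of applying one of the LoRA checkpoints on top of the SDXL base with diffusers. Again, the repository name and checkpoint filename are taken from ByteDance's release, and the 4-step setting is just an example.

```python
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_4step_lora.safetensors"  # pick the LoRA matching your step count

# Load the base pipeline, then apply and fuse the Lightning LoRA weights.
pipe = StableDiffusionXLPipeline.from_pretrained(
    base, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pipe.load_lora_weights(hf_hub_download(repo, ckpt))
pipe.fuse_lora()

# Same sampler settings as the UNet variant: "trailing" spacing, guidance disabled.
pipe.scheduler = EulerDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
pipe("A girl smiling", num_inference_steps=4, guidance_scale=0).images[0].save("output.png")
```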


ByteDance SDXL-Lightning With ComfyUI 

This model can also be integrated with ComfyUI for a more user-friendly experience. Whether you choose the 1-step, 2-step, 4-step, or 8-step UNet checkpoints or the 2-step, 4-step, or 8-step LoRA checkpoints, ComfyUI provides a streamlined workflow for generating images from text. ByteDance shares ready-made ComfyUI workflow files for the respective checkpoints.

ByteDance SDXL Lightning Demo on HuggingFace

The SDXL-Lightning demo on HuggingFace provides a direct and seamless way to generate images with ease. Below are the steps to get started!

Step 1: Visit the Demo Page

Access the demo Space on HuggingFace.

Demo Link: https://huggingface.co/spaces/AP123/SDXL-Lightning 

ByteDance SDXL Lightning Demo on HuggingFace

Step 2: Write a prompt

In the provided textbox, type a prompt that describes the image you want to generate.

Step 3: Hit Submit

Once you’ve written your prompt, click the Submit button to start image generation. After a GPU is allocated, SDXL-Lightning will work its magic and generate images based on your prompt.

Step 4: Download and Explore

Once the image is generated, you will have the option to download it. Simply click on the download button and save the image to your device. Feel free to experiment with more prompts.
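If you prefer to script the demo instead of clicking through the web page, the Space can in principle be driven with the gradio_client package. The endpoint name and argument list below are assumptions, not taken from the Space's documentation, so check its "Use via API" panel for the real signature before relying on them.

```python
from gradio_client import Client

# Connect to the public Space (name taken from the demo link above).
client = Client("AP123/SDXL-Lightning")

# The api_name and parameters here are assumed; verify them against the
# Space's "Use via API" panel, which lists the actual endpoint signature.
result = client.predict(
    "A cinematic photo of a lighthouse at sunset",  # prompt
    api_name="/generate_image",
)
print(result)  # typically a path to the generated image file
```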

Conclusion

With SDXL-Lightning, ByteDance has achieved a momentous advancement in text-to-image synthesis. The LoRA-trained models further expand usability as plug-and-play modules. However, like other generative models, there are risks of misuse in spreading misinformation or inappropriate content, and responsible, ethical development practices are necessary to mitigate these concerns. Overall, though, models like SDXL-Lightning exemplify AI's incredible potential for computational creativity, and their methodology inspires new directions for diffusion model distillation research. For more technical details, please see the project's arXiv paper.
