Artificial intelligence is rapidly changing how creators approach video editing and visual effects. One exciting development is the ability to transform standard video footage into entirely new artistic styles. Imagine turning a simple recording into a vibrant anime scene, all while keeping the original motion and subject structure intact. This is now possible, largely for free, thanks to models like VACE WAN 2.1 used within the powerful ComfyUI platform.
This guide will walk you through the process of using the VACE WAN 2.1 workflow, inspired by the work of ComfyUI community legend Kijai, to achieve stunning video style transfers. We’ll cover setup, configuration, and tips for getting impressive, consistent results that surpass many previous methods.
Table of contents
- What is VACE WAN 2.1 and Why is it Exciting?
- Getting Started: Setting Up Your Environment
- Loading and Configuring the VACE Workflow in ComfyUI
- Crafting the Perfect Style: The Reference Image
- Fine-Tuning the Generation Settings
- Generating Your Stylized Video: The Moment of Truth
- Finding and Enhancing Your Output
- Conclusion: Unleash Your Creativity with VACE WAN 2.1
What is VACE WAN 2.1 and Why is it Exciting?
VACE WAN 2.1 is a free AI model designed specifically for video-to-video style transformation. Its key strength lies in its ability to apply a desired visual style (like anime, cartoonish, painterly, etc.) to an existing video while maintaining temporal consistency. This means the style stays coherent across frames, and the AI intelligently follows the structure and movement of the subjects in the original footage.
Used within ComfyUI, a node-based interface originally built for Stable Diffusion and now used with many diffusion models, VACE WAN 2.1 offers creators a powerful toolset without needing expensive software or deep technical expertise. The results can be remarkably fluid and visually appealing.
Getting Started: Setting Up Your Environment
Before diving into the creative process, some initial setup is required. This involves installing ComfyUI (if you haven’t already) and downloading the necessary components for the VACE WAN 2.1 workflow.
Installing ComfyUI
If you’re new to ComfyUI, you’ll need to install it on your computer first. It provides the framework where the VACE WAN workflow will run. Detailed installation guides are available online, often tailored to different operating systems.
Downloading Essential Models and Files
Several components are needed for this specific workflow:
- VACE Wan Model: This is the core AI model for the style transfer. It needs to be downloaded and saved in the correct ComfyUI models folder (typically ComfyUI/models/checkpoints/).
- VAE Model: A Variational Autoencoder (VAE) helps in encoding and decoding images during the generation process. Download the recommended VAE and place it in the ComfyUI/models/vae/ folder.
- Text Encoder: This component helps the model understand text prompts. Download the required text encoder file and save it to the ComfyUI/models/clip/ folder.
- The Workflow File: You’ll need the specific ComfyUI workflow file (.json) that orchestrates the VACE WAN 2.1 process. This guide uses a simplified version based on Kijai’s original work.
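Before launching ComfyUI, it can help to confirm that each download landed in the folder listed above. The following is a minimal sketch: the folder names come from the list, but the file names are placeholders (the guide does not state the real ones), and missing_files is a hypothetical helper, not part of ComfyUI.

```python
from pathlib import Path

# Folders are taken from the list above. The file names are PLACEHOLDERS:
# replace them with the names of the files you actually downloaded.
EXPECTED_FILES = {
    "models/checkpoints": "vace_model.safetensors",   # hypothetical name
    "models/vae": "wan_vae.safetensors",              # hypothetical name
    "models/clip": "text_encoder.safetensors",        # hypothetical name
}


def missing_files(comfy_root, expected=EXPECTED_FILES):
    """Return the expected relative paths that do not exist under comfy_root."""
    root = Path(comfy_root)
    return [
        (Path(folder) / name).as_posix()
        for folder, name in expected.items()
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    # "ComfyUI" is assumed to be the install folder, relative to where you run this.
    for path in missing_files("ComfyUI"):
        print("missing:", path)
```

An empty result means every placeholder path was found; anything printed points to a file that still needs to be moved or renamed.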
A special acknowledgment goes to Kijai, a prominent figure in the ComfyUI community known for creating sophisticated and effective workflows. You can often find his contributions on GitHub.
Loading and Configuring the VACE Workflow in ComfyUI
With all the necessary files downloaded and ComfyUI ready, it’s time to load and set up the workflow.
Initial Workflow Setup
First, launch ComfyUI. If you’ve used it before, it’s a good practice to check for updates via the ComfyUI Manager (Manager > Update All).
Next, simply drag the downloaded VACE WAN 2.1 workflow .json file onto the ComfyUI canvas. You might see a pop-up listing “Missing Nodes.” These are custom components the workflow requires. Use the ComfyUI Manager (Manager > Install Missing Custom Nodes) to find and install the latest versions of all listed nodes. After installation, restart ComfyUI completely.
The workflow might look complex at first glance with its interconnected nodes, but we’ll focus on the key settings.
Loading Your Source Video
Locate the “Load Video” node (or similar). Click the button to select the video file you want to transform. Key settings here include:
- Frame Load Cap: This determines how many frames of your video are processed. Setting it to 0 processes the entire video, but this can be very time-consuming and resource-intensive. The VACE WAN 2.1 model often performs optimally around 81 frames (roughly 3 seconds at standard frame rates). While longer sequences are possible (e.g., 300+ frames), quality might degrade, and style consistency can drift without advanced techniques. For this guide, sticking to around 81 frames is recommended for faster results.
- Skip Frames: Allows you to start processing from a specific point in your video.
- Format: Typically set to Wan, the option that matches this model family.
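The "roughly 3 seconds" figure for 81 frames depends on the frame rate of your source. A small sketch of the arithmetic, using hypothetical helper names, shows how clip length and frame cap relate:

```python
def clip_seconds(frame_count, fps):
    """Duration in seconds of frame_count frames played back at fps."""
    return frame_count / fps


def frames_for(seconds, fps):
    """Whole number of frames needed to cover `seconds` at `fps`."""
    return round(seconds * fps)


if __name__ == "__main__":
    # Compare the 81-frame cap at a few common frame rates.
    for fps in (24, 25, 30):
        print(f"81 frames at {fps} fps = {clip_seconds(81, fps):.2f} s")
```

Running frames_for with your desired clip length and source frame rate gives a value you can enter as the Frame Load Cap.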
Crafting the Perfect Style: The Reference Image
One of the most crucial elements for guiding the AI’s style is the reference image. This image tells the VACE WAN model what visual aesthetic you’re aiming for.
Why a Reference Image is Crucial
Instead of just using a text prompt like “anime style,” providing a visual example leads to much more specific and controlled results. The best practice is to take a single, clear frame from your source video and stylize that frame to use as your reference.
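If you would rather grab that frame from the command line than from an editor, one option is ffmpeg. This is an alternative to the export step described in this guide, not part of the original workflow. The sketch below assumes ffmpeg is installed and on your PATH; the file names and the helper names are illustrative only.

```python
import subprocess


def frame_export_command(video_path, timestamp_s, out_path):
    """Build an ffmpeg command that saves one frame at timestamp_s seconds."""
    return [
        "ffmpeg",
        "-ss", str(timestamp_s),   # seek to the chosen moment
        "-i", str(video_path),     # input video
        "-frames:v", "1",          # write exactly one video frame
        str(out_path),
    ]


def export_frame(video_path, timestamp_s, out_path):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(frame_export_command(video_path, timestamp_s, out_path), check=True)


if __name__ == "__main__":
    # Illustrative file names; print the command without running it.
    print(" ".join(frame_export_command("source.mp4", 1.5, "reference_frame.png")))
```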
Method 1: Using ChatGPT for Quick Styling
You can easily stylize a frame using tools like ChatGPT (with image input capabilities):
- Export a representative frame from your video editing software (like Premiere Pro).
- Upload the frame to ChatGPT.
- Prompt it with something like: “Turn this image into anime style.”
- ChatGPT will generate a stylized version. It can also generate a text prompt describing the image, which can be useful later in the ComfyUI workflow.
Note: AI tools like ChatGPT might sometimes alter the aspect ratio. If this happens, you might need to use image editing software (like Photoshop with its Generative Fill feature) to correct the aspect ratio back to your original video’s dimensions (e.g., 16:9) and fill any empty space.
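To know how much empty space needs filling, you can work out the canvas size that restores the target aspect ratio without cropping the stylized image. A minimal sketch of that calculation, with a hypothetical helper name and 16:9 as the assumed default target:

```python
import math
from fractions import Fraction


def padded_canvas(width, height, target_w=16, target_h=9):
    """Smallest canvas (w, h) at the target ratio that contains the image uncropped."""
    target = Fraction(target_w, target_h)
    if Fraction(width, height) < target:
        # Image is too narrow: keep the height and widen the canvas.
        return math.ceil(height * target), height
    # Image is wide enough (or too wide): keep the width and grow the height.
    return width, math.ceil(width / target)


if __name__ == "__main__":
    w, h = padded_canvas(1024, 1024)
    print(f"Extend a 1024x1024 image to {w}x{h} before filling the new area")
```

The difference between the canvas size and the image size is the area the fill tool has to generate.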
Method 2: Leveraging OpenArt.ai for Advanced Control
Websites like OpenArt.ai offer more specialized image generation features:
- Upload your exported frame to the Image-to-Image section.
- Choose an AI model (e.g., Flux Dev, DreamShaper SDXL).
- Provide a text prompt (your own, or one generated by ChatGPT).
- Generate the image. OpenArt often produces results that strongly adhere to specific styles like anime.
- For maintaining structure, especially with SDXL models, you can use ControlNet features. Upload the original frame again and select a mode like “Scribble” to ensure the AI respects the subject’s pose and clothing details more closely, even while changing the style.
Uploading Your Reference to the Workflow
Once you have your stylized reference image, find the “Load Image Reference” node in the ComfyUI workflow and upload your created image there.
Fine-Tuning the Generation Settings
With the video and reference image loaded, you need to configure the core generation parameters within the workflow nodes.
Setting Resolution and Frame Rate
- Output Resolution: In nodes controlling size (often near the VAE Decode or Video Combine nodes), set the desired output resolution. Crucially, maintain the same aspect ratio as your original video. A common starting point that balances speed and quality is 1024×576 for a 16:9 video. Lower resolutions (e.g., 768×432) generate faster but require upscaling later. Higher resolutions demand significantly more GPU VRAM.
- Frame Rate: In the “Video Combine” node (or similar final output node), set the frame rate to match your original source video (e.g., 25 fps, 30 fps).
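Picking an output size that keeps the source aspect ratio is simple arithmetic. The sketch below scales a source resolution to a chosen width and snaps both sides to a multiple of 16. The multiple of 16 is an assumption made for illustration (the guide does not state a required divisor), and fit_resolution is a hypothetical helper.

```python
def fit_resolution(src_w, src_h, target_w, multiple=16):
    """Scale to target_w while keeping the aspect ratio; snap sides to `multiple`."""
    def snap(value):
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    width = snap(target_w)
    height = snap(width * src_h / src_w)
    return width, height


if __name__ == "__main__":
    # A 1920x1080 (16:9) source at the two widths mentioned above.
    for target in (1024, 768):
        print(target, "->", fit_resolution(1920, 1080, target))
```

For a 16:9 source this reproduces the 1024×576 and 768×432 sizes mentioned above; for other sources, check that the snapped size is still close enough to the original ratio.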
Writing Effective Prompts
Locate the text prompt input box (often connected to CLIP Text Encode nodes). You can enter a simple description like “anime style,” or paste and refine the more detailed prompt generated earlier by ChatGPT based on your reference image. Review the prompt carefully.
Understanding the Wan Video Sampler Settings
The “Wan Video Sampler” node (or similarly named core processing node) is where the main generation work happens. Key settings include:
- Steps: Higher values generally produce more detail but increase processing time and VRAM usage. 20 steps is often a good starting point.
- CFG (Classifier-Free Guidance): Controls how strictly the AI follows your text prompt. Values typically range from 2 to 8. Experiment to see what works best for your specific video and style.
- Seed Control: Set to randomize for unique results each time, or use a fixed seed if you want reproducible outputs.
- Scheduler/Sampler: Different schedulers (e.g., Euler, DPM++) can affect the final look. Euler is often a reliable choice.
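When you experiment with these values, it helps to record each combination you try. The sketch below is a hypothetical note-taking structure, not a ComfyUI API: the field names are invented, the CFG default is an arbitrary pick inside the 2 to 8 range, and the seed helper only illustrates the difference between a fixed and a randomized seed.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplerSettings:
    """Hypothetical record of the sampler values discussed above."""
    steps: int = 20              # starting point suggested in the text
    cfg: float = 5.0             # arbitrary pick inside the 2 to 8 range
    seed: Optional[int] = None   # None means "randomize"
    scheduler: str = "euler"

    def resolved_seed(self) -> int:
        """Return the fixed seed, or draw a fresh one when none is set."""
        if self.seed is not None:
            return self.seed
        return random.randrange(2**32)

    def out_of_range(self) -> list:
        """List settings that fall outside the ranges mentioned in the text."""
        problems = []
        if self.steps < 1:
            problems.append("steps must be at least 1")
        if not 2 <= self.cfg <= 8:
            problems.append("cfg is outside the typical 2 to 8 range")
        return problems


if __name__ == "__main__":
    run = SamplerSettings(cfg=6.5, seed=1234)
    print(run, run.resolved_seed(), run.out_of_range())
```

Keeping the seed fixed while changing one other value at a time makes it easier to see what each setting contributes.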
Finalizing Output with Video Combine
In the final “Video Combine” node, double-check the frame rate matches your source. You can also set a default filename prefix (e.g., “V2V_Anime_Output”) for your generated videos.
Generating Your Stylized Video: The Moment of Truth
With all settings configured, it’s time to generate!
Click the “Queue Prompt” or “Run” button in ComfyUI. You’ll see the workflow execute node by node, indicated by green highlights. The Wan Video Sampler node will take the longest.
Keep an eye on your system’s resource usage. Generating an 81-frame sequence at 1024×576 with 20 steps on an RTX 4090 might use around 15 GB of VRAM and take approximately 7 minutes. Lowering resolution or steps can significantly reduce VRAM usage and processing time, making it feasible on less powerful hardware.
Once complete, you can usually preview the generated video directly within the final “Video Combine” node in ComfyUI. The results can be striking, capturing nuanced movements like hair flowing or eyes blinking within the new style. The VACE WAN 2.1 model often handles even complex motion surprisingly well.
Finding and Enhancing Your Output
After generation, you need to locate the video file and potentially improve its quality.
Locating Your Generated Video File
Your stylized video will be saved in the ComfyUI/output/ folder unless specified otherwise in the workflow’s output nodes.
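If the output folder already holds many renders, a short script can pick out the most recent one. This is a minimal sketch: the list of video extensions is an assumption, newest_video is a hypothetical helper, and the prefix shown is the example prefix from the Video Combine step.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever your output node writes.
VIDEO_SUFFIXES = {".mp4", ".webm", ".mov", ".gif"}


def newest_video(output_dir, prefix=""):
    """Return the most recently modified video under output_dir, or None."""
    candidates = [
        p for p in Path(output_dir).rglob("*")
        if p.is_file()
        and p.suffix.lower() in VIDEO_SUFFIXES
        and p.name.startswith(prefix)
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # Path is relative to where the script is run.
    print(newest_video("ComfyUI/output", prefix="V2V_Anime_Output"))
```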
Upscaling for Higher Quality
If you generated at a lower resolution (like 1024×576 or 768×432) to save time or resources, you’ll likely want to upscale the video for better viewing quality. Dedicated AI video upscaling tools like Topaz Video AI are excellent for this.
Simply load your generated video into the upscaler, choose your target resolution (e.g., 4K), and select appropriate enhancement settings. With the right settings, AI upscalers can add incredible detail, transforming a lower-resolution generation into a crisp, high-definition result.
Conclusion: Unleash Your Creativity with VACE WAN 2.1
The VACE WAN 2.1 workflow in ComfyUI represents a significant leap forward in accessible AI video style transfer. It empowers creators to reimagine their footage in virtually any artistic style while preserving the essence of the original performance. By following the steps outlined above – careful setup, thoughtful reference image creation, and balanced setting adjustments – anyone can start producing unique and compelling stylized videos.
Experiment with different styles, test various settings, and see what incredible transformations you can achieve with this powerful AI tool. The world of AI video is evolving fast, and VACE WAN 2.1 puts cutting-edge capabilities directly into your hands.