Wan 2.2 Basic Guide: Getting Started with High-Quality AI Video Generation in ComfyUI

- What Is the WAN 2.2 Video Generator?
- How to Set Up Wan 2.2 in ComfyUI
- Walkthrough: ComfyUI_examples 5B TI2V Workflow
- Walkthrough: ComfyUI_examples A14B I2V Workflow
- Wan 2.2 Render Times Explained
- Wan 2.2 Prompt Design Best Practices
- Customize the Official Wan 2.2 Workflows
- Pro Workflow Tips
- Push Speed Further
- Conclusion
This guide explains how to work with Wan 2.2 for AI video production. At the time of writing, video generation dominates the generative AI scene; DCAI previously highlighted “ComfyUI-AnimateDiff-Evolved” as our recommended custom node. Back then only a handful of video models could run locally, so cloud-first services such as Sora, Runway, and Luma AI led the pack, but excellent locally runnable models like Tencent’s Hunyuan keep arriving. Here we focus on Alibaba Cloud’s open-source video generation suite “🔗Wan 2.2”, currently among the best open-source video generators you can operate yourself. Wan offers paid on-demand generation and API-based cloud access, plus free open-source model weights you can host locally. ComfyUI and SwarmUI already support local execution, and this article walks through the ComfyUI workflow. We build on the official guide and dig into techniques that improve stability. Let’s cover Wan fundamentals first, then expand the workflow to aim for high-quality videos.
What Is the WAN 2.2 Video Generator?
“Wan 2.2” is a large-scale diffusion Transformer video model that employs a two-stage Mixture-of-Experts (MoE) design with a high-noise (initial) phase and a low-noise (final) phase (A14B model only). Feed it text or reference images to render cinematic, high-quality footage. The paper “🔗Wan: Open and Advanced Large-Scale Video Generative Models” introduces a new VAE structure and scaling strategy that you can use inside ComfyUI as-is.
- Supported tasks: Text-to-video (T2V), image-to-video (I2V), and speech-to-video (S2V) are available
- Default resolution: T2V/I2V deliver 480p–720p; TI2V-5B is tuned for 720p@24fps
- Model lineup: MoE A14B models for T2V/I2V, a hybrid 5B TI2V model, plus a dedicated Wan 2.2 (TI2V-5B) VAE
- GPU memory requirements: T2V/I2V/S2V-A14B models target 80 GB-class GPUs, but ComfyOrg’s FP8 repack lets you offload to an RTX 4090 (24 GB). TI2V-5B needs roughly 24 GB
Wan 2.1 vs. Wan 2.2 Key Differences
Alongside the MoE architecture, Wan 2.2 trains on a dataset that increases image coverage by 65.6% and video coverage by 83.2% compared with Wan 2.1. The 5B model introduces a new 16×16×4 compression VAE, allowing 720p@24fps output. (From the 🔗Wan2.2 README)
How to Set Up Wan 2.2 in ComfyUI
To run the WAN Video Generation workflow, place the model files in the correct folders. Update ComfyUI to the latest build first, then follow the sequence below.
ComfyUI has been unstable lately. The frontend and core still feel out of sync, and I keep running into bugs. Installing the Wan components alone corrupted my environment and forced a clean install. If you want to preserve your current ComfyUI setup, back it up before installing, or spin up a fresh ComfyUI Portable instance so you can build the Wan environment separately.
Download Wan 2.2 Model Files
Grab the files from the Hugging Face repository “Comfy-Org/Wan_2.2_ComfyUI_Repackaged” and place them under `ComfyUI/models`.
⚠️The T2V-A14B/I2V-A14B models ship in High Noise and Low Noise pairs. They behave like SDXL’s base and refiner: start inference with High Noise, then hand over to Low Noise mid-run to polish the finish.
- T2V A14B: `wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors` + `wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors` (an FP16 build is bundled, so switch if you have VRAM headroom)
- I2V A14B: `wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors` + `wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors`. For maximum fidelity, use the matching FP16 versions (`..._fp16.safetensors`).
- TI2V 5B / S2V 14B: Task-specific builds such as `wan2.2_ti2v_5B_fp16.safetensors` and `wan2.2_s2v_14B_fp8_scaled.safetensors` live in the same repository.
- LoRA: Pair `wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors` with `wan2.2_t2v_lightx2v_4steps_lora_v1.1_low_noise.safetensors`, and use the I2V set `wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors` + `wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors`.
- Text Encoder: `umt5_xxl_fp16.safetensors` or the lighter `umt5_xxl_fp8_e4m3fn_scaled.safetensors`.
- Audio Encoder: Use `wav2vec2_large_english_fp16.safetensors` when you need audio-driven generation.
- VAE: Load `wan_2.1_vae.safetensors` for the A14B models and `wan2.2_vae.safetensors` for the 5B model.
Example placement:
ComfyUI/
├── 📁 models/
│ ├── 📁 audio_encoders/
│ │ └── wav2vec2_large_english_fp16.safetensors
│ ├── 📁 diffusion_models/
│ │ ├── wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors
│ │ ├── wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors
│ │ ├── wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors
│ │ ├── wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors
│ │ ├── wan2.2_ti2v_5B_fp16.safetensors
│ │ └── wan2.2_s2v_14B_fp8_scaled.safetensors
│ ├── 📁 loras/
│ │ ├── wan2.2_t2v_lightx2v_4steps_lora_v1.1_high_noise.safetensors
│ │ ├── wan2.2_t2v_lightx2v_4steps_lora_v1.1_low_noise.safetensors
│ │ ├── wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
│ │ └── wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
│ ├── 📁 text_encoders/
│ │ ├── umt5_xxl_fp8_e4m3fn_scaled.safetensors
│ │ └── umt5_xxl_fp16.safetensors
│ └── 📁 vae/
│ ├── wan_2.1_vae.safetensors
│ └── wan2.2_vae.safetensors
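If you want a quick sanity check that everything landed in the right folders, a small script like the one below can help. The root path and the file selection are assumptions; adjust them to the variants you actually downloaded.

```python
from pathlib import Path

# Assumed ComfyUI root; change this to wherever your installation lives.
COMFYUI_ROOT = Path("ComfyUI")

# Minimal file set for the 5B TI2V workflow; extend with the A14B files you use.
EXPECTED = {
    "diffusion_models": ["wan2.2_ti2v_5B_fp16.safetensors"],
    "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "vae": ["wan2.2_vae.safetensors"],
}

for folder, names in EXPECTED.items():
    for name in names:
        path = COMFYUI_ROOT / "models" / folder / name
        print(("OK     " if path.exists() else "MISSING"), path)
```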
Walkthrough: ComfyUI_examples 5B TI2V Workflow


The “Wan 2.2 Models” page on ComfyUI_examples ships with a baseline Wan 2.2 video workflow. This section builds on it to show how to render 720p clips with the text+image-to-video (TI2V) 5B model. The layout is simple and works for both text-only runs and I2V jobs that start from a still image.
Load the Image to Video sample to understand the Text to Video path as well. Download the sample image and drag it into ComfyUI, or right-click “Workflow in Json format” under the image, save it, and import the JSON file.
You can grab the input image from 🔗here.
Load the Models
- UNETLoader: Load `wan2.2_ti2v_5B_fp16.safetensors` as the diffusion backbone. The TI2V model conditions on both text and images and is tuned for 720p@24fps (480p is not supported).
- ModelSamplingSD3: This Stable Diffusion 3 node is repurposed here to rebuild the sampling schedule, letting you adjust noise levels.
- CLIPLoader: Use `umt5_xxl_fp8_e4m3fn_scaled.safetensors` (type `wan`) for text conditioning. Wan 2.2 relies on a T5/CLIP hybrid, so this compatibility matters.
- VAELoader: Decode with the latest `wan2.2_vae.safetensors` to improve color fidelity and detail.
The ModelSamplingSD3 node (a CONST head plus ModelSamplingDiscreteFlow) remaps the denoising curve via its internal `time_snr_shift`. You can choose samplers such as `euler`, `heun`, `dpmpp_2m`, or `uni_pc`, but the best option depends on your model, resolution, and step count.
✅Lowering `shift` increases the influence of the early noise, making changes more dramatic, while raising it calms the tail end and leans toward static, detailed frames (results vary by environment). When you generate with minimal steps (for example, with the Lightning LoRA), the effect is weaker. Nudge the value toward 9 if you want more detail, or around 6 if you want stronger motion.
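To make the effect of `shift` more concrete, here is a minimal sketch of the sigma remapping used by flow-style model sampling in ComfyUI. The expression mirrors the `shift * s / (1 + (shift - 1) * s)` form of `time_snr_shift`; treat it as an illustration of the behavior rather than a verbatim copy of the node's code.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a normalized sigma (0..1) with a flow-matching style shift.

    Illustration only: this mirrors the shift * s / (1 + (shift - 1) * s)
    remap behind ModelSamplingSD3's shift parameter, not the node's source.
    """
    return shift * sigma / (1 + (shift - 1) * sigma)

# Higher shift values remap the schedule upward, keeping more of it at high noise.
for shift in (3.0, 6.0, 9.0):
    curve = [round(shift_sigma(i / 10, shift), 2) for i in range(11)]
    print(f"shift={shift}: {curve}")
```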
Prompt Setup for Wan 2.2
Wan uses the bilingual UMT5-XXL text encoder, so enter prompts in English or Simplified Chinese.
- Positive Prompt: Describe the scene and action clearly, e.g., `a cute anime girl with fennec ears and a fluffy tail walking in a beautiful field`.
- Negative Prompt: Numerous Chinese phrases are preloaded to suppress artifacts, helping you avoid color clipping, distortion, and missing limbs.
Configure Initial Latents and Duration
Use the `Wan22ImageToVideoLatent` node to set resolution and frame count. The template defaults to `1280×704`, `length=41`, and `batch_size=1`. A Note node in the corner recommends 121 frames, but the template keeps the count shorter for quicker previews. Connect a still image to `start_image` for I2V, or leave it empty for text-only runs.
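For intuition, the sketch below applies the 5B model's 16×16×4 compression (mentioned in the Wan2.2 README section earlier) to these defaults. The exact rounding ComfyUI applies internally may differ slightly; this is only an estimate.

```python
def ti2v_latent_shape(width: int, height: int, length: int) -> tuple[int, int, int]:
    """Estimate the latent grid for the Wan 2.2 5B VAE (16x16x4 compression).

    Spatial axes shrink by 16x and the temporal axis by 4x with the first frame
    kept, which is why frame counts of the form 4n + 1 (41, 121, ...) are used.
    """
    return (length - 1) // 4 + 1, height // 16, width // 16

print(ti2v_latent_shape(1280, 704, 41))   # template default -> (11, 44, 80)
print(ti2v_latent_shape(1280, 704, 121))  # recommended length -> (31, 44, 80)
```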
Sampling Settings
The `KSampler` node drives the generation.
- Steps: 30
- CFG: 5 (raising it too high introduces flicker)
- Sampler: `uni_pc`
- Scheduler: `simple` (the lightweight scheduler from `comfy/samplers.py`, which samples evenly from the `sigmas` array inside `ModelSamplingSD3` so you preserve Wan 2.2’s intended noise curve)
- Seed: With `randomize` enabled you get a new clip each time. Switch “control after generate” to `fixed` to lock it.
- Denoise: 1.0 ⚠️Lowering this in I2V does not keep the source image, so leave it at `1.0`.
Decode and Export
- VAEDecode: Convert latents into frames with the Wan 2.2 VAE.
- SaveAnimatedWEBP: Export the 24fps sequence as an animated WebP at quality 80 for lightweight previews (a rough Pillow equivalent is sketched after this list).
- SaveWEBM: Output a 24fps WebM via the `vp9` codec. The `crf` is set around 16 (a constant-quality setting where lower values mean higher quality, not a bitrate), which makes it suitable as a high-quality master.
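For reference, this is roughly what the animated WebP export amounts to, sketched with Pillow. It is not the node's actual code, and the generated frames here are placeholders standing in for your decoded images.

```python
from PIL import Image

# Placeholder frames; in practice these are the decoded frames from VAEDecode.
frames = [Image.new("RGB", (1280, 704), (4 * i % 256, 64, 160)) for i in range(48)]

# Write a 24 fps animated WebP at quality 80, similar to SaveAnimatedWEBP.
frames[0].save(
    "preview.webp",
    save_all=True,
    append_images=frames[1:],
    duration=int(1000 / 24),  # per-frame duration in milliseconds
    loop=0,                   # 0 = loop forever
    quality=80,
)
```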
The Wan 2.2 5B model alone struggles to deliver consistently high-end footage, but it is lightweight and approachable—even lower-spec PCs can run it—so it works well as an entry point when you test Wan.
Walkthrough: ComfyUI_examples A14B I2V Workflow

The flow is almost identical to the 5B model. The key difference is that the A14B build ships with High Noise and Low Noise checkpoints, so you need to switch samplers mid-run, just like SDXL with a refiner.
Sampling
Use the `KSampler (Advanced)` node twice. The official recommendation switches to the Low Noise model at 50% completion (a small sketch of how the steps are split between the two passes follows the settings below).
Pass 1
- Add Noise: enable
- Seed: `randomize`
- Steps: 20
- CFG: 3.5
- Sampler: `euler`
- Scheduler: `simple`
- Start at step: 0
- End at step: 10 (stop at the specified step)
- Return with leftover noise: enable (keeps the latent with residual noise when you stop mid-process)
Pass 2
- Add Noise: disable (reuse the noise from pass 1)
- Seed: `fixed`
- Steps: 20
- CFG: 3.5
- Sampler: `euler`
- Scheduler: `simple`
- Start at step: 10 (resume from the chosen step)
- End at step: 10000
- Return with leftover noise: disable
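Here is the step-split illustration mentioned above. The sigma values are placeholders, not Wan 2.2's real schedule; the point is that both passes share one 20-step schedule and only the covered segment changes.

```python
# Both KSampler (Advanced) passes share one 20-step schedule; only the
# covered segment differs. The sigma values below are placeholders.
total_steps, switch_at = 20, 10
sigmas = [round(1.0 - i / total_steps, 2) for i in range(total_steps + 1)]

high_noise_part = sigmas[: switch_at + 1]  # pass 1: steps 0-10, leftover noise kept
low_noise_part = sigmas[switch_at:]        # pass 2: resumes at step 10, no added noise

print("high noise expert:", high_noise_part)
print("low noise expert: ", low_noise_part)
```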
This A14B example combines the `euler` sampler with the `simple` scheduler. The `simple` scheduler samples evenly from the `sigmas` array inside `ModelSamplingSD3`, matching the training schedule released by the Wan team. You can pick other schedulers (`normal`, `karras`, `exponential`, `sgm_uniform`, `beta`, `linear_quadratic`, `kl_optimal`, and so on), but they alter the time-SNR curve and often break motion, so avoid them outside of tests. Switching the sampler to `dpmpp_2m` or `dpmpp_2m_sde` changes smoothness slightly, yet you should still pair them with the `simple` scheduler for safety.
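To show what "samples evenly from the `sigmas` array" means in practice, here is a simplified scheduler in the same spirit. It paraphrases the idea behind ComfyUI's `simple` scheduler rather than reproducing the exact code in `comfy/samplers.py`.

```python
def simple_like_scheduler(sigmas: list[float], steps: int) -> list[float]:
    """Pick `steps` sigmas at even intervals from the model's full sigma table.

    Paraphrase of the idea behind the `simple` scheduler: because the values
    come straight from the model's own table, the trained noise curve of
    Wan 2.2 (after the ModelSamplingSD3 shift) is preserved.
    """
    stride = len(sigmas) / steps
    picked = [sigmas[min(int(i * stride), len(sigmas) - 1)] for i in range(steps)]
    return picked + [0.0]  # finish at zero noise

# Toy descending table standing in for the model's real sigmas.
table = [1.0 - i / 1000 for i in range(1000)]
print([round(s, 3) for s in simple_like_scheduler(table, 10)])
```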
Wan 2.2 Render Times Explained
Even in FP8, the Wan 2.2 A14B model takes longer than the 5B build. The sample Wan 2.2 A14B I2V setup ran for about 45 minutes on an RTX 3090, but the MoE architecture delivers clearly better quality. Switching to the Wan 2.2 5B model with the same settings cut the render to roughly three minutes, but the results were unusable: a bouquet morphed into a gun mid-scene.
For local A14B use, rely on GGUF variants and the Lightning 4-step LoRA. Installing Sage Attention takes more steps on Windows than on Linux, but if your GPU supports it, it can shave significant time off inference.
If you need smoother A14B production, consider cloud GPU services such as RunPod.
Wan 2.2 Prompt Design Best Practices
Consistent frames are critical in video generation, so separate the “scene skeleton” from “cinematic keywords” in your prompts. Here’s a recommended T2V template:
{main_subject}, {outfit_detail}, shot on anamorphic lens, cinematic lighting, soft rim light, depth of field, trending on artstation
Negative prompt: motion blur, duplicated limbs, distorted face, overexposed, low detail background
Place the key elements (subject, outfit, etc.) at the top of the prompt and cluster cinematic keywords at the end to reduce drift. Prioritize riskier terms such as “motion blur” or “duplicated limbs” in the negative prompt.
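As a trivial convenience, the template can be filled in programmatically so the cinematic keyword block stays fixed while you iterate on the subject. The field values below are just example strings.

```python
T2V_TEMPLATE = (
    "{main_subject}, {outfit_detail}, shot on anamorphic lens, cinematic lighting, "
    "soft rim light, depth of field, trending on artstation"
)
NEGATIVE_PROMPT = "motion blur, duplicated limbs, distorted face, overexposed, low detail background"

positive = T2V_TEMPLATE.format(
    main_subject="a cute anime girl with fennec ears and a fluffy tail",
    outfit_detail="white summer dress",  # example value
)
print(positive)
print("Negative:", NEGATIVE_PROMPT)
```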
For I2V, your reference image already defines subject, scene, and style, so focus the prompt on motion and camera direction.
The official guide “🔗Easy Creation with One Click – AI Videos” is also worth reading.
Customize the Official Wan 2.2 Workflows
Next we extend the official ComfyUI Wan 2.2 workflows to push quality higher. We provide separate customizations for the 5B and A14B models. The enhancements cover:
- Lightweight inference: Avoid situations where FP16 or FP8 is too heavy to run or takes excessive time.
- Loop-ready videos: Build seamless infinite loops.
- AI-driven upscaling: Wan 2.2 cannot run SDXL or Flux.1 as a second pass, so we upscale with dedicated models.
- Frame interpolation: Smooth out 16fps (A14B) and 24fps (5B) output with interpolation (a naive illustration of the idea follows this list).
- Orientation toggle: Switch between portrait and landscape presets in one click.
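To illustrate what the frame-interpolation step does conceptually, here is a naive frame-blending doubler. Dedicated interpolation models produce far better motion; this is only to show the idea.

```python
import numpy as np

def double_fps_blend(frames: list[np.ndarray]) -> list[np.ndarray]:
    """Naive interpolation: insert the average of each pair of neighboring frames.

    Only an illustration of lifting 16/24 fps footage to a higher frame rate;
    learned interpolators give much better results on real motion.
    """
    out: list[np.ndarray] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append(((a.astype(np.float32) + b.astype(np.float32)) / 2).astype(np.uint8))
    out.append(frames[-1])
    return out

# Toy example: three dummy frames become five.
dummy = [np.full((8, 8, 3), v, dtype=np.uint8) for v in (0, 128, 255)]
print(len(double_fps_blend(dummy)))  # -> 5
```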
The workflow is hosted on Patreon for paid supporters.

Here are sample clips generated with the custom workflow.
One clip uses the Wan2.2-I2V-5B model and the other the Wan2.2-I2V-A14B model. Both rely on ComfyUI defaults (the optional `--use-sage-attention` and `--fast` flags are disabled) and finish in about ten minutes, not counting upscaling.
Wan2.2-I2V-5B Model Sample
Wan2.2-I2V-A14B Model Sample
Pro Workflow Tips
Below are two pro techniques you can use during AI video production.
- Export frames as still images (a small sketch for this appears at the end of this section)
- Generate videos like a storyboard
Use these tips to polish your footage into high-quality results like the example below.
Color grading and logo placement were added in post-processing.
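For the first tip (exporting frames as still images), a minimal sketch with OpenCV follows. It assumes an FFmpeg-enabled OpenCV build that can open your exported WebM, and the file name is only an example.

```python
from pathlib import Path

import cv2  # pip install opencv-python (needs FFmpeg support for WebM input)

Path("frames").mkdir(exist_ok=True)
capture = cv2.VideoCapture("wan22_output.webm")  # example path to your export

index = 0
while True:
    ok, frame = capture.read()
    if not ok:
        break  # end of clip (or the file could not be decoded)
    cv2.imwrite(f"frames/frame_{index:04d}.png", frame)
    index += 1

capture.release()
print(f"Exported {index} frames")
```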
Push Speed Further
Beyond Sage Attention, which we referenced earlier, kijai’s “ComfyUI-WanVideoWrapper” adds finer control over Wan 2.2. This guide only introduces it briefly; we will cover the full setup in a future article.

Conclusion
Wan 2.2 centers on the A14B model and enables high-quality video production locally. In ComfyUI, proper model placement, the required custom nodes, High/Low switching, and GGUF/Lightning LoRA support let you balance stability and speed. Combine thoughtful prompt design, loop workflows, upscaling, interpolation, and external tooling to build a production-ready pipeline.
Thank you for reading to the end.
If you found this even a little helpful, please support by giving it a “Like”!