
How to Speed Up WAN2.2 I2V in ComfyUI with SageAttention 2.2 and SpargeAttention

⏱️21min read
📅 Dec 03, 2025

In the previous article, I explained the basic usage of WAN2.2. In this guide, we’ll walk through how to speed up WAN2.2 video generation in ComfyUI on Windows by installing “SageAttention 2.2” and “SpargeAttention”. SageAttention is a library that accelerates attention processing and reduces memory usage, helping to both cut costs and improve performance for generative AI services. Installing SageAttention 2.2/SpargeAttention can feel a bit tricky on Windows, but with the repositories published by woct0rdho you can set everything up in relatively few steps. Follow this tutorial to efficiently accelerate generation with the Wan2.2-I2V-A14B model and Flux.1. (⚠️Because Wan2.2-I2V-5B produces noise, you cannot use it with SageAttention/SpargeAttention enabled.)

⚠️Important notes before installation

When adding SageAttention/SpargeAttention to ComfyUI, installing them directly into your current environment can change the Torch version and may break your existing custom nodes and other components. To avoid this, download a fresh portable build and install into that instead.

In this article, we’ll focus only on installing into the portable edition. Even if you’re using a Git clone or the desktop build, I highly recommend installing a separate portable ComfyUI that you dedicate to SageAttention/SpargeAttention.

While this setup has not been fully tested, in this guide we’ll use the latest portable build at the time of writing, v0.3.75. Since it ships with CUDA 13.0 and Torch 2.9.1, it also supports RTX 50xx series GPUs. Note that this tutorial does not cover installing “SageAttention3”.

Pros and cons of using SageAttention/SpargeAttention

In short, using SageAttention/SpargeAttention is a trade‑off between “faster generation speed” and “maximum stability”.

Advantages

The biggest benefit of SageAttention/SpargeAttention is a large speed‑up in generation. In addition to WAN video generation, Flux.1 and Flux.2 generations also become significantly faster.

Sample generation speeds

In my environment (RTX 3090), using the WAN2.2 14B I2V workflow available on Patreon, I tested a 5‑second 1280×704 video generated in 4 steps with a Lightning 4Step LoRA. The generation times (without any upscaling or frame interpolation) were as follows:

  • Normal: about 8 minutes
  • SageAttention 2.2: about 5 minutes
  • SpargeAttention: about 4 minutes

You can see that the generation time is reduced to roughly half.

Disadvantages

On the downside, this is not an officially recommended configuration. Because the Torch version changes, ComfyUI itself and custom nodes can become unstable or throw unexpected errors.

Besides stability, updating ComfyUI also becomes more tedious. SageAttention/SpargeAttention and Triton are very sensitive to CUDA and Torch versions, so you must always check the CUDA and Torch versions before updating ComfyUI. If CUDA or Torch inside ComfyUI changes, you’ll need to reinstall the matching versions of SageAttention/SpargeAttention and Triton again.
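Before an update, a quick way to record the current combination is to ask the embedded Python directly. Here is a minimal sketch (the script name check_versions.py is my own; the attributes are standard PyTorch):

# check_versions.py - print the Torch/CUDA combination this portable build uses
import torch

print("torch:", torch.__version__)           # e.g. 2.9.1+cu130
print("CUDA (build):", torch.version.cuda)   # e.g. 13.0
print("GPU available:", torch.cuda.is_available())

Run it with python_embeded\python.exe check_versions.py from the portable folder before and after an update; if either version changed, plan on reinstalling the matching Triton/SageAttention wheels.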

Installing the ComfyUI portable build

Download the ComfyUI portable build

First, download the ComfyUI portable build. In this guide we’ll use the latest version at the time of writing, v0.3.75. ⚠️Other versions may fail to install correctly, so please be careful.

Download ComfyUI_windows_portable_nvidia.7z from the link below.

Location of ComfyUI_windows_portable_nvidia.7z 

After downloading, extract the archive, rename the extracted folder to something like ComfyUI_windows_portable_SA, and place it in the same directory as your existing ComfyUI installation. In this example it is placed at C:\Users\user-name\ComfyUI_windows_portable_SA.

Check the Python version and packages (optional)

Once the folder is in place, let’s check the Python version and installed packages.

Open File Explorer and navigate to C:\Users\user-name\ComfyUI_windows_portable_SA. Right‑click and choose “Open in Terminal” to open Windows PowerShell in the C:\Users\user-name\ComfyUI_windows_portable_SA directory. ✅All commands in this article assume you run them from this directory.

First, paste the command below to check the Python version.

python_embeded\python --version

The command will print the Python version. If you see Python 3.13.9, you’re good to go.

Python version 

Next, check the installed packages by running the following command:

python_embeded\python.exe -s -m pip list

Look at the versions of torch, torchaudio, and torchvision. If they match the values below, everything is set up correctly.

Package        Version
torch          2.9.1+cu130
torchaudio     2.9.1+cu130
torchvision    0.24.1+cu130
Python package list 

Installing Triton

Triton is a language and compiler for writing highly optimized GPU kernels, used to accelerate AI and machine learning workloads. SageAttention builds on Triton‑optimized GPU kernels, so installing Triton is a prerequisite for getting the best performance.

Install the Visual C++ Redistributable

First, install the Visual C++ Redistributable for Visual Studio 2015–2022. It is required to compile Triton.

Download and run the EXE file from the link below.

Triton-windows

On Windows, you need “Triton-windows” to use SageAttention 2.2/SpargeAttention. We’ll install the forked repository for Windows published by woct0rdho.

As described in the repository, the supported Triton-windows version depends on your Torch version. Use the table below and make sure you pick the correct combination.

Torch    Triton-windows
2.7      3.3
2.8      3.4
2.9      3.5

In this ComfyUI setup, Torch is version 2.9, so we’ll use Triton-windows 3.5.
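If you’d like to automate that lookup, a small sketch like the one below derives the pip pin from the installed Torch version (the script simply hard-codes the table above):

# suggest_triton_pin.py - derive the triton-windows pip pin from the Torch version
import torch

# mapping from the compatibility table above
TORCH_TO_TRITON = {"2.7": "3.3", "2.8": "3.4", "2.9": "3.5"}

torch_mm = ".".join(torch.__version__.split(".")[:2])  # e.g. "2.9"
triton_mm = TORCH_TO_TRITON.get(torch_mm)
if triton_mm is None:
    raise SystemExit(f"No known triton-windows pairing for torch {torch.__version__}")
upper = f"3.{int(triton_mm.split('.')[1]) + 1}"        # exclusive upper bound for pip
print(f'python_embeded\\python.exe -m pip install -U "triton-windows<{upper}"')

For Torch 2.9 this prints the exact install command used in the step below.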

Clean up any existing Triton (optional)

If you downloaded a fresh portable build for this guide, you can skip this step. However, if a previous installation failed or you’re adding Triton to an existing ComfyUI, uninstall any old Triton packages first.

Uninstall Triton with the command below:

python_embeded\python.exe -s -m pip uninstall triton

If you also want to remove Triton-windows itself after a failed install, run the following command:

python_embeded\python.exe -s -m pip uninstall triton-windows

Install triton-windows 3.5.1

Install triton-windows 3.5.1 with this command:

python_embeded\python.exe -m pip install -U "triton-windows<3.6"
Installing triton-windows 3.5.1 

Install required Python files (portable build only)

Finally, copy in the Python 3.13 library files that the portable build does not include. Normally you would install Python locally once and copy its include and libs folders, but since woct0rdho, the author of triton-windows, provides all the necessary files in a single archive, we’ll download that and place the include and libs folders into ComfyUI_windows_portable_SA\python_embeded. ⚠️These are separate from the existing Lib folder, so do not put libs inside Lib.

Use the direct link below to download “python_3.13.2_include_libs.zip”, then extract it and place the contents into C:\Users\user-name\ComfyUI_windows_portable_SA\python_embeded.

Where to place the folders
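If you prefer to script this step, here is a minimal Python sketch (it assumes the archive was downloaded into the portable root and contains include and libs at its top level; adjust the paths for your machine):

# place_include_libs.py - unpack include/ and libs/ next to the embedded Python
import zipfile
from pathlib import Path

root = Path(r"C:\Users\user-name\ComfyUI_windows_portable_SA")

# unpack include/ and libs/ into python_embeded
with zipfile.ZipFile(root / "python_3.13.2_include_libs.zip") as zf:
    zf.extractall(root / "python_embeded")

# sanity check: include/ and libs/ must sit beside Lib/, not inside it
for name in ("include", "libs"):
    assert (root / "python_embeded" / name).is_dir(), f"missing {name}"
print("include/ and libs/ are in place")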

Test triton-windows (optional)

After installation, let’s run a quick test to make sure it works correctly.

Create a file named test_triton.py inside C:\Users\user-name\ComfyUI_windows_portable_SA\ComfyUI and paste in the following code:

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    tl.store(output_ptr + offsets, output, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor):
    output = torch.empty_like(x)
    n_elements = output.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
    return output

a = torch.rand(3, device="cuda")
b = a + a
b_compiled = add(a, a)
print(b_compiled - b)
print("If you see tensor([0., 0., 0.], device='cuda:0'), then it works")

Once the file is ready, run the command below:

python_embeded\python 'ComfyUI\test_triton.py'

If you see tensor([0., 0., 0.], device='cuda:0'), Triton is installed correctly. ⚠️You can safely ignore any “pynvml package” warnings.

Result of test_triton.py 

Installing SageAttention

Next, we’ll install “SageAttention 2.2” using the repository provided by woct0rdho.

Download SageAttention 2.2

In this example we’ll install “v2.2.0-windows.post3”. There are multiple wheel files, and you must choose the one that matches your environment. Since we’re using CUDA 13.0 / Torch 2.9.1 / Python 3.13.9, download sageattention-2.2.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl and place it inside C:\Users\user-name\ComfyUI_windows_portable_SA\ComfyUI. (Here, cp39-abi3 means the wheel works on “Python 3.9 or later in the 3.x series”.)

You can also download it with the command below:

wget https://github.com/woct0rdho/SageAttention/releases/download/v2.2.0-windows.post3/sageattention-2.2.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl -OutFile ComfyUI\sageattention-2.2.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl
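If the wget alias misbehaves in your shell, the embedded Python can fetch the wheel instead; a minimal standard-library sketch (the script name download_wheel.py is my own):

# download_wheel.py - fetch the SageAttention wheel with the embedded Python
from urllib.request import urlretrieve

URL = ("https://github.com/woct0rdho/SageAttention/releases/download/"
       "v2.2.0-windows.post3/"
       "sageattention-2.2.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl")
urlretrieve(URL, r"ComfyUI\sageattention-2.2.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl")
print("downloaded")

Run it as python_embeded\python.exe download_wheel.py from the portable root.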

Install the SageAttention package

Once the wheel file is downloaded and in place, install the package with the command below:

python_embeded\python.exe -s -m pip install 'ComfyUI\sageattention-2.2.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl'
Installing v2.2.0-windows.post3 

Test SageAttention (optional)

After installation, let’s verify that SageAttention is working correctly.

Create a file named test_sageattn.py in C:\Users\user-name\ComfyUI_windows_portable_SA\ComfyUI and paste in the code below:

#!/usr/bin/env python3
import torch
import torch.nn.functional as F
from sageattention import sageattn
from torch.nn.attention import SDPBackend, sdpa_kernel

def get_rtol_atol(actual, expect):
    actual = actual.float()
    expect = expect.float()
    diff = (actual - expect).abs()
    eps = torch.tensor(
        torch.finfo(actual.dtype).eps, device=actual.device, dtype=actual.dtype
    )
    rdiff = diff / torch.maximum(torch.maximum(actual.abs(), expect.abs()), eps)
    return (
        f"mean_rtol={rdiff.mean().item():.3g} "
        f"max_rtol={rdiff.max().item():.3g} "
        f"mean_atol={diff.mean().item():.3g} "
        f"max_atol={diff.max().item():.3g}"
    )

def main():
    batch_size = 4
    head_num = 32
    seq_len = 64
    head_dim = 128
    dtype = torch.float16
    q = torch.randn(batch_size, head_num, seq_len, head_dim, device="cuda", dtype=dtype)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    print("q", tuple(q.shape), q.device, q.dtype)

    # 'Mathematically correct' reference implementation
    torch.backends.cuda.enable_math_sdp(True)
    with sdpa_kernel(SDPBackend.MATH):
        out_math = F.scaled_dot_product_attention(q, k, v)

    out_sage = sageattn(q, k, v)
    print("sage vs math:", get_rtol_atol(out_sage, out_math))
    print("The above (except max_rtol) should be < 0.05 (on RTX 20xx/30xx) or < 0.1 (on RTX 40xx/50xx)")

if __name__ == "__main__":
    main()

Once the file is ready, run the command below:

python_embeded\python 'ComfyUI\test_sageattn.py'

After running, you should see a line like sage vs math: mean_rtol=0.0372 max_rtol=2 mean_atol=0.0264 max_atol=0.0264 (your exact numbers will vary). Check all values except “max_rtol”.

If you have an RTX 20xx/30xx GPU, values below 0.05 are fine. For an RTX 40xx/50xx, values below 0.1 mean everything is working correctly.

Result of test_sageattn.py 
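If you’d rather not eyeball the numbers, a small sketch like this parses the printed line and applies the threshold for you (the sample line is hard-coded for illustration; paste in your own output):

# check_tolerance.py - turn the printed metrics into a pass/fail verdict
import re

line = "sage vs math: mean_rtol=0.0372 max_rtol=2 mean_atol=0.0264 max_atol=0.0264"
threshold = 0.05  # use 0.05 for RTX 20xx/30xx, 0.1 for RTX 40xx/50xx

metrics = dict(re.findall(r"(\w+)=([\d.eE+-]+)", line))
checked = {k: float(v) for k, v in metrics.items() if k != "max_rtol"}  # max_rtol is ignored
print("PASS" if all(v < threshold for v in checked.values()) else "FAIL", checked)

The same check works for the SpargeAttention test below.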

Installing SpargeAttention

If you want even more speed, install “SpargeAttention” as well. The installation steps are almost identical to “SageAttention”. ✅SpargeAttention speeds up video generation but does not accelerate still images from models like Flux.1 or Qwen-Image.

Download SpargeAttention

In this example we’ll install “v0.1.0-windows.post3”. As with SageAttention, several wheel files are provided, and you must pick the one that matches your environment. Since we’re using CUDA 13.0 / Torch 2.9.1 / Python 3.13.9, download spas_sage_attn-0.1.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl and place it in C:\Users\user-name\ComfyUI_windows_portable_SA\ComfyUI.

You can also download it with this command:

wget https://github.com/woct0rdho/SpargeAttn/releases/download/v0.1.0-windows.post3/spas_sage_attn-0.1.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl -OutFile ComfyUI\spas_sage_attn-0.1.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl

Install the SpargeAttention package

After downloading and placing the wheel file, install the package with the command below:

python_embeded\python.exe -s -m pip install 'ComfyUI\spas_sage_attn-0.1.0+cu130torch2.9.0.post3-cp39-abi3-win_amd64.whl'
Installing v0.1.0-windows.post3 

Test SpargeAttention (optional)

After installation, run a quick test to confirm that SpargeAttention is working correctly.

Create a file named test_spargeattn.py inside C:\Users\user-name\ComfyUI_windows_portable_SA\ComfyUI and paste in the following code:

#!/usr/bin/env python3
import torch
import torch.nn.functional as F
from spas_sage_attn import spas_sage2_attn_meansim_cuda
from torch.nn.attention import SDPBackend, sdpa_kernel

def get_rtol_atol(actual, expect):
    actual = actual.float()
    expect = expect.float()
    diff = (actual - expect).abs()
    eps = torch.tensor(
        torch.finfo(actual.dtype).eps, device=actual.device, dtype=actual.dtype
    )
    rdiff = diff / torch.maximum(torch.maximum(actual.abs(), expect.abs()), eps)
    return (
        f"mean_rtol={rdiff.mean().item():.3g} "
        f"max_rtol={rdiff.max().item():.3g} "
        f"mean_atol={diff.mean().item():.3g} "
        f"max_atol={diff.max().item():.3g}"
    )

def main():
    batch_size = 4
    head_num = 32
    seq_len = 128
    head_dim = 128
    dtype = torch.float16
    q = torch.randn(batch_size, head_num, seq_len, head_dim, device="cuda", dtype=dtype)
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    print("q", tuple(q.shape), q.device, q.dtype)

    # 'Mathematically correct' reference implementation
    torch.backends.cuda.enable_math_sdp(True)
    with sdpa_kernel(SDPBackend.MATH):
        out_math = F.scaled_dot_product_attention(q, k, v)

    out_sparge = spas_sage2_attn_meansim_cuda(q, k, v)
    print("sparge vs math:", get_rtol_atol(out_sparge, out_math))
    print("The above (except max_rtol) should be < 0.05 (on RTX 20xx/30xx) or < 0.1 (on RTX 40xx/50xx)")

if __name__ == "__main__":
    main()

When the file is ready, run this command:

python_embeded\python 'ComfyUI\test_spargeattn.py'

After running, you’ll see a line like sparge vs math: mean_rtol=0.0392 max_rtol=2 mean_atol=0.0249 max_atol=0.0249 (your exact numbers will vary). Again, focus on the values other than “max_rtol”.

If you have an RTX 20xx/30xx GPU, values below 0.05 are fine. For an RTX 40xx/50xx, values below 0.1 indicate that everything is working correctly.

Result of test_spargeattn.py 

With this, all installations are complete. Next, we’ll share models and custom nodes from your previous ComfyUI environment so you can use the same workflows in the new portable build.

How to share models and custom nodes in ComfyUI

To avoid breaking your existing ComfyUI, we installed a new portable build in a different directory. As it is, this new ComfyUI has no models, so it can’t generate anything. To fix that, we’ll edit “extra_model_paths.yaml.example” and share models from your previous ComfyUI. ✅Because many custom nodes need to be updated from older versions, I recommend copying the custom_nodes folder from your old ComfyUI into the new ComfyUI’s custom_nodes and then updating them via ComfyUI-Manager, rather than sharing them directly.

How to edit extra_model_paths.yaml.example

In most cases, you just rename the “extra_model_paths.yaml.example” file in ComfyUI_windows_portable_SA\ComfyUI to extra_model_paths.yaml, then uncomment everything under #comfyui: (remove the # at the start of each line) and set base_path: to the path of your previous ComfyUI.

⚠️For some reason (possibly a bug or because the folder is in the same directory), full absolute paths did not work and files were not detected, so I used relative paths instead.

In this example, I already had a previous ComfyUI portable build in the same directory and only wanted to share models, so I edited the file like this:

comfyui:
    base_path: ../../ComfyUI_windows_portable/ComfyUI/
    # You can use is_default to mark that these folders should be listed first, and used as the default dirs for eg downloads
    #is_default: true
    checkpoints: models/checkpoints/
    text_encoders: |
        models/text_encoders/
        models/clip/  # legacy location still supported
    clip_vision: models/clip_vision/
    configs: models/configs/
    controlnet: models/controlnet/
    diffusion_models: |
        models/diffusion_models
        models/unet
    embeddings: models/embeddings/
    loras: models/loras/
    upscale_models: models/upscale_models/
    vae: models/vae/
    audio_encoders: models/audio_encoders/
    model_patches: models/model_patches/

If you also want to share custom nodes, add the following:

other_ui:
    base_path: ../../ComfyUI_windows_portable/ComfyUI
    checkpoints: models/checkpoints
    gligen: models/gligen
    custom_nodes: ../../ComfyUI_windows_portable/ComfyUI/custom_nodes

If you ever want to stop sharing, simply rename the file back to “extra_model_paths.yaml.example”.
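Before launching, you can also sanity-check that every shared path resolves to a real folder. A minimal sketch (PyYAML ships with ComfyUI; the parsing here is simplified and assumes a layout like the examples above):

# check_model_paths.py - verify that extra_model_paths.yaml entries point at real folders
import yaml
from pathlib import Path

config = Path(r"C:\Users\user-name\ComfyUI_windows_portable_SA\ComfyUI\extra_model_paths.yaml")
base_dir = config.parent  # relative base_path entries resolve against the ComfyUI folder

for section, entries in yaml.safe_load(config.read_text()).items():
    base = base_dir / entries.get("base_path", "")
    for key, value in entries.items():
        if key in ("base_path", "is_default"):
            continue
        for raw in str(value).splitlines():   # block entries hold several paths
            sub = raw.split("#")[0].strip()   # drop inline comments
            if sub:
                path = (base / sub).resolve()
                print(f"[{'ok' if path.is_dir() else 'MISSING'}] {section}.{key}: {path}")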

How to use SageAttention 2.2 and SpargeAttention in ComfyUI

Now let’s actually enable SageAttention 2.2 and SpargeAttention in ComfyUI. First, we’ll configure the launch settings.

Normally, you start ComfyUI using run_nvidia_gpu.bat inside C:\Users\user-name\ComfyUI_windows_portable_SA, but SageAttention will not be enabled with the default command. You need to edit the ComfyUI launch command. Either edit run_nvidia_gpu.bat directly or copy it to a new file named run_nvidia_gpu_sa.bat and change its contents as follows:

.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention
echo If you see this and ComfyUI did not start try updating your Nvidia Drivers to the latest.
pause

Here we’ve added --use-sage-attention to the launch command.

Run the edited batch file to start ComfyUI.

If you see “Using sage attention” in the startup terminal log, SageAttention has been loaded correctly. ✅If it displays “Using pytorch attention” here, SageAttention failed to load.

ComfyUI startup terminal log 

⚠️If you apply SageAttention/SpargeAttention to the Wan2.2-I2V-5B model, it produces noisy outputs. When generating with Wan2.2-I2V-5B, start ComfyUI without the --use-sage-attention option.

How to use SageAttention 2.2 with WAN2.2 A14B I2V

To add SageAttention 2.2 to your workflow, use the Patch Sage Attention KJ node included in the “ComfyUI-KJNodes” custom node pack. Place this node before both the High and Low branches of the “ModelSamplingSD3” node. Set the sage_attention parameter to auto.

With this setup, WAN2.2 A14B I2V inference will use “SageAttention”.

Where to connect Patch Sage Attention KJ

How to use SpargeAttention with WAN2.2 A14B I2V

To add SpargeAttention to your workflow, use the PatchRadialAttn node from the “ComfyUI-RadialAttn” custom node pack. Place this node before both the High and Low branches of the “ModelSamplingSD3” node. Change the default parameters so that last_dense_timestep on the High side is set to 0, and dense_timestep on the Low side is set to 0.

With this configuration, WAN2.2 A14B I2V inference will use “SpargeAttention”.

Where to connect PatchRadialAttn

Generation with SpargeAttention offers not only faster performance but also improved image quality.

SageAttention 2.2 generation result

SpargeAttention generation result

The workflow is basically the same as the one introduced in the previous article, but I’ve published a version that uses SageAttention 2.2 and SpargeAttention on Patreon. It is available only to paid supporters, but if you’re unsure about how to wire everything up, feel free to use it as a reference.

Conclusion

By combining SageAttention 2.2 and SpargeAttention, you can greatly speed up WAN2.2 video generation on ComfyUI for Windows. However, because this setup depends heavily on specific versions of Torch, CUDA, and triton-windows, it’s important to prepare a dedicated portable ComfyUI and carefully manage model sharing via extra_model_paths.yaml and custom node updates.

If you follow the steps in this article for environment setup and validation, you can build a fast yet stable video generation workflow centered around the WAN2.2 A14B I2V model. As a caution, Wan2.2-I2V-5B produces noise when used with SageAttention/SpargeAttention, so start ComfyUI without --use-sage-attention when using that model. Make good use of SageAttention 2.2 and SpargeAttention to find the best balance of speed and quality for your own environment.

Thank you for reading to the end.

If you found this even a little helpful, please support by giving it a “Like”!