DCAI

How to Speed Up Qwen-Image-Edit 2511 Using Nunchaku in Forge-Neo

⏱️16min read
📅 May 30, 2026

What You’ll Learn in This Article

  • How Nunchaku and SVDQuant work, and how they differ from GGUF.
  • The basic steps for using Qwen-Image-Edit-2511 in Forge-Neo.
  • How to install Nunchaku models in Forge-Neo.
  • Generation speed and quality comparison results between Forge-Neo and ComfyUI.
  • Quality improvement techniques and how to specify images in prompts. (💎 Members only)

In this article, I’ll explain how to use Qwen-Image-Edit in WebUI Forge-Neo. I’ll also show you how to generate images faster using Nunchaku, an inference engine built on the new quantization method SVDQuant. If you’re comfortable with WebUI but find ComfyUI too complex, I recommend Qwen-Image-Edit in Forge-Neo. The results aren’t quite as high quality as ComfyUI’s, but they are more than usable, so why not give it a try? I’ve written previous articles on how to use Forge-Neo and Qwen-Image-Edit, so please refer to those.

What Is Nunchaku (SVDQuant)?

Nunchaku is a high-performance inference engine for 4-bit quantized neural networks that can run large-scale generative AI models at high speed and with low memory usage even on consumer GPUs such as NVIDIA RTX series. The core technology is a quantization method called SVDQuant, which compresses both weights and activations to 4 bits (W4A4) while retaining only the components that are difficult to quantize as 16-bit low-rank matrices via SVD (singular value decomposition), significantly reducing model size while maintaining image quality.
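To make the "low-rank branch plus 4-bit residual" idea more concrete, here is a minimal Python sketch. It is a toy illustration under simplifying assumptions, not Nunchaku's actual implementation: it handles weights only, keeps a rank-1 branch found by power iteration instead of a full SVD, and the matrix values and function names are made up for this example.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_singular_triplet(w, iters=50):
    """Power iteration on W^T W; returns (sigma, u, v) for the largest singular value."""
    wt = [list(col) for col in zip(*w)]
    v = [1.0] * len(w[0])
    for _ in range(iters):
        v = matvec(wt, matvec(w, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    wv = matvec(w, v)
    sigma = math.sqrt(sum(x * x for x in wv))
    return sigma, [x / sigma for x in wv], v

def fake_quant_int4(m):
    """Symmetric per-tensor 4-bit quantization, returned in dequantized form."""
    scale = max(abs(x) for row in m for x in row) / 7 or 1.0
    return [[max(-8, min(7, round(x / scale))) * scale for x in row] for row in m]

def max_abs_error(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Toy weight matrix: the first row holds outliers, the rest are small values.
W = [
    [8.0, -6.0, 7.0, 5.0],
    [0.3, -0.2, 0.1, 0.4],
    [-0.1, 0.25, -0.35, 0.15],
    [0.2, 0.1, -0.3, -0.25],
]

# (a) Quantize the whole matrix to 4 bits directly.
err_direct = max_abs_error(W, fake_quant_int4(W))

# (b) Keep a rank-1 component in full precision and quantize only the residual.
sigma, u, v = top_singular_triplet(W)
low_rank = [[sigma * ui * vj for vj in v] for ui in u]
residual = [[x - l for x, l in zip(rw, rl)] for rw, rl in zip(W, low_rank)]
q_res = fake_quant_int4(residual)
approx = [[l + q for l, q in zip(rl, rq)] for rl, rq in zip(low_rank, q_res)]
err_low_rank = max_abs_error(W, approx)

print(f"direct 4-bit max error:    {err_direct:.3f}")
print(f"low-rank + 4-bit residual: {err_low_rank:.3f}")
```

Because the outlier row is absorbed by the full-precision branch, the residual has a much smaller range, so its 4-bit grid is finer and the reconstruction error drops.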

Supported models include FLUX.1 series, Qwen-Image/Qwen-Image-Edit, SANA, and many others. In terms of performance, for FLUX.1-dev (12B parameters), it achieves approximately 3.6x memory reduction compared to standard BF16 and approximately 3x speedup compared to NF4 (W4A16). On an RTX 4090 laptop, a 10x speedup over 16-bit has been reported by eliminating CPU offload, and with CPU offload enabled, it can run with as little as 4GB VRAM. It also supports LoRA, ControlNet, and image editing (such as Qwen-Image-Edit). LoRA can use existing standard models (BF16/FP16 format) as-is without re-quantization. ControlNet can also use standard models as-is. Note that base models (such as FLUX.1-dev) require Nunchaku-specific quantized models. ComfyUI integration is also available.
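As a rough sanity check on the memory figures, the arithmetic can be sketched in Python. This counts weights only and uses decimal gigabytes. The reading that the gap between "pure 4-bit" and the reported reduction comes from the 16-bit low-rank branch and other parts kept at higher precision is my assumption, not a number from the Nunchaku project.

```python
PARAMS = 12e9  # FLUX.1-dev parameter count quoted above

def weights_gb(params, bits_per_weight):
    """Size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

bf16_gb = weights_gb(PARAMS, 16)      # all weights stored as 16-bit values
pure_4bit_gb = weights_gb(PARAMS, 4)  # hypothetical "everything at 4 bits" lower bound
reported_gb = bf16_gb / 3.6           # size implied by the ~3.6x reduction

print(f"BF16 weights:         {bf16_gb:.1f} GB")
print(f"Pure 4-bit weights:   {pure_4bit_gb:.1f} GB")
print(f"Implied by 3.6x less: {reported_gb:.1f} GB")
```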

Differences from GGUF

GGUF is another low-bit quantization format, but it differs significantly in how it works and how fast it runs. GGUF originates from llama.cpp and compresses only the weights to 4–8 bits, restoring them to 16 bits (dequantizing) before computation (W4A16 in the 4-bit case). While memory usage is reduced, the improvement in inference speed is limited because the restoration step becomes a bottleneck.

Nunchaku’s SVDQuant, on the other hand, performs computations with both weights and activations kept at 4 bits (W4A4), reducing the actual matrix computation load and significantly improving inference speed. However, activation quantization is susceptible to outliers, which can degrade image quality. SVDQuant solves this problem by isolating only the components containing outliers via SVD and processing them as 16-bit low-rank matrices, which preserves image quality. As a result, while GGUF achieves “speed improvement through reduced memory and memory bandwidth,” Nunchaku’s SVDQuant delivers “reduced memory and dramatic speedup at the computation level.”
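The difference between the two compute paths can be sketched with a single dot product in Python. This is a simplified illustration with made-up values and helper names. Real kernels pack the 4-bit codes, use finer-grained scales, and run on GPU tensor cores, none of which is modeled here.

```python
def quantize_int4(values):
    """Symmetric 4-bit quantization: returns (integer codes, scale)."""
    scale = max(abs(v) for v in values) / 7
    return [max(-8, min(7, round(v / scale))) for v in values], scale

weights = [0.62, -0.33, 0.17, 0.48]
activations = [1.3, -0.8, 0.4, 1.1]
reference = sum(w * x for w, x in zip(weights, activations))

q_w, s_w = quantize_int4(weights)

# W4A16 style: dequantize the weights back to floats, then do a float dot product.
y_w4a16 = sum((q * s_w) * x for q, x in zip(q_w, activations))

# W4A4 style: quantize the activations too, accumulate in integers, rescale once.
q_x, s_x = quantize_int4(activations)
int_acc = sum(qw * qx for qw, qx in zip(q_w, q_x))
y_w4a4 = int_acc * s_w * s_x

print(f"full precision: {reference:.4f}")
print(f"W4A16:          {y_w4a16:.4f}")
print(f"W4A4:           {y_w4a4:.4f}")
```

In the W4A16 path every weight is converted back to a float before the multiply, while in the W4A4 path the inner loop works on small integers and the scales are applied only once at the end.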

Supported GPUs

Nunchaku supports NVIDIA GPUs with compute capability in the range of sm_75 (Turing) to sm_120 (Blackwell).

| Architecture | Generation | Compute Capability | Precision Mode |
| --- | --- | --- | --- |
| Turing | RTX 20 Series | sm_75 | INT4 |
| Ampere | RTX 30 Series | sm_80, sm_86 | INT4, FP4 |
| Ada | RTX 40 Series | sm_89 | INT4, FP4 |
| Blackwell | RTX 50 Series | sm_120a, sm_121a | INT4, FP4, NVFP4 |

Differences Between Precision Modes

  • INT4: Available on all NVIDIA GPUs with compute capability ≥ 7.5 (Turing and later). The default precision for RTX 20 Series.
  • FP4: Requires compute capability ≥ 8.0 (Ampere and later). Equivalent performance to INT4 with higher quality.
  • NVFP4: Available on Blackwell architecture (such as RTX 5090). Achieves approximately 3x speedup over BF16 with superior image quality.
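The thresholds in this section can be written as a small lookup in Python. This is only a sketch that mirrors the list above. The function name is hypothetical and it is not part of Nunchaku or Forge-Neo. If PyTorch is installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple that could be passed in directly.

```python
def precision_modes(compute_capability):
    """Precision modes listed in this article for a (major, minor) compute capability."""
    modes = []
    if compute_capability >= (7, 5):   # Turing and later
        modes.append("INT4")
    if compute_capability >= (8, 0):   # Ampere and later
        modes.append("FP4")
    if compute_capability >= (12, 0):  # Blackwell
        modes.append("NVFP4")
    return modes

for name, cc in [("RTX 2080", (7, 5)), ("RTX 3090", (8, 6)),
                 ("RTX 4090", (8, 9)), ("RTX 5090", (12, 0))]:
    print(name, precision_modes(cc))
```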

How to Install Nunchaku in Forge-Neo

Installing Nunchaku in Forge-Neo is very simple — just add --nunchaku to the set COMMANDLINE_ARGS= line in the startup file webui-user.bat. Nunchaku will be installed automatically the next time you launch it.

--nunchaku
The installation log will appear when the program starts up after the command has been added.
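If you prefer to script the edit, the change to webui-user.bat can be sketched in Python as below. The sample file contents are a made-up stand-in for a typical webui-user.bat, and the helper name is hypothetical. Only the `--nunchaku` flag and the `set COMMANDLINE_ARGS=` line come from the steps above.

```python
def add_launch_flag(bat_text, flag="--nunchaku"):
    """Return bat_text with flag added to the COMMANDLINE_ARGS line (idempotent)."""
    out = []
    for line in bat_text.splitlines():
        if line.strip().lower().startswith("set commandline_args="):
            prefix, value = line.split("=", 1)
            args = value.split()
            if flag not in args:
                args.append(flag)
            line = f"{prefix}={' '.join(args)}"
        out.append(line)
    return "\n".join(out) + ("\n" if bat_text.endswith("\n") else "")

# Hypothetical file contents, for illustration only.
SAMPLE = "@echo off\n\nset PYTHON=\nset COMMANDLINE_ARGS=\n\ncall webui.bat\n"

patched = add_launch_flag(SAMPLE)
print(patched)
```

Running the function a second time leaves the text unchanged, so the flag is never duplicated.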

How to Use Qwen-Image-Edit in Forge-Neo

First, let’s try using Qwen-Image-Edit without Nunchaku. As in my previous article, I’ll be using “Qwen-Image-Edit-2511” this time as well.

Downloading the Models for Qwen-Image-Edit

I’ll use the same models as with ComfyUI. The directory structure is basically the same as usual. Place the Diffusion Model in \sd-webui-forge-neo\models\Stable-diffusion.

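  • Diffusion Model: qwen_image_edit_2511_bf16.safetensors
  • Text Encoders: qwen_2.5_vl_7b_fp8_scaled.safetensors
  • VAE: qwen_image_vae.safetensors
  • LoRA (Optional – 4-step Lightning acceleration): Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors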
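A quick way to confirm that checkpoints ended up in the right folder is a small Python check like the one below. It is a sketch only: the helper name is hypothetical, and the demo runs against a temporary directory rather than a real install. The folder layout (`models/Stable-diffusion`) and the file names are the ones used in this article, including the Nunchaku checkpoint introduced later.

```python
import tempfile
from pathlib import Path

def missing_checkpoints(forge_root, filenames):
    """Return the filenames not found under <forge_root>/models/Stable-diffusion."""
    ckpt_dir = Path(forge_root) / "models" / "Stable-diffusion"
    return [name for name in filenames if not (ckpt_dir / name).is_file()]

WANTED = [
    "qwen_image_edit_2511_bf16.safetensors",
    "nunchaku_qwen_image_edit_2511_balance_int4.safetensors",
]

# Demo against a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as root:
    ckpt_dir = Path(root) / "models" / "Stable-diffusion"
    ckpt_dir.mkdir(parents=True)
    (ckpt_dir / WANTED[0]).touch()  # pretend only the standard model was downloaded
    missing = missing_checkpoints(root, WANTED)

print("Missing:", missing)
```

For a real install, pass the path to your sd-webui-forge-neo folder as `forge_root`.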

About Input Images

There are no specific requirements for the input image, but for comparison with ComfyUI, I’ll use the same images used in the ComfyUI workflow sample from my previous article. You can download them from the drive below.

How to Use the Qwen-Image-Edit UI

The basic steps for using Qwen-Image-Edit in Forge-Neo are as follows.

  • Switching the UI Preset
  • Specifying the Checkpoint
  • Specifying the VAE / Text Encoder
  • Switching Diffusion in Low Bits
  • Switching the Mode
  • Specifying the Input Image
  • Entering the Prompt
  • Specifying the Sampling / Scheduler
  • Specifying the Image Size
  • Entering Other Parameters

Now let’s go through each one.

Qwen-Image-Edit UI and Preset Checkpoint 

1. Switching the UI Preset

Switch the “UI Preset” at the top of the screen to qwen.

2. Specifying the Checkpoint

For “Checkpoint”, specify qwen_image_edit_2511_bf16.safetensors.

Qwen-Image-Edit VAE and Text Encoder 

3. Specifying the VAE / Text Encoder

For “VAE / Text Encoder”, specify qwen_image_vae.safetensors and qwen_2.5_vl_7b_fp8_scaled.safetensors.

4. Switching Diffusion in Low Bits

Set “Diffusion in Low Bits” in the upper right of the UI based on the following criteria.

  • When using LoRA: Automatic (fp16 LoRA)
  • When not using LoRA: Automatic

Since I’ll be using “Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors” this time, I’ll use Automatic (fp16 LoRA).

Qwen-Image-Edit Settings — Mode & Prompt 

5. Switching the Mode

Since Qwen-Image-Edit works as img2img, switch the mode from txt2img to img2img.

6. Specifying the Input Image

Load it into img2img in the Generation tab as input image 1.

For input images 2 and 3, check “ImageStitch Integrated” and load the images there. ✅ I’m only loading one image here this time, but since Qwen-Image-Edit accepts up to 3 input images in total, you can load up to 2 images here. To add a second image, load it into the upload area at the bottom left and click the “Append Pasted Image” button.

Example of input image 1 

7. Entering the Prompt

I’ll use the same prompt as with ComfyUI.

change the furniture leather difference in image 1 to the fur material in image 2
<lora:Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16:1>
Image after entering parameters 
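The prompt above mixes two things: the editing instruction and a `<lora:name:weight>` tag. The Python sketch below separates them to show how the tag is structured. It is an illustration of the syntax only, with a hypothetical helper name, and is not how Forge-Neo parses prompts internally.

```python
import re

# <lora:NAME:WEIGHT> as written in the prompt above.
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9]*\.?[0-9]+)>")

def split_prompt(prompt):
    """Separate the instruction text from any <lora:name:weight> tags."""
    loras = [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]
    text = LORA_TAG.sub("", prompt).strip()
    return text, loras

PROMPT = (
    "change the furniture leather difference in image 1 to the fur material in image 2\n"
    "<lora:Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16:1>"
)

text, loras = split_prompt(PROMPT)
print(text)
print(loras)
```

The name is the LoRA file name without the `.safetensors` extension, and the trailing number is the strength applied to it.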

8. Specifying the Sampling / Scheduler

By default, the “qwen” preset in Forge-Neo has Sampling Method and Schedule Type set to LCM / Normal, but I’ll change these to Euler / Simple for comparison with ComfyUI.

9. Specifying the Image Size

Since I’ll use the image at its original size, switch the image size tab from “Resize to” to “Resize by” and set Scale to 1.

10. Entering Other Parameters

Since I’m using “Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors”, set CFG Scale to 1.

The default value of Denoising Strength is 0.95. Change this to 1.
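For reference, the values chosen in steps 1 to 10 can be collected in one place. The Python dictionary below is just a checklist for this walkthrough: the key names are hypothetical and do not correspond to any Forge-Neo configuration file or API.

```python
def low_bits_mode(uses_lora):
    """Step 4: pick the 'Diffusion in Low Bits' option."""
    return "Automatic (fp16 LoRA)" if uses_lora else "Automatic"

# Settings used in this walkthrough (Lightning 4-step LoRA enabled).
WALKTHROUGH_SETTINGS = {
    "ui_preset": "qwen",
    "checkpoint": "qwen_image_edit_2511_bf16.safetensors",
    "vae_text_encoder": [
        "qwen_image_vae.safetensors",
        "qwen_2.5_vl_7b_fp8_scaled.safetensors",
    ],
    "diffusion_in_low_bits": low_bits_mode(uses_lora=True),
    "mode": "img2img",
    "sampling_method": "Euler",   # preset default is LCM
    "schedule_type": "Simple",    # preset default is Normal
    "resize_by_scale": 1,
    "cfg_scale": 1,               # because the Lightning LoRA is used
    "denoising_strength": 1,      # default is 0.95
}

for key, value in WALKTHROUGH_SETTINGS.items():
    print(f"{key}: {value}")
```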

With the above settings, generate by clicking the “Generate” button. The results are as follows. ⚠️ The first generation takes some time to prepare, so the results below are from the second generation after the model has finished loading.

Results of using Qwen-Image-Edit in Forge-Neo 

Getting Started with Nunchaku Qwen-Image-Edit

At the time of writing, the official Nunchaku release of Qwen-Image-Edit only went up to Qwen-Image-Edit-2509.

QuantFunc has published a Nunchaku version of Qwen-Image-Edit-2511, so download the model suited to your environment from there. Place it in the same location as standard models: \sd-webui-forge-neo\models\Stable-diffusion.

Also, I won’t be using it this time, but the GGUF version is listed below.

About Nunchaku Models

Nunchaku models come in the following types. Use the model suited to your environment. ✅ The official version only has two ranks: 32 and 128.

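| Data Type | Rank | Description |
| --- | --- | --- |
| INT4 (RTX 40 and 30 Series GPUs) | 32 | Fast, low quality |
| INT4 (RTX 40 and 30 Series GPUs) | 128 | Balanced, medium quality |
| INT4 (RTX 40 and 30 Series GPUs) | 256 | Slow, high quality |
| NVFP4 (RTX 50 Series GPUs) | 32 | Fast, low quality |
| NVFP4 (RTX 50 Series GPUs) | 128 | Balanced, medium quality |
| NVFP4 (RTX 50 Series GPUs) | 256 | Slow, high quality |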

I’ll use the balanced type (equivalent to the official high-quality rank): nunchaku_qwen_image_edit_2511_balance_int4.safetensors. ✅ If you’re using an RTX 50 Series GPU, download nunchaku_qwen_image_edit_2511_balance_fp4.safetensors.
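The choice between the two balanced files can be expressed as a tiny Python helper. This is a sketch with a hypothetical function name. It only covers the two "balance" file names mentioned here and assumes RTX 50 Series corresponds to compute capability 12.0 or higher, as in the supported GPU table earlier.

```python
def balanced_nunchaku_checkpoint(compute_capability):
    """Pick the balanced Nunchaku file name for a (major, minor) compute capability."""
    if compute_capability < (7, 5):
        raise ValueError("Nunchaku needs compute capability 7.5 (Turing) or newer")
    suffix = "fp4" if compute_capability >= (12, 0) else "int4"
    return f"nunchaku_qwen_image_edit_2511_balance_{suffix}.safetensors"

print(balanced_nunchaku_checkpoint((8, 6)))   # RTX 3090
print(balanced_nunchaku_checkpoint((12, 0)))  # RTX 5090
```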

How to Generate Images with Nunchaku Qwen-Image-Edit

Generate using the same method as the standard version. I’ve also used the same settings for comparison this time. The generation results are as follows.

Results of using Qwen-Image-Edit in Forge-Neo (Nunchaku) 

Forge-Neo vs ComfyUI: Qwen-Image-Edit 2511 Speed and Quality Comparison

Now let’s compare the generation results from my environment (RTX 3090) with those from ComfyUI. ✅ I couldn’t install Nunchaku for ComfyUI due to a bug, so I’m comparing against SageAttention instead.

  • Forge-Neo - Standard version: generation time 34 seconds
  • Forge-Neo - Nunchaku version: generation time 19 seconds
  • ComfyUI - Standard version: generation time 32 seconds
  • ComfyUI - SageAttention version: generation time missing

The Forge-Neo results lean toward warmer color temperatures. On the other hand, the ComfyUI results have a slight green color cast. I also think that ComfyUI reproduces the fur texture better.

In terms of generation time, both Forge-Neo and ComfyUI are fast, but the Forge-Neo Nunchaku version was faster.
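To put the timings in relative terms, here is the arithmetic as a short Python snippet. It only uses the three generation times reported above from a single run each on my RTX 3090, so treat the ratios as rough indications rather than benchmarks.

```python
# Generation times in seconds, as reported above.
TIMES_S = {
    "Forge-Neo standard": 34,
    "Forge-Neo Nunchaku": 19,
    "ComfyUI standard": 32,
}

baseline = TIMES_S["Forge-Neo Nunchaku"]

# How many times longer each setup takes compared with Forge-Neo + Nunchaku.
speedups = {name: round(t / baseline, 2) for name, t in TIMES_S.items()}

# Share of time saved by switching from Forge-Neo standard to Nunchaku.
saved_pct = round(100 * (TIMES_S["Forge-Neo standard"] - baseline) / TIMES_S["Forge-Neo standard"])

print(speedups)
print(f"Time saved vs Forge-Neo standard: {saved_pct}%")
```

In other words, Nunchaku is roughly 1.8x faster than the standard Forge-Neo run and about 1.7x faster than the standard ComfyUI run.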

How to Improve Qwen-Image-Edit Quality in Forge-Neo

The following is paid content, but it covers techniques for improving Qwen-Image-Edit quality and known issues (unresolved) when using Qwen-Image-Edit.


Summary

In this article, I explained how to use Qwen-Image-Edit-2511 with Nunchaku (SVDQuant) in Forge-Neo.

  • Nunchaku is an inference engine for 4-bit models quantized with SVDQuant, enabling faster inference than GGUF while reducing VRAM usage.
  • The Forge-Neo + Nunchaku version generates in 19 seconds, significantly faster than the standard version (34 seconds) and ComfyUI standard version (32 seconds).
  • While the quality is slightly inferior to ComfyUI, it is more than practical for users who are familiar with WebUI.

There are still some issues, but I’m looking forward to improvements in future updates.

Thank you for reading to the end.
