Results
The original image is on the left; the generated image is on the right.
Result of this run


Result from FLUX.1 Kontext [dev]


For details, see this article.
Python Script
import torch
from diffusers import QwenImageEditPipeline
from diffusers.utils import load_image
from diffusers.quantizers import PipelineQuantizationConfig
from decorator import gpu_monitor, time_monitor


@time_monitor
@gpu_monitor(interval=0.5)
def main():
    pipeline_quant_config = PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
            "llm_int8_skip_modules": ["transformer_blocks.0.img_mod"],
        },
        components_to_quantize=["text_encoder", "transformer"],
    )
    pipe = QwenImageEditPipeline.from_pretrained(
        "Qwen/Qwen-Image-Edit",
        quantization_config=pipeline_quant_config,
        torch_dtype=torch.bfloat16,
    )
    pipe.enable_model_cpu_offload()

    image = load_image("girl.jpg").convert("RGB")
    prompt = "Make the lady hold a sign that says 'Qwen Image Edit is awesome'"
    image = pipe(
        image,
        prompt,
        num_inference_steps=50,
    ).images[0]
    image.save("qwenimage_edit.png")


if __name__ == "__main__":
    main()
I used this script to measure VRAM usage and elapsed time.
The GPU is an RTX 4090.
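The linked measurement script itself is not reproduced here, but a minimal sketch of what a `decorator.py` providing `time_monitor` and `gpu_monitor` could look like is shown below. It assumes NVML polling via the `pynvml` module (supplied by the `nvidia-ml-py` package from pyproject.toml); the implementation details are an assumption, not the actual linked script.

```python
# Hypothetical sketch of decorator.py (not the author's linked script).
import threading
import time
from functools import wraps


def time_monitor(func):
    """Print wall-clock time of the wrapped function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"time: {time.perf_counter() - start:.2f} sec")
        return result
    return wrapper


def gpu_monitor(interval=0.5):
    """Poll GPU 0 memory via NVML in a background thread; print the peak."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            import pynvml  # provided by the nvidia-ml-py package
            pynvml.nvmlInit()
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
            peak = 0
            stop = threading.Event()

            def poll():
                nonlocal peak
                while not stop.is_set():
                    used = pynvml.nvmlDeviceGetMemoryInfo(handle).used
                    peak = max(peak, used)
                    time.sleep(interval)

            t = threading.Thread(target=poll, daemon=True)
            t.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                t.join()
                total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
                print(f"GPU 0 - Used memory: {peak / 1024**3:.2f}/"
                      f"{total / 1024**3:.2f} GB")
                pynvml.nvmlShutdown()
        return wrapper
    return decorator
```

Stacking `@time_monitor` over `@gpu_monitor(interval=0.5)` as in the script above would then produce one line each for peak memory and elapsed time.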
GPU 0 - Used memory: 18.68/23.99 GB time: 144.84 sec
Environment Setup
Here is the pyproject.toml I used.
With uv, running uv sync should be all it takes to set up the environment.
[project]
name = "qwen-image-edit"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
"accelerate>=1.10.1",
"bitsandbytes>=0.47.0",
"diffusers>=0.35.1",
"nvidia-ml-py>=13.580.65",
"torch==2.8.0+cu126",
"torchvision==0.23.0+cu126",
"transformers>=4.56.1",
]
[[tool.uv.index]]
name = "torch-cuda"
url = "https://download.pytorch.org/whl/cu126"
explicit = true
[tool.uv.sources]
torch = [{ index = "torch-cuda" }]
torchvision = [{ index = "torch-cuda" }]
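With this pyproject.toml in place, setup and execution could look like the following; the script filename main.py is an assumption, since the article does not name the file.

```shell
# install pinned dependencies into a project-local virtual environment
uv sync
# run the edit script inside that environment
uv run python main.py
```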