Introduction
Black Forest Labs has released FLUX.2-dev.
It feels like a combination of FLUX.1-dev and FLUX.1 Kontext-dev in a single model.
Results
I compared it with FLUX.1 Kontext-dev.
The left image is the input, the right is the generated result.
FLUX.2-dev result


The hands look a bit too large to me.
FLUX.1 Kontext-dev result


Here the arm length looks unnatural. See this article for details.
Python script
```python
import torch
from diffusers import Flux2Pipeline
from diffusers.utils import load_image

repo_id = "diffusers/FLUX.2-dev-bnb-4bit"  # quantized text-encoder and DiT. VAE still in bf16
device = "cuda"
torch_dtype = torch.bfloat16

pipe = Flux2Pipeline.from_pretrained(repo_id, torch_dtype=torch_dtype)
pipe.enable_model_cpu_offload()

prompt = "Make the lady hold a sign that says 'FLUX.2 dev is awesome'"
image = load_image("girl.jpg")

image = pipe(
    prompt=prompt,
    image=[image],  # multi-image input
    generator=torch.Generator(device=device).manual_seed(42),
    num_inference_steps=50,
    guidance_scale=4,
).images[0]
image.save("flux2_output.png")
```
Environment setup
Here is the pyproject.toml I used.
With uv, running uv sync should be all it takes to set up the environment.
```toml
[project]
name = "flux"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "accelerate==1.12.0",
    "bitsandbytes==0.48.2",
    "diffusers @ git+https://github.com/huggingface/diffusers@c8656ed73c638e51fc2e777a5fd355d69fa5220f",
    "hf-xet==1.2.0",
    "torch==2.9.1+cu126",
    "torchvision==0.24.1+cu126",
    "transformers==4.57.3",
]

[[tool.uv.index]]
name = "torch-cuda"
url = "https://download.pytorch.org/whl/cu126"
explicit = true

[tool.uv.sources]
torch = [{ index = "torch-cuda" }]
torchvision = [{ index = "torch-cuda" }]
```
Appendix
Here are the results of testing the effect of Flash Attention, measured on an RTX 4090.
I built Flash Attention following this article.
With Flash Attention
```
model load: 14.84 sec
flash attention: 0.00 sec
image generation: 246.33 sec
total time: 261.17 sec
GPU 0 - Used memory: 23.86/23.99 GB
```
Without Flash Attention
```
model load: 14.94 sec
image generation: 268.20 sec
total time: 283.15 sec
GPU 0 - Used memory: 23.87/23.99 GB
```
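From the two timings above, the generation-time speedup works out to roughly 8%:

```python
# Speedup computed from the measured generation times above (RTX 4090, 50 steps).
with_fa = 246.33     # image generation with Flash Attention (sec)
without_fa = 268.20  # image generation without Flash Attention (sec)

speedup = without_fa / with_fa
saving = (1 - with_fa / without_fa) * 100
print(f"{speedup:.2f}x faster ({saving:.1f}% less generation time)")
```

Memory usage is essentially unchanged, so the benefit here is purely speed.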
Python code
```python
import time

import torch
from diffusers import Flux2Pipeline
from diffusers.utils import load_image

from decorator import gpu_monitor


@gpu_monitor(interval=0.5)
def main():
    start_time = time.time()

    repo_id = "diffusers/FLUX.2-dev-bnb-4bit"  # quantized text-encoder and DiT. VAE still in bf16
    device = "cuda"
    torch_dtype = torch.bfloat16

    pipe = Flux2Pipeline.from_pretrained(repo_id, torch_dtype=torch_dtype)
    model_load_time = time.time()

    pipe.transformer.set_attention_backend("flash")
    flash_attention_time = time.time()

    pipe.enable_model_cpu_offload()

    prompt = "Make the lady hold a sign that says 'FLUX.2 dev is awesome'"
    image = load_image("girl.jpg")

    image = pipe(
        prompt=prompt,
        image=[image],  # multi-image input
        generator=torch.Generator(device=device).manual_seed(42),
        num_inference_steps=50,
        guidance_scale=4,
    ).images[0]
    image.save("flux2_output.png")
    end_time = time.time()

    print(f"model load: {(model_load_time - start_time):.2f} sec")
    print(f"flash attention: {(flash_attention_time - model_load_time):.2f} sec")
    print(f"image generation: {(end_time - flash_attention_time):.2f} sec")
    print(f"total time: {(end_time - start_time):.2f} sec")


if __name__ == "__main__":
    main()
```
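The gpu_monitor decorator above comes from a local decorator module that is not shown in this post. As a reference, here is a minimal stdlib-only sketch of what such a decorator might look like, polling nvidia-smi from a background thread and printing peak memory at the end; all names and details here are assumptions, not the actual implementation:

```python
import shutil
import subprocess
import threading
import time


def gpu_monitor(interval=0.5):
    """Decorator sketch: sample GPU 0 memory usage every `interval` seconds."""
    def decorate(func):
        def wrapper(*args, **kwargs):
            peak = [0.0]   # peak memory.used seen so far (MiB)
            total = [0.0]  # memory.total (MiB)
            stop = threading.Event()

            def poll():
                # Query used/total memory until the wrapped function finishes.
                while not stop.is_set():
                    out = subprocess.run(
                        ["nvidia-smi",
                         "--query-gpu=memory.used,memory.total",
                         "--format=csv,noheader,nounits"],
                        capture_output=True, text=True,
                    ).stdout.strip()
                    if out:
                        used, tot = map(float, out.splitlines()[0].split(","))
                        peak[0] = max(peak[0], used)
                        total[0] = tot
                    time.sleep(interval)

            monitor = None
            if shutil.which("nvidia-smi"):  # degrade gracefully without a GPU
                monitor = threading.Thread(target=poll, daemon=True)
                monitor.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                if monitor:
                    monitor.join()
                    print(f"GPU 0 - Used memory: "
                          f"{peak[0] / 1024:.2f}/{total[0] / 1024:.2f} GB")
        return wrapper
    return decorate
```

On a machine without nvidia-smi the decorator simply runs the function unchanged, so the script still works on CPU-only environments.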