https://touch-sp.hatenablog.com/entry/2025/06/14/212654

Environment setup

pip install torch==2.7.1+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate
pip install sentencepiece

accelerate==1.7.0
diffusers @ git+https://github.com/huggingface/diffusers@8adc6003ba4dbf5b61bb4f1ce571e9e55e145a99
sentencepiece==0.2.0
torch==2.7.1+cu126
transformers==4.52.4
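To confirm that an environment matches the versions listed above, a small check using only the standard library's `importlib.metadata` can help. This is a minimal sketch and not part of the original setup. The `EXPECTED` table simply copies the list above. Because diffusers was installed from a git commit, the sketch only checks that it is present and does not compare a version string. The function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Optional

# Pins copied from the list above. None means "any version is fine"
# (diffusers was installed from a git commit, so only presence is checked).
EXPECTED: Dict[str, Optional[str]] = {
    "accelerate": "1.7.0",
    "diffusers": None,
    "sentencepiece": "0.2.0",
    "torch": "2.7.1+cu126",
    "transformers": "4.52.4",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, Optional[str]],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """List packages that are missing or differ from the pinned version."""
    problems: List[str] = []
    for name, want in expected.items():
        have = lookup(name)
        if have is None:
            problems.append(f"{name}: not installed")
        elif want is not None and have != want:
            problems.append(f"{name}: expected {want}, found {have}")
    return problems


if __name__ == "__main__":
    issues = find_mismatches(EXPECTED)
    print("\n".join(issues) if issues else "All packages match.")
```

The `lookup` parameter exists only so the comparison logic can be exercised without the real packages installed; in normal use it defaults to querying the installed distributions.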

Results

Python script

I downloaded the model file (chroma-unlocked-v36-detail-calibrated.safetensors) from here.

import torch
from diffusers import ChromaTransformer2DModel, ChromaPipeline
from transformers import T5EncoderModel, T5Tokenizer

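# FLUX.1-dev supplies the T5 text encoder, tokenizer, VAE and scheduler;
# only the transformer is replaced with Chroma.
# (The FLUX.1-dev repo is gated, so accepting its license and logging in
# to Hugging Face may be required.)
bfl_repo = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16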

# Chroma transformer loaded from the downloaded single-file checkpoint
transformer = ChromaTransformer2DModel.from_single_file(
    "chroma-unlocked-v36-detail-calibrated.safetensors",
    torch_dtype=dtype
)

text_encoder = T5EncoderModel.from_pretrained(
    bfl_repo, subfolder="text_encoder_2",
    torch_dtype=dtype
)

# Tokenizers have no weights, so no torch_dtype is needed here
tokenizer = T5Tokenizer.from_pretrained(
    bfl_repo, subfolder="tokenizer_2"
)

pipe = ChromaPipeline.from_pretrained(
    bfl_repo,
    transformer=transformer,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    torch_dtype=dtype
)

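# Offload idle sub-models to the CPU to save VRAM (requires accelerate).
# If the whole pipeline fits in VRAM, use pipe.to("cuda") instead of this
# line; calling both is redundant.
pipe.enable_model_cpu_offload()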

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=4.0,
    num_inference_steps=26,
    generator=torch.manual_seed(1234)  # fixed seed so reruns give the same image
).images[0]

image.save("image.jpg")
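With everything else fixed, the image is determined by the seed passed through `generator`, so trying a few seeds is a cheap way to get different compositions for the same prompt. The helper below is a hypothetical sketch, not something from the article: it takes the already-built `pipe` as an argument plus a `make_generator` callable (for which `torch.manual_seed` can be passed), and the output file pattern `image_{seed}.jpg` is an assumption.

```python
from typing import Any, Callable, Iterable, List


def generate_with_seeds(
    pipe: Any,
    prompt: str,
    seeds: Iterable[int],
    make_generator: Callable[[int], Any],
    out_pattern: str = "image_{seed}.jpg",
    **pipe_kwargs: Any,
) -> List[str]:
    """Run the pipeline once per seed and save each image.

    Hypothetical helper: `pipe` is the pipeline built in the script above and
    `make_generator` is expected to be torch.manual_seed (or similar).
    Returns the list of saved file paths.
    """
    saved: List[str] = []
    for seed in seeds:
        # A fresh generator per seed so each image is independently reproducible
        result = pipe(prompt, generator=make_generator(seed), **pipe_kwargs)
        path = out_pattern.format(seed=seed)
        result.images[0].save(path)
        saved.append(path)
    return saved
```

With the script above, a call would look like `generate_with_seeds(pipe, prompt, [1234, 5678], make_generator=torch.manual_seed, guidance_scale=4.0, num_inference_steps=26)`. Passing the generator factory in, rather than importing torch inside the helper, keeps the loop testable without a GPU.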

Result 2

With a sufficiently detailed prompt, you can generate complex scenes as well.

Prompt

Ultra-realistic, high-quality photo of an anthropomorphic capybara with a tough, streetwise attitude, wearing a worn black leather jacket, dark sunglasses, and ripped jeans. The capybara is leaning casually against a gritty urban wall covered in vibrant graffiti. Behind it, in bold, dripping yellow spray paint, the word "HuggingFace" is scrawled in large street-art style letters. The scene is set in a dimly lit alleyway with moody lighting, scattered trash, and an edgy, rebellious vibe — like a character straight out of an underground comic book.

Negative prompt

low quality, bad anatomy, extra digits, missing digits, extra limbs, missing limbs
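The first script does not pass a negative prompt, and the article does not show the exact call used for Result 2. Assuming the other settings were unchanged, the prompt and negative prompt above would go into the `prompt` and `negative_prompt` arguments of the pipeline call. The helper below is hypothetical and only assembles those keyword arguments. As far as I understand `ChromaPipeline`, the negative prompt only has an effect when `guidance_scale` is greater than 1, because that is when classifier-free guidance is applied; the default of 4.0 used in the script satisfies this.

```python
from typing import Any, Dict, Optional

NEGATIVE_PROMPT = (
    "low quality, bad anatomy, extra digits, missing digits, "
    "extra limbs, missing limbs"
)


def build_chroma_kwargs(
    prompt: str,
    negative_prompt: Optional[str] = None,
    guidance_scale: float = 4.0,
    num_inference_steps: int = 26,
) -> Dict[str, Any]:
    """Collect keyword arguments for a pipeline call (hypothetical helper).

    Defaults mirror the first script; whether Result 2 used the same
    values is an assumption.
    """
    kwargs: Dict[str, Any] = {
        "prompt": prompt,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
    }
    if negative_prompt:  # skip None or an empty string
        kwargs["negative_prompt"] = negative_prompt
    return kwargs
```

In the script, the call would then become `image = pipe(**build_chroma_kwargs(prompt, NEGATIVE_PROMPT), generator=torch.manual_seed(1234)).images[0]`, with `prompt` holding the capybara prompt.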
