Introduction
OmniGen is a model that can handle many different tasks. The previous article is here: touch-sp.hatenablog.com
This time, I generate a new image while preserving the pose of an existing one.
It is similar to ControlNet, except that OmniGen can produce the new image directly, without first generating a separate pose image.
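OmniGen is driven entirely through the text prompt: each input image is referenced with a numbered placeholder of the form `<img><|image_1|></img>`, and the instruction (here, to follow the pose) is written around it. A minimal sketch of how such a prompt is assembled (the scene description is illustrative, not part of any fixed API):

```python
# OmniGen references the i-th input image via a numbered placeholder in the prompt.
placeholder = "<img><|image_1|></img>"

# Illustrative scene text; any description can go here.
scene = "A young boy is sitting on a sofa in the library, holding a book."

prompt = f"Following the pose of this image {placeholder}, generate a new photo: {scene}"
print(prompt)
```

The same placeholder convention appears in both OmniGen scripts below.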
Source image

Results

The hands are rendered oddly; that may be one of the model's weaknesses.
For comparison, I also include the result of using ControlNet with FLUX.1-dev.

Python scripts
Creating the source image
The source image was created with "FLUX.1-dev" and a LoRA called "anzu-flux v2.2". Some parts were then retouched with "FLUX.1-Fill-dev".
import torch
from diffusers import FluxPipeline
import gc

def flush():
    gc.collect()
    torch.cuda.empty_cache()

model_id = "black-forest-labs/FLUX.1-dev"

prompt = "Realistic photo. A young woman sits on a sofa, holding a book and facing the camera. She wears delicate silver hoop earrings adorned with tiny, sparkling diamonds that catch the light, with her long chestnut hair cascading over her shoulders. Her eyes are focused and gentle, framed by long, dark lashes. She is dressed in a cozy cream sweater, which complements her warm, inviting smile. Behind her, there is a table with a cup of water in a sleek, minimalist blue mug. The background is a serene indoor setting with soft natural light filtering through a window, adorned with tasteful art and flowers, creating a cozy and peaceful ambiance. 4K, HD."

# Step 1: load only the text encoders and encode the prompt.
pipeline = FluxPipeline.from_pretrained(
    model_id,
    transformer=None,
    vae=None
).to("cuda")

with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, text_ids = pipeline.encode_prompt(
        prompt=prompt,
        prompt_2=None,
    )

# Free the text encoders before loading the transformer, to keep peak VRAM low.
del pipeline
flush()

# Step 2: load the transformer and VAE without the text encoders.
pipeline = FluxPipeline.from_pretrained(
    model_id,
    text_encoder=None,
    text_encoder_2=None,
    tokenizer=None,
    tokenizer_2=None,
    torch_dtype=torch.bfloat16
)
pipeline.load_lora_weights("anzu-flux-LoRA_v22.safetensors")
pipeline.enable_sequential_cpu_offload()

seed = 20250228
generator = torch.Generator().manual_seed(seed)

image = pipeline(
    prompt_embeds=prompt_embeds.bfloat16(),
    pooled_prompt_embeds=pooled_prompt_embeds.bfloat16(),
    width=1024,
    height=1024,
    num_inference_steps=27,
    generator=generator,
    guidance_scale=3.5,
    joint_attention_kwargs={"scale": 1.0},
).images[0]

image.save(f"lora_result_seed{seed}.jpg")
Image generation with OmniGen
import torch
from diffusers import OmniGenPipeline
from diffusers.utils import load_image

pipe = OmniGenPipeline.from_pretrained(
    "Shitao/OmniGen-v1-diffusers",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# The source image is referenced via the <img><|image_1|></img> placeholder.
prompt = "Following the pose of this image <img><|image_1|></img>, generate a new photo: A young boy is sitting on a sofa in the library, holding a book. His hair is neatly combed, and a faint smile plays on his lips, with a few freckles scattered across his cheeks. The library is quiet, with rows of shelves filled with books stretching out behind him."
input_images = [load_image("lora_result_seed20250228.jpg")]

seed = 20250301
for i in range(3):
    new_seed = seed + 12345 * i
    generator = torch.manual_seed(new_seed)
    image = pipe(
        prompt=prompt,
        input_images=input_images,
        guidance_scale=2,
        img_guidance_scale=1.6,
        use_input_image_size_as_output=True,
        generator=generator
    ).images[0]
    image.save(f"omnigen_result_seed{new_seed}.jpg")
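Both this script and the ControlNet script below sweep three seeds with the same fixed offset of 12345. The schedule is simple enough to factor into a small helper (`seed_schedule` is a hypothetical name, not part of any library):

```python
def seed_schedule(base_seed: int, count: int, stride: int = 12345) -> list[int]:
    # Deterministic seeds matching the generation loops: base_seed + stride * i.
    return [base_seed + stride * i for i in range(count)]

# The loops above start from seed 20250301 and run three iterations.
print(seed_schedule(20250301, 3))  # [20250301, 20262646, 20274991]
```

Keeping the schedule deterministic makes it easy to regenerate any single output later from its filename.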
Image generation with FLUX.1-dev
With ControlNet, a pose image has to be created first. I used OmniGen for this step as well.

import torch
from diffusers import OmniGenPipeline
from diffusers.utils import load_image

pipe = OmniGenPipeline.from_pretrained(
    "Shitao/OmniGen-v1-diffusers",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Ask OmniGen to extract a skeleton (pose) image from the source photo.
prompt = "Detect the skeleton of human in this image: <img><|image_1|></img>"
input_images = [load_image("lora_result_seed20250228.jpg")]

seed = 20250301
generator = torch.manual_seed(seed)
image1 = pipe(
    prompt=prompt,
    input_images=input_images,
    guidance_scale=2,
    img_guidance_scale=1.6,
    use_input_image_size_as_output=True,
    generator=generator
).images[0]
image1.save("pose.png")

New images were then generated from that pose image.
import torch
from diffusers.utils import load_image
from diffusers import FluxControlNetPipeline, FluxControlNetModel

base_model = 'black-forest-labs/FLUX.1-dev'
controlnet_model = 'InstantX/FLUX.1-dev-Controlnet-Union'

controlnet = FluxControlNetModel.from_pretrained(
    controlnet_model,
    torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    base_model,
    controlnet=controlnet,
    torch_dtype=torch.bfloat16
)
pipe.enable_sequential_cpu_offload()

control_image = load_image("pose.png")  # pose image produced by OmniGen above
controlnet_conditioning_scale = 0.5
control_mode = 4  # selects the pose mode of the Union ControlNet
width, height = control_image.size

prompt = "A young boy is sitting on a sofa in the library, holding a book. His hair is neatly combed, and a faint smile plays on his lips, with a few freckles scattered across his cheeks. The library is quiet, with rows of shelves filled with books stretching out behind him."

seed = 20250301
for i in range(3):
    new_seed = seed + 12345 * i
    generator = torch.manual_seed(new_seed)
    image = pipe(
        prompt,
        control_image=control_image,
        control_mode=control_mode,
        width=width,
        height=height,
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        num_inference_steps=24,
        guidance_scale=3.5,
        generator=generator
    ).images[0]
    image.save(f"flux_controlnet_result_seed{new_seed}.jpg")
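The bare integer `control_mode = 4` selects pose control in the Union ControlNet. For readability, the mode indices can be given names; the mapping below is taken from my reading of the InstantX model card, so verify it against the card before relying on it:

```python
# Mode indices for InstantX/FLUX.1-dev-Controlnet-Union
# (as listed on the model card; treat as an assumption to double-check).
CONTROL_MODES = {
    "canny": 0,
    "tile": 1,
    "depth": 2,
    "blur": 3,
    "pose": 4,
    "gray": 5,
    "low_quality": 6,
}

control_mode = CONTROL_MODES["pose"]  # equivalent to control_mode = 4 in the script above
print(control_mode)
```

Using a named lookup avoids silently passing the wrong conditioning type when switching between, say, depth and pose inputs.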