PC Environment
Windows 11
RTX 4090 (VRAM 24GB)
CUDA 12.4
Python 3.12
Python Environment Setup
pip install torch==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate imageio imageio-ffmpeg
Python Script
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "tencent/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    revision="refs/pr/18",
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.float16,
    revision="refs/pr/18",
)
pipe.vae.enable_tiling()
pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
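The script requests num_frames=61. As I understand it, HunyuanVideo's causal 3D VAE compresses the time axis by a factor of 4, so num_frames is normally chosen as 4*k + 1 (61 gives 16 latent frames). A quick sanity check with a hypothetical helper (not part of the diffusers API):

```python
def latent_frames(num_frames: int) -> int:
    """Number of latent frames for a given num_frames, assuming the
    VAE's 4x temporal compression (num_frames must be 4*k + 1)."""
    if (num_frames - 1) % 4 != 0:
        raise ValueError("num_frames should be 4*k + 1")
    return (num_frames - 1) // 4 + 1

print(latent_frames(61))  # -> 16
```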
Results
Because VRAM usage exceeded 24GB, generating the video took a whole night. The resulting video is posted on the following Google Blogger: support-touchsp.blogspot.com
Update (January 12, 2025)
I found a way to keep VRAM usage under 24GB.

Python Script
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

from decorator import gpu_monitor, time_monitor


@gpu_monitor(interval=0.5)
@time_monitor
def main():
    model_id = "hunyuanvideo-community/HunyuanVideo"
    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        torch_dtype=torch.bfloat16,
    )
    pipe = HunyuanVideoPipeline.from_pretrained(
        model_id,
        transformer=transformer,
        torch_dtype=torch.float16,
    )

    # Enable memory savings
    pipe.vae.enable_tiling()
    pipe.enable_sequential_cpu_offload()

    output = pipe(
        prompt="A cat walks on the grass, realistic",
        height=320,
        width=512,
        num_frames=61,
        num_inference_steps=30,
    ).frames[0]
    export_to_video(output, "output.mp4", fps=15)

    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")


if __name__ == "__main__":
    main()
Results
torch.cuda.max_memory_allocated: 3.62 GB
time: 236.62 sec
GPU 0 - Used memory: 4.40/23.99 GB
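The script imports gpu_monitor and time_monitor from a local decorator.py that is not shown in the post. A minimal sketch of what such a module might look like (the function names match the imports above, but the implementation is my guess; GPU memory is polled via nvidia-smi, and polling is silently skipped when no GPU is present):

```python
import functools
import subprocess
import threading
import time


def time_monitor(func):
    """Print elapsed wall-clock time after the wrapped function returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"time: {time.perf_counter() - start:.2f} sec")
        return result
    return wrapper


def gpu_monitor(interval=0.5):
    """Poll nvidia-smi every `interval` seconds while the wrapped function
    runs and report peak GPU memory usage at the end."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            peak = [0.0]    # peak used memory in MiB
            total = [0.0]   # total memory in MiB
            stop = threading.Event()

            def poll():
                while not stop.is_set():
                    try:
                        out = subprocess.check_output(
                            ["nvidia-smi",
                             "--query-gpu=memory.used,memory.total",
                             "--format=csv,noheader,nounits"],
                            text=True,
                        )
                        used, tot = map(float, out.splitlines()[0].split(","))
                        peak[0] = max(peak[0], used)
                        total[0] = tot
                    except (OSError, subprocess.CalledProcessError, ValueError):
                        pass  # no usable GPU; just let the function run
                    stop.wait(interval)

            t = threading.Thread(target=poll, daemon=True)
            t.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                t.join()
                if total[0]:
                    print(f"GPU 0 - Used memory: "
                          f"{peak[0] / 1024:.2f}/{total[0] / 1024:.2f} GB")
        return wrapper
    return decorator
```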
Update (January 25, 2025)
Using ParaAttention can apparently speed up generation, although it seems to cause some loss in image quality.
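As I understand it, ParaAttention's first-block cache runs only the first transformer block at each denoising step and, when that block's output barely changes from the previous step, reuses the cached output of the remaining blocks instead of recomputing them; residual_diff_threshold controls how much change is tolerated. A toy illustration of that decision (not ParaAttention's actual code):

```python
def should_reuse_cached_blocks(prev, curr, threshold=0.06):
    """Decide whether the cached output of the remaining transformer blocks
    can be reused, based on the mean absolute change of the first block's
    output relative to its previous magnitude."""
    n = len(prev)
    diff = sum(abs(c - p) for c, p in zip(curr, prev)) / n
    norm = sum(abs(p) for p in prev) / n
    return diff / norm < threshold

# Tiny change in the first block's output -> skip the remaining blocks
print(should_reuse_cached_blocks([1.0, 1.0], [1.0, 1.01]))  # -> True
# Large change -> recompute everything
print(should_reuse_cached_blocks([1.0, 1.0], [1.5, 0.5]))   # -> False
```

A larger threshold skips more steps (faster, but with more quality degradation), which matches the trade-off mentioned above.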
Python Script
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

from decorator import gpu_monitor, time_monitor


@gpu_monitor(interval=0.5)
@time_monitor
def main():
    model_id = "hunyuanvideo-community/HunyuanVideo"
    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        torch_dtype=torch.bfloat16,
    )
    pipe = HunyuanVideoPipeline.from_pretrained(
        model_id,
        transformer=transformer,
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    # Apply ParaAttention's first-block cache
    from para_attn.first_block_cache.diffusers_adapters import apply_cache_on_pipe
    apply_cache_on_pipe(pipe, residual_diff_threshold=0.06)

    # Enable memory savings
    pipe.enable_sequential_cpu_offload()
    pipe.vae.enable_tiling()

    output = pipe(
        prompt="A cat walks on the grass, realistic",
        height=320,
        width=512,
        num_frames=61,
        num_inference_steps=30,
    ).frames[0]
    export_to_video(output, "output.mp4", fps=15)

    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")


if __name__ == "__main__":
    main()
Results
torch.cuda.max_memory_allocated: 38.57 GB
time: 167.47 sec
GPU 0 - Used memory: 23.86/23.99 GB
ParaAttention cut generation time by more than a minute.
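For reference, the saving relative to the January 12 run works out as follows:

```python
# Generation times reported above (seconds)
baseline = 236.62  # sequential CPU offload only
cached = 167.47    # with ParaAttention's first-block cache

saved = baseline - cached
print(f"{saved:.2f} sec saved ({(1 - cached / baseline) * 100:.1f}% faster)")
# -> 69.15 sec saved (29.2% faster)
```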