PC Environment
Windows 11
RTX 4090 (VRAM 24GB)
CUDA 12.4
Python 3.12
Python Environment Setup
pip install torch==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate imageio imageio-ffmpeg
Python Script
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

model_id = "tencent/HunyuanVideo"
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
    revision="refs/pr/18",
)
pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.float16,
    revision="refs/pr/18",
)
pipe.vae.enable_tiling()
pipe.to("cuda")

output = pipe(
    prompt="A cat walks on the grass, realistic",
    height=320,
    width=512,
    num_frames=61,
    num_inference_steps=30,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
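The script requests num_frames=61. As I understand it, HunyuanVideo's causal 3D VAE compresses the time axis by a factor of 4, so num_frames is normally chosen as 4*k + 1 (61 gives 16 latent frames). A quick sanity check with a hypothetical helper (not part of the diffusers API):

```python
def latent_frames(num_frames: int) -> int:
    """Number of latent frames for a given num_frames, assuming the
    VAE's 4x temporal compression (num_frames must be 4*k + 1)."""
    if (num_frames - 1) % 4 != 0:
        raise ValueError("num_frames should be 4*k + 1")
    return (num_frames - 1) // 4 + 1

print(latent_frames(61))  # -> 16
```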
Results
Because VRAM usage exceeded 24GB, generating the video took a whole night. The resulting video is posted on the following Google Blogger: support-touchsp.blogspot.com
Update (January 12, 2025)
I found a way to keep VRAM usage under 24GB.

Python Script
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

from decorator import gpu_monitor, time_monitor


@gpu_monitor(interval=0.5)
@time_monitor
def main():
    model_id = "hunyuanvideo-community/HunyuanVideo"
    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        torch_dtype=torch.bfloat16,
    )
    pipe = HunyuanVideoPipeline.from_pretrained(
        model_id,
        transformer=transformer,
        torch_dtype=torch.float16,
    )

    # Enable memory savings
    pipe.vae.enable_tiling()
    pipe.enable_sequential_cpu_offload()

    output = pipe(
        prompt="A cat walks on the grass, realistic",
        height=320,
        width=512,
        num_frames=61,
        num_inference_steps=30,
    ).frames[0]
    export_to_video(output, "output.mp4", fps=15)

    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")


if __name__ == "__main__":
    main()
Results
torch.cuda.max_memory_allocated: 3.62 GB
time: 236.62 sec
GPU 0 - Used memory: 4.40/23.99 GB
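The script imports gpu_monitor and time_monitor from a local decorator.py that is not shown in the post. A minimal sketch of what such a module might look like (the function names match the imports above, but the implementation is my guess; GPU memory is polled via nvidia-smi, and polling is silently skipped when no GPU is present):

```python
import functools
import subprocess
import threading
import time


def time_monitor(func):
    """Print elapsed wall-clock time after the wrapped function returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"time: {time.perf_counter() - start:.2f} sec")
        return result
    return wrapper


def gpu_monitor(interval=0.5):
    """Poll nvidia-smi every `interval` seconds while the wrapped function
    runs and report peak GPU memory usage at the end."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            peak = [0.0]    # peak used memory in MiB
            total = [0.0]   # total memory in MiB
            stop = threading.Event()

            def poll():
                while not stop.is_set():
                    try:
                        out = subprocess.check_output(
                            ["nvidia-smi",
                             "--query-gpu=memory.used,memory.total",
                             "--format=csv,noheader,nounits"],
                            text=True,
                        )
                        used, tot = map(float, out.splitlines()[0].split(","))
                        peak[0] = max(peak[0], used)
                        total[0] = tot
                    except (OSError, subprocess.CalledProcessError, ValueError):
                        pass  # no usable GPU; just let the function run
                    stop.wait(interval)

            t = threading.Thread(target=poll, daemon=True)
            t.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                t.join()
                if total[0]:
                    print(f"GPU 0 - Used memory: "
                          f"{peak[0] / 1024:.2f}/{total[0] / 1024:.2f} GB")
        return wrapper
    return decorator
```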
Update (January 25, 2025)
Using ParaAttention can apparently speed up generation, although it seems to cause some loss in image quality.
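As I understand it, ParaAttention's first-block cache runs only the first transformer block at each denoising step and, when that block's output barely changes from the previous step, reuses the cached output of the remaining blocks instead of recomputing them; residual_diff_threshold controls how much change is tolerated. A toy illustration of that decision (not ParaAttention's actual code):

```python
def should_reuse_cached_blocks(prev, curr, threshold=0.06):
    """Decide whether the cached output of the remaining transformer blocks
    can be reused, based on the mean absolute change of the first block's
    output relative to its previous magnitude."""
    n = len(prev)
    diff = sum(abs(c - p) for c, p in zip(curr, prev)) / n
    norm = sum(abs(p) for p in prev) / n
    return diff / norm < threshold

# Tiny change in the first block's output -> skip the remaining blocks
print(should_reuse_cached_blocks([1.0, 1.0], [1.0, 1.01]))  # -> True
# Large change -> recompute everything
print(should_reuse_cached_blocks([1.0, 1.0], [1.5, 0.5]))   # -> False
```

A larger threshold skips more steps (faster, but with more quality degradation), which matches the trade-off mentioned above.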
Python Script
import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
from diffusers.utils import export_to_video

from decorator import gpu_monitor, time_monitor


@gpu_monitor(interval=0.5)
@time_monitor
def main():
    model_id = "hunyuanvideo-community/HunyuanVideo"
    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        torch_dtype=torch.bfloat16,
    )
    pipe = HunyuanVideoPipeline.from_pretrained(
        model_id,
        transformer=transformer,
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    # Apply ParaAttention's first-block cache
    from para_attn.first_block_cache.diffusers_adapters import apply_cache_on_pipe
    apply_cache_on_pipe(pipe, residual_diff_threshold=0.06)

    # Enable memory savings
    pipe.enable_sequential_cpu_offload()
    pipe.vae.enable_tiling()

    output = pipe(
        prompt="A cat walks on the grass, realistic",
        height=320,
        width=512,
        num_frames=61,
        num_inference_steps=30,
    ).frames[0]
    export_to_video(output, "output.mp4", fps=15)

    print(f"torch.cuda.max_memory_allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")


if __name__ == "__main__":
    main()
Results
torch.cuda.max_memory_allocated: 38.57 GB
time: 167.47 sec
GPU 0 - Used memory: 23.86/23.99 GB
ParaAttention cut generation time by more than a minute.
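For reference, the saving relative to the January 12 run works out as follows:

```python
# Generation times reported above (seconds)
baseline = 236.62  # sequential CPU offload only
cached = 167.47    # with ParaAttention's first-block cache

saved = baseline - cached
print(f"{saved:.2f} sec saved ({(1 - cached / baseline) * 100:.1f}% faster)")
# -> 69.15 sec saved (29.2% faster)
```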