https://touch-sp.hatenablog.com/entry/2026/02/20/172140

Introduction

NVIDIA has released "NVIDIA-Nemotron-Nano-9B-v2-Japanese", a 9B-parameter LLM with Japanese support, so I gave it a try.

It is based on the Mamba SSM architecture and also supports a Thinking mode (enable_thinking=True).

PC environment

Ubuntu 25.10 on WSL2

Python environment setup

I use uv. Here is my pyproject.toml.

causal_conv1d and mamba_ssm are installed from the prebuilt whl files published on their GitHub release pages. Running uv sync sets up the environment.

[project]
name = "nemotron"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = "==3.13.*"
dependencies = [
    "accelerate==1.12.0",
    "causal_conv1d",
    "hf-xet==1.2.0",
    "mamba_ssm",
    "torch==2.7.1+cu128",
    "transformers==4.48.3",
    "triton==3.3.1"
]

[[tool.uv.index]]
name = "torch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [{ index = "torch-cuda" }]
causal_conv1d = { url = "https://github.com/Dao-AILab/causal-conv1d/releases/download/v1.6.0/causal_conv1d-1.6.0+cu12torch2.7cxx11abiTRUE-cp313-cp313-linux_x86_64.whl" }
mamba_ssm = { url = "https://github.com/state-spaces/mamba/releases/download/v2.3.0/mamba_ssm-2.3.0+cu12torch2.7cxx11abiTRUE-cp313-cp313-linux_x86_64.whl" }
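The prebuilt wheel file names above encode which Python, CUDA, and torch versions they were built for, and those have to line up with requires-python and the pinned torch version. The following is a minimal sketch of how the name can be decoded. parse_wheel_name and matches_running_python are hypothetical helpers written for this illustration, not part of uv or pip, and the sketch assumes a five-field wheel name with no optional build tag.

```python
import sys


def parse_wheel_name(filename: str) -> dict:
    """Split a wheel file name into its name, version, and tag fields."""
    stem = filename.removesuffix(".whl")
    # Assumed layout: name-version-pythontag-abitag-platformtag (no build tag)
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    # Everything after "+" is the local version label (build info such as CUDA/torch)
    public_version, _, local = version.partition("+")
    return {
        "name": name,
        "version": public_version,
        "local": local,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_running_python(python_tag: str) -> bool:
    """Return True if the wheel's CPython tag matches the running interpreter."""
    expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return python_tag == expected


info = parse_wheel_name(
    "mamba_ssm-2.3.0+cu12torch2.7cxx11abiTRUE-cp313-cp313-linux_x86_64.whl"
)
print(info)
print("matches this interpreter:", matches_running_python(info["python_tag"]))
```

Here the local label cu12torch2.7cxx11abiTRUE corresponds to the torch 2.7.x / CUDA 12 pin, and cp313 corresponds to requires-python = "==3.13.*".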

Python script

The following packages are required to run it.

sudo apt install build-essential
sudo apt install python3.13-dev
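The post does not say why these packages are needed. A likely reason (an assumption on my part) is that Triton compiles small helper code at runtime, which needs a C compiler and the Python headers. The following is a minimal sketch for checking both before running the script. check_build_prerequisites is a hypothetical helper that uses only the standard library.

```python
import os
import shutil
import sysconfig


def check_build_prerequisites() -> dict:
    """Look for a C compiler on PATH and for Python.h in the include directory."""
    # build-essential normally provides cc/gcc
    compiler = shutil.which("cc") or shutil.which("gcc")
    # python3.13-dev normally provides Python.h for the system interpreter
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return {
        "c_compiler": compiler,
        "python_header": header if os.path.isfile(header) else None,
    }


result = check_build_prerequisites()
for name, found in result.items():
    print(f"{name}: {found or 'NOT FOUND'}")
```

If either entry prints NOT FOUND, installing the two apt packages above is the first thing to try.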

Thinking mode is enabled with enable_thinking=True.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    "nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/NVIDIA-Nemotron-Nano-9B-v2-Japanese",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

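# Chat history in the standard role/content format
messages = [
    {"role": "user", "content": "Write a haiku about GPUs"},
]

# Render the chat template to token IDs; enable_thinking turns on Thinking mode
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=True,
    return_tensors="pt"
).to(model.device)

# max_new_tokens caps the output length; raise it if the response is cut off
outputs = model.generate(
    tokenized_chat,
    max_new_tokens=128,
    eos_token_id=tokenizer.eos_token_id
)

# outputs[0] holds the prompt tokens followed by the generated tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))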
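The script above decodes the whole sequence, so the printed text contains the prompt as well as the response. The following is a minimal sketch of two post-processing steps: dropping the prompt tokens, and separating the reasoning from the final answer. new_token_ids and split_reasoning are hypothetical helpers. The sketch assumes the reasoning ends with a literal </think> marker in the decoded text, which the post does not confirm, and plain Python lists stand in for tensors so the block runs without a GPU.

```python
from typing import Optional, Sequence, Tuple


def new_token_ids(output_ids: Sequence[int], prompt_length: int) -> list:
    """Keep only the token IDs generated after the prompt."""
    return list(output_ids[prompt_length:])


def split_reasoning(text: str, end_marker: str = "</think>") -> Tuple[Optional[str], str]:
    """Split decoded text into (reasoning, answer) at the assumed end marker."""
    before, marker, after = text.partition(end_marker)
    if not marker:
        # No marker found: treat the whole text as the answer
        return None, text.strip()
    # Drop an opening tag if one is present in the decoded text
    reasoning = before.replace("<think>", "").strip()
    return reasoning, after.strip()


# Toy data standing in for real model output
print(new_token_ids([101, 102, 103, 7, 8, 9], prompt_length=3))
print(split_reasoning("<think>Count syllables first.</think>Silicon rivers flow"))
```

With the script above, the equivalent slice would be outputs[0][tokenized_chat.shape[-1]:], passed to tokenizer.decode. Whether the marker survives skip_special_tokens=True depends on the tokenizer, so check the raw output first.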
