Introduction
This post benchmarks the Ollama setup I built here:
touch-sp.hatenablog.com
I have written a benchmarking program before as well:
touch-sp.hatenablog.com
The rewritten program
from ollama import Client

client = Client(
    host="http://192.168.11.18:11434"
)

def client_chat():
    response = client.chat(
        #model="falcon3",
        model="opencoder",
        #model="phi3:14b-medium-4k-instruct-q4_K_M",
        messages=[
            {
                "role": "user",
                "content": "how to make gui with 3 buttons in pyside6"
            },
        ],
    )
    # eval_count: number of generated tokens, eval_duration: generation time in nanoseconds
    return response.eval_count, response.eval_duration

if __name__ == "__main__":
    total_tokens = 0
    total_time = 0
    for _ in range(3):
        eval_count, eval_duration = client_chat()
        total_tokens += eval_count
        total_time += eval_duration
    rate = (total_tokens / total_time) * 10**9
    print(f"tokens per second: {rate:.2f} tokens/second")
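The `* 10**9` factor comes from Ollama reporting `eval_duration` in nanoseconds. A minimal sketch of just that conversion, with hypothetical sample values (not real measurements):

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's eval stats (token count, duration in ns) to tokens/second."""
    return eval_count / eval_duration_ns * 10**9

# e.g. 500 tokens generated in 10 seconds (10 * 10**9 ns)
print(f"{tokens_per_second(500, 10 * 10**9):.2f} tokens/second")
```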
Results
falcon3
tokens per second: 34.10 tokens/second
opencoder
tokens per second: 49.39 tokens/second
phi3.5
tokens per second: 82.17 tokens/second
phi3:14b-medium-4k-instruct-q4_K_M
tokens per second: 28.37 tokens/second
yi-coder:9b
tokens per second: 56.17 tokens/second
qwen2.5-coder:14b
tokens per second: 26.94 tokens/second
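For convenience, the measured rates above can be ranked fastest to slowest with a small script (the numbers are simply copied from the results in this post):

```python
# Rates measured in this post, in tokens/second
results = {
    "falcon3": 34.10,
    "opencoder": 49.39,
    "phi3.5": 82.17,
    "phi3:14b-medium-4k-instruct-q4_K_M": 28.37,
    "yi-coder:9b": 56.17,
    "qwen2.5-coder:14b": 26.94,
}

# Sort by rate, descending, and print a ranking
for model, rate in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rate:.2f} tokens/second")
```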