Qwen3.5-35B-A3B-Q4_K_M.gguf
./build/bin/llama-bench -m /home/hoge/models/Qwen3.5-35B-A3B-Q4_K_M.gguf
| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
| qwen35moe ?B Q4_K - Medium | 19.74 GiB | 34.66 B | CUDA | 99 | pp512 | 4995.94 ± 28.46 |
| qwen35moe ?B Q4_K - Medium | 19.74 GiB | 34.66 B | CUDA | 99 | tg128 | 147.73 ± 1.29 |
Qwen3.5-27B-Q4_K_M.gguf
./build/bin/llama-bench -m /home/hoge/models/Qwen3.5-27B-Q4_K_M.gguf
| model | size | params | backend | ngl | test | t/s |
|---|---|---|---|---|---|---|
| qwen35 ?B Q4_K - Medium | 15.58 GiB | 26.90 B | CUDA | 99 | pp512 | 2473.79 ± 276.06 |
| qwen35 ?B Q4_K - Medium | 15.58 GiB | 26.90 B | CUDA | 99 | tg128 | 40.49 ± 2.86 |
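To compare the two runs directly, the mean t/s values can be divided test by test. The following is a minimal sketch that parses the table rows above and prints the ratio of the 35B-A3B model to the 27B model. `parse_rows` and `speedup` are hypothetical helper names, not part of llama.cpp, and the ± spread is ignored, so the ratios are only approximate.

```python
# Rows copied verbatim from the two llama-bench tables above.
ROWS = """\
| qwen35moe ?B Q4_K - Medium | 19.74 GiB | 34.66 B | CUDA | 99 | pp512 | 4995.94 ± 28.46 |
| qwen35moe ?B Q4_K - Medium | 19.74 GiB | 34.66 B | CUDA | 99 | tg128 | 147.73 ± 1.29 |
| qwen35 ?B Q4_K - Medium | 15.58 GiB | 26.90 B | CUDA | 99 | pp512 | 2473.79 ± 276.06 |
| qwen35 ?B Q4_K - Medium | 15.58 GiB | 26.90 B | CUDA | 99 | tg128 | 40.49 ± 2.86 |
"""


def parse_rows(text):
    """Map (model, test) to the mean t/s, skipping header and separator rows."""
    results = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 7:
            continue  # not a 7-column result row
        if cells[0] == "model" or set(cells[0]) <= {"-"}:
            continue  # header or |---| separator
        mean = float(cells[6].split("±")[0])  # keep the mean, drop the spread
        results[(cells[0], cells[5])] = mean
    return results


def speedup(results, fast, slow):
    """Return {test: fast_tps / slow_tps} for every test recorded for `fast`."""
    return {
        test: results[(fast, test)] / results[(slow, test)]
        for (model, test) in results
        if model == fast
    }


results = parse_rows(ROWS)
ratios = speedup(results, "qwen35moe ?B Q4_K - Medium", "qwen35 ?B Q4_K - Medium")
for test, ratio in ratios.items():
    print(f"{test}: {ratio:.2f}x")
```

On these numbers the 35B-A3B model comes out at roughly twice the 27B model's speed on pp512 and roughly 3.6 times its speed on tg128.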