Loose-Info.com
Last Update 2026/02/17
Local LLM measured-value comparison: Llama 3.2 [Japanese prompt]

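This is a record of running local LLMs on a PC toward the lower end of the spec range.
The runs were made while other workloads, such as virtual machines, were active, so the machine was under some load.
The figures are not intended as a benchmark-style evaluation of LLM performance.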

Test PC

OS     : Debian GNU/Linux 12 (bookworm)
CPU    : Intel(R) Core(TM) i5-14400F
GPU    : GeForce RTX 3060 12GB
Memory : DDR4 PC4-25600 32GB × 4
SSD    : Crucial P310 CT1000P310SSD8-JP


Environment : Docker + Ollama (default state, no special configuration)

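Test prompt

おすすめの日本の絶景を教えてください。東西南北、10箇所程度、日本語で。
(English: "Please recommend some scenic spots in Japan. North, south, east and west, about 10 places, in Japanese.")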
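
One way to collect the timing fields shown in the records on this page is sketched below. It assumes Ollama's REST API is reachable at its default local address (http://localhost:11434) and that the non-streaming /api/generate endpoint is used; the helper names build_request, extract_metrics and measure are made up for this illustration and are not taken from the original test procedure.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"

# The test prompt from this page, sent unchanged.
PROMPT = "おすすめの日本の絶景を教えてください。東西南北、10箇所程度、日本語で。"

# Timing fields that a non-streaming /api/generate response carries.
METRIC_KEYS = (
    "total_duration",
    "load_duration",
    "prompt_eval_count",
    "prompt_eval_duration",
    "eval_count",
    "eval_duration",
)


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for one model tag."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_metrics(response: dict) -> dict:
    """Keep only the timing fields (durations are in nanoseconds)."""
    return {key: response[key] for key in METRIC_KEYS if key in response}


def measure(model: str, prompt: str = PROMPT) -> dict:
    """Send the prompt to a running Ollama server and return its metrics."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return extract_metrics(json.load(resp))


if __name__ == "__main__":
    # Requires a running Ollama server with this model already pulled.
    print(measure("llama3.2:1b-instruct-q4_K_M"))
```

Setting "stream" to False makes the server return a single JSON object, so the timing fields can be read from one response instead of the last chunk of a stream.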

Llama 3.2 [Japanese prompt]

Without GPU
1b-instruct-q4_K_M(45.9TPS)   1b-instruct-q5_K_M(41.4TPS)   1b-instruct-q8_0(31.1TPS)   1b-instruct-fp16(16.8TPS)
3b-instruct-q4_K_M(18.4TPS)   3b-instruct-q5_K_M(16.2TPS)   3b-instruct-q8_0(12.0TPS)   3b-instruct-fp16(6.56TPS)  
With GPU
1b-instruct-q4_K_M(304TPS)   1b-instruct-q5_K_M(284TPS)   1b-instruct-q8_0(216TPS)   1b-instruct-fp16(124TPS)
3b-instruct-q4_K_M(129TPS)   3b-instruct-q5_K_M(117TPS)   3b-instruct-q8_0(85.8TPS)   3b-instruct-fp16(49.4TPS)  

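・TPS (tokens/s) is calculated as eval_count / eval_duration, with eval_duration (reported in nanoseconds) converted to seconds
・Runs with the model already loaded are omitted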
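
The TPS calculation noted above can be sketched as follows. Ollama reports durations in nanoseconds, so eval_duration is converted to seconds before dividing. The function names are illustrative only, and the input values are taken from the llama3.2:1b-instruct-q4_K_M (without GPU) record.

```python
NS_PER_SECOND = 1_000_000_000


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generated tokens divided by generation time in seconds."""
    return eval_count / (eval_duration_ns / NS_PER_SECOND)


def format_duration(duration_ns: int) -> str:
    """Render a nanosecond count as seconds with millisecond precision."""
    return f"({duration_ns / NS_PER_SECOND:.3f}s)"


# eval_count and eval_duration from the 1b-instruct-q4_K_M (without GPU) run.
tps = tokens_per_second(466, 10149972037)
print(f"{tps:.1f} TPS")              # 45.9 TPS
print(format_duration(10149972037))  # (10.150s)
```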

llama3.2:1b-instruct-q4_K_M (without GPU)

Model
  architecture      llama
  parameters        1.2B
  context length    131072
  embedding length  2048
  quantization      Q4_K_M

2026-02-16

total_duration (total time)                   : 11501001926 (11.501s)
load_duration (model load time)               : 812907296 (0.813s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 263865232 (0.264s)
eval_count (generated tokens)                 : 466
eval_duration (generation time)               : 10149972037 (10.150s)

real 0m11.511s
user 0m0.015s
sys  0m0.014s

Memory usage (RSS) : 1036208 KB

llama3.2:1b-instruct-q5_K_M (without GPU)

Model
  architecture      llama
  parameters        1.2B
  context length    131072
  embedding length  2048
  quantization      Q5_K_M

2026-02-16

total_duration (total time)                   : 10939658117 (10.940s)
load_duration (model load time)               : 793890321 (0.794s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 338222976 (0.338s)
eval_count (generated tokens)                 : 396
eval_duration (generation time)               : 9567252255 (9.567s)

real 0m10.950s
user 0m0.024s
sys  0m0.006s

Memory usage (RSS) : 1142536 KB

llama3.2:1b-instruct-q8_0 (without GPU)

Model
  architecture      llama
  parameters        1.2B
  context length    131072
  embedding length  2048
  quantization      Q8_0

2026-02-16

total_duration (total time)                   : 7657952525 (7.658s)
load_duration (model load time)               : 793598110 (0.794s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 206035917 (0.206s)
eval_count (generated tokens)                 : 203
eval_duration (generation time)               : 6533717406 (6.534s)

real 0m7.668s
user 0m0.024s
sys  0m0.005s

Memory usage (RSS) : 1535096 KB

llama3.2:1b-instruct-fp16 (without GPU)

Model
  architecture      llama
  parameters        1.2B
  context length    131072
  embedding length  2048
  quantization      F16

2026-02-16

total_duration (total time)                   : 33328163716 (33.328s)
load_duration (model load time)               : 1063475193 (1.063s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 318299358 (0.318s)
eval_count (generated tokens)                 : 530
eval_duration (generation time)               : 31589739060 (31.590s)

real 0m33.339s
user 0m0.022s
sys  0m0.011s

Memory usage (RSS) : 2665388 KB

llama3.2:3b-instruct-q4_K_M (without GPU)

Model
  architecture      llama
  parameters        3.2B
  context length    131072
  embedding length  3072
  quantization      Q4_K_M

2026-02-16

total_duration (total time)                   : 41005752242 (41.006s)
load_duration (model load time)               : 1074311413 (1.074s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 714585092 (0.715s)
eval_count (generated tokens)                 : 712
eval_duration (generation time)               : 38783315803 (38.783s)

real 0m41.012s
user 0m0.027s
sys  0m0.000s

Memory usage (RSS) : 2543904 KB

llama3.2:3b-instruct-q5_K_M (without GPU)

Model
  architecture      llama
  parameters        3.2B
  context length    131072
  embedding length  3072
  quantization      Q5_K_M

2026-02-16

total_duration (total time)                   : 43281281383 (43.281s)
load_duration (model load time)               : 1048326137 (1.048s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 956229819 (0.956s)
eval_count (generated tokens)                 : 662
eval_duration (generation time)               : 40881022622 (40.881s)

real 0m43.292s
user 0m0.020s
sys  0m0.014s

Memory usage (RSS) : 2838348 KB

llama3.2:3b-instruct-q8_0 (without GPU)

Model
  architecture      llama
  parameters        3.2B
  context length    131072
  embedding length  3072
  quantization      Q8_0

2026-02-16

total_duration (total time)                   : 33119246141 (33.119s)
load_duration (model load time)               : 1292674681 (1.293s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 520029242 (0.520s)
eval_count (generated tokens)                 : 373
eval_duration (generation time)               : 31083260384 (31.083s)

real 0m33.130s
user 0m0.023s
sys  0m0.010s

Memory usage (RSS) : 3910840 KB

llama3.2:3b-instruct-fp16 (without GPU)

Model
  architecture      llama
  parameters        3.2B
  context length    131072
  embedding length  3072
  quantization      F16

2026-02-16

total_duration (total time)                   : 68190815829 (68.191s)
load_duration (model load time)               : 1544711966 (1.545s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 892044917 (0.892s)
eval_count (generated tokens)                 : 430
eval_duration (generation time)               : 65499135898 (65.499s)

real 1m8.202s
user 0m0.022s
sys  0m0.016s

Memory usage (RSS) : 6853872 KB

llama3.2:1b-instruct-q4_K_M (with GPU)

Model
  architecture      llama
  parameters        1.2B
  context length    131072
  embedding length  2048
  quantization      Q4_K_M

2026-02-16

total_duration (total time)                   : 3105868158 (3.106s)
load_duration (model load time)               : 915587971 (0.916s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 13159742 (0.013s)
eval_count (generated tokens)                 : 565
eval_duration (generation time)               : 1856111989 (1.856s)

real 0m3.117s
user 0m0.020s
sys  0m0.010s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   43C    P2             169W / 170W |   1611MiB / 12288MiB |     84%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1242      G   /usr/lib/xorg/Xorg                          125MiB |
|    0   N/A  N/A      1899      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2423      G   /usr/bin/x-www-browser                      144MiB |
|    0   N/A  N/A     45842      C   /usr/bin/ollama                            1326MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 640244 KB
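
The nvidia-smi table above is laid out for reading by eye. If the same three figures (memory in use, GPU utilization, power draw) are wanted inside a script, nvidia-smi's CSV query mode is one option. The following is a minimal sketch under that assumption; the function names are hypothetical, and it is not a description of how the figures on this page were collected.

```python
import subprocess

# Query fields accepted by `nvidia-smi --query-gpu=...`.
QUERY_FIELDS = ("memory.used", "utilization.gpu", "power.draw")


def parse_gpu_csv(line: str) -> dict:
    """Parse one `csv,noheader,nounits` line, e.g. '1611, 84, 169.00'."""
    values = [value.strip() for value in line.split(",")]
    return {
        "memory_used_mib": int(values[0]),
        "utilization_pct": int(values[1]),
        "power_draw_w": float(values[2]),
    }


def query_gpu() -> dict:
    """Run nvidia-smi once and return the figures for the first GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout.splitlines()[0])


if __name__ == "__main__":
    # Requires an NVIDIA driver installation that provides nvidia-smi.
    print(query_gpu())
```

Keeping the parsing separate from the subprocess call means the parser can be checked against a sample line on a machine without a GPU.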

llama3.2:1b-instruct-q5_K_M (with GPU)

Model
  architecture      llama
  parameters        1.2B
  context length    131072
  embedding length  2048
  quantization      Q5_K_M

2026-02-16

total_duration (total time)                   : 3031084870 (3.031s)
load_duration (model load time)               : 894848615 (0.895s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 13127833 (0.013s)
eval_count (generated tokens)                 : 517
eval_duration (generation time)               : 1819582094 (1.820s)

real 0m3.042s
user 0m0.025s
sys  0m0.005s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   46C    P2             169W / 170W |   1709MiB / 12288MiB |     85%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1242      G   /usr/lib/xorg/Xorg                          125MiB |
|    0   N/A  N/A      1899      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2423      G   /usr/bin/x-www-browser                      144MiB |
|    0   N/A  N/A     45908      C   /usr/bin/ollama                            1424MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 641380 KB

llama3.2:1b-instruct-q8_0 (with GPU)

Model
  architecture      llama
  parameters        1.2B
  context length    131072
  embedding length  2048
  quantization      Q8_0

2026-02-16

total_duration (total time)                   : 3409915401 (3.410s)
load_duration (model load time)               : 885666273 (0.886s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 11576154 (0.012s)
eval_count (generated tokens)                 : 485
eval_duration (generation time)               : 2244038033 (2.244s)

real 0m3.421s
user 0m0.026s
sys  0m0.009s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   45C    P2             152W / 170W |   2101MiB / 12288MiB |     90%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1242      G   /usr/lib/xorg/Xorg                          125MiB |
|    0   N/A  N/A      1899      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2423      G   /usr/bin/x-www-browser                      144MiB |
|    0   N/A  N/A     45977      C   /usr/bin/ollama                            1816MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 694660 KB

llama3.2:1b-instruct-fp16 (with GPU)

Model
  architecture      llama
  parameters        1.2B
  context length    131072
  embedding length  2048
  quantization      F16

2026-02-16

total_duration (total time)                   : 5847412488 (5.847s)
load_duration (model load time)               : 1144955109 (1.145s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 25784321 (0.026s)
eval_count (generated tokens)                 : 541
eval_duration (generation time)               : 4372116014 (4.372s)

real 0m5.867s
user 0m0.042s
sys  0m0.009s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   48C    P2             150W / 170W |   3295MiB / 12288MiB |     93%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1242      G   /usr/lib/xorg/Xorg                          125MiB |
|    0   N/A  N/A      1899      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2423      G   /usr/bin/x-www-browser                      144MiB |
|    0   N/A  N/A     46087      C   /usr/bin/ollama                            3010MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 1026896 KB

llama3.2:3b-instruct-q4_K_M (with GPU)

Model
  architecture      llama
  parameters        3.2B
  context length    131072
  embedding length  3072
  quantization      Q4_K_M

2026-02-16

total_duration (total time)                   : 6625570133 (6.626s)
load_duration (model load time)               : 893207077 (0.893s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 23940970 (0.024s)
eval_count (generated tokens)                 : 684
eval_duration (generation time)               : 5321587263 (5.322s)

real 0m6.637s
user 0m0.026s
sys  0m0.005s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   48C    P2             169W / 170W |   3113MiB / 12288MiB |     93%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1242      G   /usr/lib/xorg/Xorg                          119MiB |
|    0   N/A  N/A      1899      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2423      G   /usr/bin/x-www-browser                      144MiB |
|    0   N/A  N/A     68947      C   /usr/bin/ollama                            2834MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 756868 KB

llama3.2:3b-instruct-q5_K_M (with GPU)

Model
  architecture      llama
  parameters        3.2B
  context length    131072
  embedding length  3072
  quantization      Q5_K_M

2026-02-16

total_duration (total time)                   : 5302171898 (5.302s)
load_duration (model load time)               : 1154967940 (1.155s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 25094372 (0.025s)
eval_count (generated tokens)                 : 448
eval_duration (generation time)               : 3813280667 (3.813s)

real 0m5.313s
user 0m0.016s
sys  0m0.014s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   50C    P2             169W / 170W |   3401MiB / 12288MiB |     92%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1242      G   /usr/lib/xorg/Xorg                          119MiB |
|    0   N/A  N/A      1899      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2423      G   /usr/bin/x-www-browser                      144MiB |
|    0   N/A  N/A     69013      C   /usr/bin/ollama                            3122MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 750316 KB

llama3.2:3b-instruct-q8_0 (with GPU)

Model
  architecture      llama
  parameters        3.2B
  context length    131072
  embedding length  3072
  quantization      Q8_0

2026-02-16

total_duration (total time)                   : 2689089493 (2.689s)
load_duration (model load time)               : 1135502130 (1.136s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 24539584 (0.025s)
eval_count (generated tokens)                 : 124
eval_duration (generation time)               : 1445543137 (1.446s)

real 0m2.699s
user 0m0.019s
sys  0m0.009s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   50C    P2             161W / 170W |   4455MiB / 12288MiB |     96%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1242      G   /usr/lib/xorg/Xorg                          125MiB |
|    0   N/A  N/A      1899      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2423      G   /usr/bin/x-www-browser                      144MiB |
|    0   N/A  N/A     69086      C   /usr/bin/ollama                            4170MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 832536 KB

llama3.2:3b-instruct-fp16 (with GPU)

Model
  architecture      llama
  parameters        3.2B
  context length    131072
  embedding length  3072
  quantization      F16

2026-02-16

total_duration (total time)                   : 11776163735 (11.776s)
load_duration (model load time)               : 1402064396 (1.402s)
prompt_eval_count (prompt tokens evaluated)   : 52
prompt_eval_duration (prompt evaluation time) : 41251014 (0.041s)
eval_count (generated tokens)                 : 497
eval_duration (generation time)               : 10053706611 (10.054s)

real 0m11.784s
user 0m0.013s
sys  0m0.012s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   56C    P2             158W / 170W |   7395MiB / 12288MiB |     97%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1242      G   /usr/lib/xorg/Xorg                          119MiB |
|    0   N/A  N/A      1899      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2423      G   /usr/bin/x-www-browser                      144MiB |
|    0   N/A  N/A     69153      C   /usr/bin/ollama                            7116MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 1286548 KB