Last Update 2026/02/17
This is a record of running local LLMs on a relatively low-spec PC.
The runs were made while other workloads, such as virtual machines, were active, so the machine was under some load.
This is not a benchmark-style evaluation of LLM performance.
Test PC
| OS | Debian GNU/Linux 12 (bookworm) |
| CPU | Intel(R) Core(TM) i5-14400F |
| GPU | GeForce RTX 3060 12GB |
| Memory | DDR4 PC4-25600 32GB × 4 |
| SSD | Crucial P310 CT1000P310SSD8-JP |
Environment: Docker + Ollama (default settings, no special configuration)
Test prompt
Could you please recommend some great places in the US to see beautiful scenery? Around 10 places in all four directions.
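The source does not show the command used to send this prompt. As one possible sketch, assuming an Ollama server reachable on its default port 11434 and its `/api/generate` endpoint, the request body and the timing fields recorded below could be handled like this. The helper names `build_payload`, `extract_metrics`, and `generate` are hypothetical and not taken from the original setup.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default port on the local host.
OLLAMA_URL = "http://localhost:11434/api/generate"

PROMPT = (
    "Could you please recommend some great places in the US to see "
    "beautiful scenery? Around 10 places in all four directions."
)

# Timing and token-count fields that appear in the records of this page.
METRIC_KEYS = (
    "total_duration",
    "load_duration",
    "prompt_eval_count",
    "prompt_eval_duration",
    "eval_count",
    "eval_duration",
)


def build_payload(model: str, prompt: str = PROMPT) -> bytes:
    """Encode a non-streaming /api/generate request body."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def extract_metrics(response: dict) -> dict:
    """Keep only the timing and token-count fields of a response."""
    return {key: response[key] for key in METRIC_KEYS if key in response}


def generate(model: str, url: str = OLLAMA_URL) -> dict:
    """Send the prompt to a running Ollama server and return its metrics."""
    request = urllib.request.Request(
        url,
        data=build_payload(model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_metrics(json.load(reply))
```

Calling `generate("llama3.2:1b-instruct-q4_K_M")` against a running server would return a dictionary with the same six fields listed in each record.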
Llama 3.2 [English prompt]
No GPU
1b-instruct-q4_K_M (45.8 TPS)
1b-instruct-q5_K_M (41.3 TPS)
1b-instruct-q8_0 (30.3 TPS)
1b-instruct-fp16 (16.9 TPS)
3b-instruct-q4_K_M (18.8 TPS)
3b-instruct-q5_K_M (16.5 TPS)
3b-instruct-q8_0 (11.9 TPS)
3b-instruct-fp16 (6.54 TPS)
With GPU
1b-instruct-q4_K_M (303 TPS)
1b-instruct-q5_K_M (278 TPS)
1b-instruct-q8_0 (210 TPS)
1b-instruct-fp16 (124 TPS)
3b-instruct-q4_K_M (129 TPS)
3b-instruct-q5_K_M (117 TPS)
3b-instruct-q8_0 (86.9 TPS)
3b-instruct-fp16 (49.5 TPS)
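・TPS (tokens/s) is calculated as eval_count / eval_duration, with eval_duration converted from nanoseconds to seconds.
・Runs with the model already loaded were omitted.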
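The TPS calculation can be sketched as follows. This is a minimal illustration, assuming `eval_duration` is reported in nanoseconds (consistent with the records below, where 9743731034 is shown as 9.744s). The function name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed: generated tokens divided by generation time in seconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Values taken from the llama3.2:1b-instruct-q4_K_M record without GPU.
tps = tokens_per_second(446, 9743731034)
print(f"{tps:.1f} TPS")  # 45.8 TPS
```

Note that this figure covers token generation only. Model load time and prompt evaluation time are reported separately and are not part of the TPS value.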
llama3.2:1b-instruct-q4_K_M (no GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q4_K_M
2026-02-16
total_duration (total time) : 11160466685 (11.160s)
load_duration (model load time) : 847985303 ( 0.848s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 304720629 ( 0.305s)
eval_count (generated tokens) : 446
eval_duration (generation time) : 9743731034 ( 9.744s)
real 0m11.169s
user 0m0.027s
sys 0m0.004s
Memory usage (RSS) : 1034732 KB
llama3.2:1b-instruct-q5_K_M (no GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q5_K_M
2026-02-16
total_duration (total time) : 10392110807 (10.392s)
load_duration (model load time) : 793192947 ( 0.793s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 315504314 ( 0.316s)
eval_count (generated tokens) : 373
eval_duration (generation time) : 9024002528 ( 9.024s)
real 0m10.403s
user 0m0.025s
sys 0m0.006s
Memory usage (RSS) : 1143640 KB
llama3.2:1b-instruct-q8_0 (no GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q8_0
2026-02-16
total_duration (total time) : 14223443411 (14.223s)
load_duration (model load time) : 772590835 ( 0.773s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 189872136 ( 0.190s)
eval_count (generated tokens) : 394
eval_duration (generation time) : 12997048967 (12.997s)
real 0m14.234s
user 0m0.025s
sys 0m0.006s
Memory usage (RSS) : 1533476 KB
llama3.2:1b-instruct-fp16 (no GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization F16
2026-02-16
total_duration (total time) : 22077110671 (22.077s)
load_duration (model load time) : 1048190912 ( 1.048s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 303637508 ( 0.304s)
eval_count (generated tokens) : 346
eval_duration (generation time) : 20522973918 (20.523s)
real 0m22.088s
user 0m0.020s
sys 0m0.012s
Memory usage (RSS) : 2660904 KB
llama3.2:3b-instruct-q4_K_M (no GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q4_K_M
2026-02-16
total_duration (total time) : 25071609188 (25.072s)
load_duration (model load time) : 1070302959 ( 1.070s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 669983267 ( 0.670s)
eval_count (generated tokens) : 432
eval_duration (generation time) : 23030544016 (23.031s)
real 0m25.082s
user 0m0.024s
sys 0m0.009s
Memory usage (RSS) : 2544000 KB
llama3.2:3b-instruct-q5_K_M (no GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q5_K_M
2026-02-16
total_duration (total time) : 30083680188 (30.084s)
load_duration (model load time) : 1068052723 ( 1.068s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 954498736 ( 0.954s)
eval_count (generated tokens) : 458
eval_duration (generation time) : 27742392216 (27.742s)
real 0m30.094s
user 0m0.021s
sys 0m0.011s
Memory usage (RSS) : 2840260 KB
llama3.2:3b-instruct-q8_0 (no GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q8_0
2026-02-16
total_duration (total time) : 44373719247 (44.374s)
load_duration (model load time) : 1298096668 ( 1.298s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 528690690 ( 0.529s)
eval_count (generated tokens) : 501
eval_duration (generation time) : 42198202232 (42.198s)
real 0m44.385s
user 0m0.025s
sys 0m0.015s
Memory usage (RSS) : 3918908 KB
llama3.2:3b-instruct-fp16 (no GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization F16
2026-02-16
total_duration (total time) : 71134149187 (71.134s)
load_duration (model load time) : 1528063840 ( 1.528s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 841500698 ( 0.842s)
eval_count (generated tokens) : 448
eval_duration (generation time) : 68450339042 (68.450s)
real 1m11.145s
user 0m0.020s
sys 0m0.018s
Memory usage (RSS) : 6855184 KB
llama3.2:1b-instruct-q4_K_M (with GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q4_K_M
2026-02-16
total_duration (total time) : 3061415092 (3.061s)
load_duration (model load time) : 1140996439 (1.141s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 12823968 (0.013s)
eval_count (generated tokens) : 481
eval_duration (generation time) : 1587401058 (1.587s)
real 0m3.073s
user 0m0.032s
sys 0m0.001s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 38C P2 165W / 170W | 1605MiB / 12288MiB | 83% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 36875 C /usr/bin/ollama 1326MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 639336 KB
llama3.2:1b-instruct-q5_K_M (with GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q5_K_M
2026-02-16
total_duration (total time) : 3502409051 (3.502s)
load_duration (model load time) : 887488583 (0.887s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 13636906 (0.014s)
eval_count (generated tokens) : 608
eval_duration (generation time) : 2183891028 (2.184s)
real 0m3.513s
user 0m0.030s
sys 0m0.000s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 41C P2 165W / 170W | 1703MiB / 12288MiB | 85% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 36946 C /usr/bin/ollama 1424MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 641900 KB
llama3.2:1b-instruct-q8_0 (with GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q8_0
2026-02-16
total_duration (total time) : 3267031012 (3.267s)
load_duration (model load time) : 873931484 (0.874s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 12435959 (0.012s)
eval_count (generated tokens) : 440
eval_duration (generation time) : 2093887745 (2.094s)
real 0m3.278s
user 0m0.025s
sys 0m0.005s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 43C P2 150W / 170W | 2095MiB / 12288MiB | 89% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 37019 C /usr/bin/ollama 1816MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 693328 KB
llama3.2:1b-instruct-fp16 (with GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization F16
2026-02-16
total_duration (total time) : 7787854514 (7.788s)
load_duration (model load time) : 1155829205 (1.156s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 53479837 (0.053s)
eval_count (generated tokens) : 754
eval_duration (generation time) : 6086524249 (6.087s)
real 0m7.799s
user 0m0.023s
sys 0m0.007s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 46C P2 148W / 170W | 3289MiB / 12288MiB | 92% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 37090 C /usr/bin/ollama 3010MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 1028096 KB
llama3.2:3b-instruct-q4_K_M (with GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q4_K_M
2026-02-16
total_duration (total time) : 4505531008 (4.506s)
load_duration (model load time) : 1143182015 (1.143s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 22459426 (0.022s)
eval_count (generated tokens) : 398
eval_duration (generation time) : 3074794246 (3.075s)
real 0m4.517s
user 0m0.020s
sys 0m0.011s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 43C P2 169W / 170W | 3113MiB / 12288MiB | 91% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 57066 C /usr/bin/ollama 2834MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 752148 KB
llama3.2:3b-instruct-q5_K_M (with GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q5_K_M
2026-02-16
total_duration (total time) : 4820130999 (4.820s)
load_duration (model load time) : 1132322462 (1.132s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 24964855 (0.025s)
eval_count (generated tokens) : 396
eval_duration (generation time) : 3389919420 (3.390s)
real 0m4.825s
user 0m0.015s
sys 0m0.004s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 46C P2 168W / 170W | 3401MiB / 12288MiB | 92% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 57137 C /usr/bin/ollama 3122MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 748272 KB
llama3.2:3b-instruct-q8_0 (with GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q8_0
2026-02-16
total_duration (total time) : 9131036373 (9.131s)
load_duration (model load time) : 1138349756 (1.138s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 25013479 (0.025s)
eval_count (generated tokens) : 655
eval_duration (generation time) : 7539457693 (7.539s)
real 0m9.141s
user 0m0.020s
sys 0m0.009s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 49C P2 165W / 170W | 4455MiB / 12288MiB | 95% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 125MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 57204 C /usr/bin/ollama 4170MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 839420 KB
llama3.2:3b-instruct-fp16 (with GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization F16
2026-02-16
total_duration (total time) : 9865138571 (9.865s)
load_duration (model load time) : 1399273409 (1.399s)
prompt_eval_count (prompt tokens evaluated) : 49
prompt_eval_duration (prompt evaluation time) : 36981312 (0.037s)
eval_count (generated tokens) : 404
eval_duration (generation time) : 8160622041 (8.161s)
real 0m9.876s
user 0m0.029s
sys 0m0.004s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 53C P2 156W / 170W | 7395MiB / 12288MiB | 96% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 57274 C /usr/bin/ollama 7116MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 1294568 KB
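As a rough cross-check of the recorded figures, the GPU-to-CPU ratio of generation speed can be computed from the TPS values in the summary near the top. This is only a sketch over single runs taken under load, so the ratios are indicative at best. The `speedup` helper and the `RESULTS` table are illustrative names, with the numbers copied from the summary list.

```python
# TPS figures copied from the summary list: (no GPU, with GPU).
RESULTS = {
    "1b-instruct-q4_K_M": (45.8, 303),
    "1b-instruct-q5_K_M": (41.3, 278),
    "1b-instruct-q8_0": (30.3, 210),
    "1b-instruct-fp16": (16.9, 124),
    "3b-instruct-q4_K_M": (18.8, 129),
    "3b-instruct-q5_K_M": (16.5, 117),
    "3b-instruct-q8_0": (11.9, 86.9),
    "3b-instruct-fp16": (6.54, 49.5),
}


def speedup(cpu_tps: float, gpu_tps: float) -> float:
    """How many times faster generation was with the GPU."""
    return gpu_tps / cpu_tps


for tag, (cpu_tps, gpu_tps) in RESULTS.items():
    print(f"{tag:<20} {speedup(cpu_tps, gpu_tps):.1f}x")
```

For these runs the ratio falls between roughly 6.6x and 7.6x across all eight model variants.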