Last Update 2026/02/17
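This is a record of running local LLMs on a relatively low-spec PC.
The runs were made while other workloads, such as virtual machines, were active, so the machine was under some load.
This is not a benchmark-style evaluation of LLM performance.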
Test PC
| OS | Debian GNU/Linux 12 (bookworm) |
| CPU | Intel(R) Core(TM) i5-14400F |
| GPU | GeForce RTX 3060 12GB |
| Memory | DDR4 PC4-25600 32GB × 4 |
| SSD | crucial P310 CT1000P310SSD8-JP |
Setup: Docker + Ollama (default settings, no special configuration)
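Test prompt
おすすめの日本の絶景を教えてください。東西南北、10箇所程度、日本語で。
(English: Please recommend some scenic spots in Japan. North, south, east, and west, about 10 places, in Japanese.)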
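As a rough sketch of how a single run could be issued, the following sends the test prompt to Ollama's `/api/generate` endpoint with streaming disabled and returns the JSON response, which carries the timing fields recorded below. The address `localhost:11434` (Ollama's default, which depends on the Docker port mapping) and the helper names `build_payload` and `generate` are assumptions; the source does not say how the requests were actually sent.

```python
import json
import urllib.request

# Assumed endpoint: Ollama's default listen address. Adjust to your Docker port mapping.
OLLAMA_URL = "http://localhost:11434/api/generate"

PROMPT = "おすすめの日本の絶景を教えてください。東西南北、10箇所程度、日本語で。"


def build_payload(model: str, prompt: str = PROMPT) -> dict:
    """Build a non-streaming generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, url: str = OLLAMA_URL) -> dict:
    """POST the prompt and return the parsed JSON response, including timing fields."""
    body = json.dumps(build_payload(model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires a running Ollama server, so it is left commented out):
# result = generate("llama3.2:1b-instruct-q4_K_M")
# print(result["eval_count"], result["eval_duration"])
```

With `"stream": False` the server returns one JSON object, so the duration and count fields can be read directly from the result.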
Llama 3.2 [Japanese prompt]
Without GPU
1b-instruct-q4_K_M(45.9TPS)
1b-instruct-q5_K_M(41.4TPS)
1b-instruct-q8_0(31.1TPS)
1b-instruct-fp16(16.8TPS)
3b-instruct-q4_K_M(18.4TPS)
3b-instruct-q5_K_M(16.2TPS)
3b-instruct-q8_0(12.0TPS)
3b-instruct-fp16(6.56TPS)
With GPU
1b-instruct-q4_K_M(304TPS)
1b-instruct-q5_K_M(284TPS)
1b-instruct-q8_0(216TPS)
1b-instruct-fp16(124TPS)
3b-instruct-q4_K_M(129TPS)
3b-instruct-q5_K_M(117TPS)
3b-instruct-q8_0(85.8TPS)
3b-instruct-fp16(49.4TPS)
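・TPS (tokens/s) is computed as eval_count / eval_duration, with eval_duration converted from nanoseconds to seconds
・Measurements with the model already loaded are omitted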
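The TPS calculation can be sketched as follows. The function name `tokens_per_second` is a hypothetical helper; the only facts taken from the source are the formula and that the durations are recorded in nanoseconds.

```python
NS_PER_SECOND = 1_000_000_000


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput: generated tokens divided by generation time in seconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / NS_PER_SECOND)


# Values from the llama3.2:1b-instruct-q4_K_M run without GPU.
tps = tokens_per_second(eval_count=466, eval_duration_ns=10_149_972_037)
print(f"{tps:.1f} TPS")  # prints "45.9 TPS"
```

Because only eval_duration is used, the figure reflects generation speed alone and excludes model load and prompt evaluation time.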
llama3.2:1b-instruct-q4_K_M (without GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q4_K_M
2026-02-16
total_duration (total time) : 11501001926 (11.501s)
load_duration (model load time) : 812907296 (0.813s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 263865232 (0.264s)
eval_count (tokens generated) : 466
eval_duration (generation time) : 10149972037 (10.150s)
real 0m11.511s
user 0m0.015s
sys 0m0.014s
Memory usage (RSS) : 1036208 KB
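The source does not say how the RSS figure was obtained. One common way on Linux is to read the `VmRSS` field from `/proc/<pid>/status`; the sketch below assumes that approach. The helper names `parse_vmrss_kb` and `rss_kb` are hypothetical, and which Ollama process to inspect (the server or its model runner, inside or outside the container) is left open.

```python
import re
from pathlib import Path


def parse_vmrss_kb(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the contents of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS not found in status text")
    return int(match.group(1))


def rss_kb(pid: int) -> int:
    """Read the resident set size of a running process, in kB (Linux only)."""
    return parse_vmrss_kb(Path(f"/proc/{pid}/status").read_text())


# Illustrative status fragment; only the VmRSS line matters here.
sample = "Name:\tollama\nVmPeak:\t 2000000 kB\nVmRSS:\t 1036208 kB\n"
print(parse_vmrss_kb(sample))  # prints 1036208
```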
llama3.2:1b-instruct-q5_K_M (without GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q5_K_M
2026-02-16
total_duration (total time) : 10939658117 (10.940s)
load_duration (model load time) : 793890321 (0.794s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 338222976 (0.338s)
eval_count (tokens generated) : 396
eval_duration (generation time) : 9567252255 (9.567s)
real 0m10.950s
user 0m0.024s
sys 0m0.006s
Memory usage (RSS) : 1142536 KB
llama3.2:1b-instruct-q8_0 (without GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q8_0
2026-02-16
total_duration (total time) : 7657952525 (7.658s)
load_duration (model load time) : 793598110 (0.794s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 206035917 (0.206s)
eval_count (tokens generated) : 203
eval_duration (generation time) : 6533717406 (6.534s)
real 0m7.668s
user 0m0.024s
sys 0m0.005s
Memory usage (RSS) : 1535096 KB
llama3.2:1b-instruct-fp16 (without GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization F16
2026-02-16
total_duration (total time) : 33328163716 (33.328s)
load_duration (model load time) : 1063475193 (1.063s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 318299358 (0.318s)
eval_count (tokens generated) : 530
eval_duration (generation time) : 31589739060 (31.590s)
real 0m33.339s
user 0m0.022s
sys 0m0.011s
Memory usage (RSS) : 2665388 KB
llama3.2:3b-instruct-q4_K_M (without GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q4_K_M
2026-02-16
total_duration (total time) : 41005752242 (41.006s)
load_duration (model load time) : 1074311413 (1.074s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 714585092 (0.715s)
eval_count (tokens generated) : 712
eval_duration (generation time) : 38783315803 (38.783s)
real 0m41.012s
user 0m0.027s
sys 0m0.000s
Memory usage (RSS) : 2543904 KB
llama3.2:3b-instruct-q5_K_M (without GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q5_K_M
2026-02-16
total_duration (total time) : 43281281383 (43.281s)
load_duration (model load time) : 1048326137 (1.048s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 956229819 (0.956s)
eval_count (tokens generated) : 662
eval_duration (generation time) : 40881022622 (40.881s)
real 0m43.292s
user 0m0.020s
sys 0m0.014s
Memory usage (RSS) : 2838348 KB
llama3.2:3b-instruct-q8_0 (without GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q8_0
2026-02-16
total_duration (total time) : 33119246141 (33.119s)
load_duration (model load time) : 1292674681 (1.293s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 520029242 (0.520s)
eval_count (tokens generated) : 373
eval_duration (generation time) : 31083260384 (31.083s)
real 0m33.130s
user 0m0.023s
sys 0m0.010s
Memory usage (RSS) : 3910840 KB
llama3.2:3b-instruct-fp16 (without GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization F16
2026-02-16
total_duration (total time) : 68190815829 (68.191s)
load_duration (model load time) : 1544711966 (1.545s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 892044917 (0.892s)
eval_count (tokens generated) : 430
eval_duration (generation time) : 65499135898 (65.499s)
real 1m8.202s
user 0m0.022s
sys 0m0.016s
Memory usage (RSS) : 6853872 KB
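The recorded durations can be cross-checked against each other: load, prompt evaluation, and generation time should add up to slightly less than total_duration. The sketch below does this for the run above. The helper name `unaccounted_ns` is hypothetical, and reading the remainder as request-handling overhead is an interpretation, not something the source states.

```python
def unaccounted_ns(total: int, load: int, prompt_eval: int, eval_: int) -> int:
    """Time in total_duration that the three component durations do not cover."""
    return total - (load + prompt_eval + eval_)


# Values from the llama3.2:3b-instruct-fp16 run without GPU (all in nanoseconds).
rest = unaccounted_ns(
    total=68_190_815_829,
    load=1_544_711_966,
    prompt_eval=892_044_917,
    eval_=65_499_135_898,
)
print(rest)       # prints 254923048
print(rest >= 0)  # prints True
```

A negative remainder, or one far larger than a fraction of a second, would suggest a transcription error in one of the fields.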
llama3.2:1b-instruct-q4_K_M (with GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q4_K_M
2026-02-16
total_duration (total time) : 3105868158 (3.106s)
load_duration (model load time) : 915587971 (0.916s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 13159742 (0.013s)
eval_count (tokens generated) : 565
eval_duration (generation time) : 1856111989 (1.856s)
real 0m3.117s
user 0m0.020s
sys 0m0.010s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 43C P2 169W / 170W | 1611MiB / 12288MiB | 84% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 125MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 45842 C /usr/bin/ollama 1326MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 640244 KB
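The GPU snapshot above is the human-readable `nvidia-smi` table. For scripted collection, `nvidia-smi` also offers a CSV query mode; the sketch below assumes that mode and reads memory use and utilization for the first GPU. The helper names `parse_gpu_csv` and `query_gpu` are hypothetical, and the source does not indicate that the measurements were collected this way.

```python
import subprocess

QUERY_FIELDS = ["memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(line: str) -> dict:
    """Parse one line of `--format=csv,noheader,nounits` output into integers."""
    values = [int(v.strip()) for v in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def query_gpu() -> dict:
    """Run nvidia-smi and return memory (MiB) and utilization (%) for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(QUERY_FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out.splitlines()[0])


# Example line with the values from the snapshot above (no GPU needed for this part).
print(parse_gpu_csv("1611, 12288, 84"))
# prints {'memory.used': 1611, 'memory.total': 12288, 'utilization.gpu': 84}
```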
llama3.2:1b-instruct-q5_K_M (with GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q5_K_M
2026-02-16
total_duration (total time) : 3031084870 (3.031s)
load_duration (model load time) : 894848615 (0.895s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 13127833 (0.013s)
eval_count (tokens generated) : 517
eval_duration (generation time) : 1819582094 (1.820s)
real 0m3.042s
user 0m0.025s
sys 0m0.005s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 46C P2 169W / 170W | 1709MiB / 12288MiB | 85% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 125MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 45908 C /usr/bin/ollama 1424MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 641380 KB
llama3.2:1b-instruct-q8_0 (with GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization Q8_0
2026-02-16
total_duration (total time) : 3409915401 (3.410s)
load_duration (model load time) : 885666273 (0.886s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 11576154 (0.012s)
eval_count (tokens generated) : 485
eval_duration (generation time) : 2244038033 (2.244s)
real 0m3.421s
user 0m0.026s
sys 0m0.009s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 45C P2 152W / 170W | 2101MiB / 12288MiB | 90% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 125MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 45977 C /usr/bin/ollama 1816MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 694660 KB
llama3.2:1b-instruct-fp16 (with GPU)
Model
architecture llama
parameters 1.2B
context length 131072
embedding length 2048
quantization F16
2026-02-16
total_duration (total time) : 5847412488 (5.847s)
load_duration (model load time) : 1144955109 (1.145s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 25784321 (0.026s)
eval_count (tokens generated) : 541
eval_duration (generation time) : 4372116014 (4.372s)
real 0m5.867s
user 0m0.042s
sys 0m0.009s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 48C P2 150W / 170W | 3295MiB / 12288MiB | 93% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 125MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 46087 C /usr/bin/ollama 3010MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 1026896 KB
llama3.2:3b-instruct-q4_K_M (with GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q4_K_M
2026-02-16
total_duration (total time) : 6625570133 (6.626s)
load_duration (model load time) : 893207077 (0.893s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 23940970 (0.024s)
eval_count (tokens generated) : 684
eval_duration (generation time) : 5321587263 (5.322s)
real 0m6.637s
user 0m0.026s
sys 0m0.005s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 48C P2 169W / 170W | 3113MiB / 12288MiB | 93% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 68947 C /usr/bin/ollama 2834MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 756868 KB
llama3.2:3b-instruct-q5_K_M (with GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q5_K_M
2026-02-16
total_duration (total time) : 5302171898 (5.302s)
load_duration (model load time) : 1154967940 (1.155s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 25094372 (0.025s)
eval_count (tokens generated) : 448
eval_duration (generation time) : 3813280667 (3.813s)
real 0m5.313s
user 0m0.016s
sys 0m0.014s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 50C P2 169W / 170W | 3401MiB / 12288MiB | 92% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 69013 C /usr/bin/ollama 3122MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 750316 KB
llama3.2:3b-instruct-q8_0 (with GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization Q8_0
2026-02-16
total_duration (total time) : 2689089493 (2.689s)
load_duration (model load time) : 1135502130 (1.136s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 24539584 (0.025s)
eval_count (tokens generated) : 124
eval_duration (generation time) : 1445543137 (1.446s)
real 0m2.699s
user 0m0.019s
sys 0m0.009s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 50C P2 161W / 170W | 4455MiB / 12288MiB | 96% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 125MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 69086 C /usr/bin/ollama 4170MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 832536 KB
llama3.2:3b-instruct-fp16 (with GPU)
Model
architecture llama
parameters 3.2B
context length 131072
embedding length 3072
quantization F16
2026-02-16
total_duration (total time) : 11776163735 (11.776s)
load_duration (model load time) : 1402064396 (1.402s)
prompt_eval_count (prompt tokens evaluated) : 52
prompt_eval_duration (prompt evaluation time) : 41251014 (0.041s)
eval_count (tokens generated) : 497
eval_duration (generation time) : 10053706611 (10.054s)
real 0m11.784s
user 0m0.013s
sys 0m0.012s
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03 Driver Version: 535.261.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 On | 00000000:01:00.0 On | N/A |
| 0% 56C P2 158W / 170W | 7395MiB / 12288MiB | 97% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1242 G /usr/lib/xorg/Xorg 119MiB |
| 0 N/A N/A 1899 G xfwm4 2MiB |
| 0 N/A N/A 2423 G /usr/bin/x-www-browser 144MiB |
| 0 N/A N/A 69153 C /usr/bin/ollama 7116MiB |
+---------------------------------------------------------------------------------------+
Memory usage (RSS) : 1286548 KB
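To put the two sets of results side by side, the sketch below divides each GPU TPS figure by the corresponding CPU figure. The helper name `speedups` is hypothetical, and the ratios are only indicative: each configuration was run once, under background load, with a different number of generated tokens.

```python
# TPS values copied from the summary list near the top of this page.
CPU_TPS = {
    "1b-instruct-q4_K_M": 45.9, "1b-instruct-q5_K_M": 41.4,
    "1b-instruct-q8_0": 31.1, "1b-instruct-fp16": 16.8,
    "3b-instruct-q4_K_M": 18.4, "3b-instruct-q5_K_M": 16.2,
    "3b-instruct-q8_0": 12.0, "3b-instruct-fp16": 6.56,
}
GPU_TPS = {
    "1b-instruct-q4_K_M": 304, "1b-instruct-q5_K_M": 284,
    "1b-instruct-q8_0": 216, "1b-instruct-fp16": 124,
    "3b-instruct-q4_K_M": 129, "3b-instruct-q5_K_M": 117,
    "3b-instruct-q8_0": 85.8, "3b-instruct-fp16": 49.4,
}


def speedups(cpu: dict, gpu: dict) -> dict:
    """GPU-to-CPU throughput ratio for every tag present in both tables."""
    return {tag: gpu[tag] / cpu[tag] for tag in cpu if tag in gpu}


for tag, ratio in speedups(CPU_TPS, GPU_TPS).items():
    print(f"{tag}: {ratio:.1f}x")
```

On these numbers the ratio falls between roughly 6.6x and 7.5x for every model and quantization listed.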