D:\AI\models>kaiwu run Qwen3-30B-A3B
██╗  ██╗ █████╗ ██╗██╗    ██╗██╗   ██╗
██║ ██╔╝██╔══██╗██║██║    ██║██║   ██║
█████╔╝ ███████║██║██║ █╗ ██║██║   ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║   ██║
██║  ██╗██║  ██║██║╚███╔███╔╝╚██████╔╝
╚═╝  ╚═╝╚═╝  ╚═╝╚═╝ ╚══╝╚══╝  ╚═════╝
Local LLM deployer v0.1.1 · llama.cpp b8864
by
llmbbs.ai · Local AI tech community
[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 4070 Ti (SM89, 12282 MB VRAM, 0 GB/s)
RAM: 31 GB DDR4
OS: windows amd64
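(Side note: the "0 GB/s" memory-bandwidth reading looks like a probe glitch; the GPU name and VRAM figures can be cross-checked against nvidia-smi:)

    nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv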
[2/6] Selecting configuration...
Model: Qwen3-30B-A3B (moe, 30B total / 3B active)
Quant: ud-q3-k-xl (14.0 GB)
Mode: moe_offload (experts on CPU)
Accel: Flash Attention + MTP (native)
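(In llama.cpp terms, "experts on CPU" means the MoE expert FFN tensors are pinned to system RAM with a tensor override while the attention and shared weights go to the GPU. Which tensor names such an override has to match can be checked with the gguf Python package; a quick inspection, assuming `pip install gguf` and the usual Qwen3 GGUF naming:)

    gguf-dump Qwen3-30B-A3B-UD-Q3_K_XL.gguf | findstr exps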
[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Downloading model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf
From:
https://hf-mirror.com/unsloth/Qwen3-30B-A3B-GGUF/resolve/main/Qwen3-30B-A3B-UD-Q3_K_XL.gguf
Downloading 100% |████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (14/14 GB, 25 MB/s) [9m10s:0s]
Model: Qwen3-30B-A3B-UD-Q3_K_XL.gguf [cached]
[4/6] Preflight check...
✓ VRAM sufficient
[5/6] Warmup benchmark...
Probe 1: ctx=128K ... OOM
Probe 2: ctx=64K ... OOM
Probe 3: ctx=32K ... OOM
Probe 4: ctx=16K ... OOM
Probe 5: ctx=8K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters
[6/6] Starting server...
llama-server does not support iso3, falling back to q8_0/q4_0
Waiting for llama-server to be ready (port 11434)...
⚠️ Insufficient VRAM, lowering context to 4K and retrying...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: 2 consecutive launch attempts failed; even the minimum context (4K) could not run
Suggestion: choose a smaller quant or use a MoE offload model
Usage:
kaiwu run <model> [flags]
Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--reset Clear the cache and re-run warmup to probe optimal parameters
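If the cached warmup profile is the problem, the documented flags already cover a constrained retry (the context value here is just an example):

    kaiwu run Qwen3-30B-A3B --reset --ctx-size 4096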
My 4070 Ti can't load it either.
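That is expected with this pairing: the Q3_K_XL file alone is 14.0 GB against 12 GB of VRAM, so once the launcher falls back from the MoE offload path, no context size can fit and every probe will OOM. A manual launch that keeps the expert tensors in system RAM may still work; a sketch, assuming the bundled llama-server supports the standard --override-tensor (-ot) flag and the usual .ffn_*_exps tensor names (context and port values illustrative):

    llama-server-cuda.exe -m Qwen3-30B-A3B-UD-Q3_K_XL.gguf ^
        -ngl 99 ^
        -ot ".ffn_.*_exps.=CPU" ^
        -c 8192 --port 11434

With the experts in RAM, only the attention/shared weights and the KV cache need VRAM, which should fit well within 12 GB.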