How does this look?
(base) PS E:\kaiwu-windows-amd64> .\kaiwu.exe run Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf
██╗  ██╗ █████╗ ██╗██╗    ██╗██╗   ██╗
██║ ██╔╝██╔══██╗██║██║    ██║██║   ██║
█████╔╝ ███████║██║██║ █╗ ██║██║   ██║
██╔═██╗ ██╔══██║██║██║███╗██║██║   ██║
██║  ██╗██║  ██║██║╚███╔███╔╝╚██████╔╝
╚═╝  ╚═╝╚═╝  ╚═╝╚═╝ ╚══╝╚══╝  ╚═════╝
Local LLM deployer v0.1.1 · llama.cpp b8864
by
llmbbs.ai · local AI tech community
[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 4060 Laptop GPU (SM89, 8188 MB VRAM, 0 GB/s)
RAM: 63 GB DDR5
OS: windows amd64
[2/6] Selecting configuration...
Model: Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive (moe, 36B total / 1B active)
Quant: Q4_K_M (19.7 GB)
Mode: moe_offload (experts on CPU)
Accel: Flash Attention
[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf [cached]
[4/6] Preflight check...
✓ VRAM sufficient
[5/6] Warmup benchmark...
Probe 1: ctx=256K ... 22.1 tok/s
Tune ubatch: ub=128 → 22.3 tok/s; ub=512 → 20.7 tok/s;
✓ 22.3 tok/s @ 256K ctx
Saved profile: C:\Users\pzz\.kaiwu\profiles\qwen3.6-35b-a3b-uncensored-hauhaucs-aggressive-q4_k_m_sm89_8188mb_ddr5.json
✓ 22.3 tok/s
[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
llama-server started (PID 49380, port 11434)
Kaiwu proxy started (port 11435)
2026/04/24 22:03:23 Kaiwu proxy listening on :11435 → llama-server :11434
┌─────────────────────────────────────────────────────────────────────┐
│ Ready — Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive @ 22.3 tok/s │
│ API: http://127.0.0.1:11435/v1/chat/completions                     │
│ Model folder: E:\model                                              │
└─────────────────────────────────────────────────────────────────────┘
Run kaiwu inject to hook into your IDE · Ctrl+C to stop
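
To sanity-check the endpoint shown in the box, here is a small Go client I would point at it. Two assumptions on my part: llama-server normally accepts any string in the "model" field and needs no API key, and I have not checked whether the kaiwu proxy enforces either.

// Quick smoke test against the proxy's OpenAI-compatible endpoint.
// Assumptions: no API key required; the "model" value is arbitrary.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{
		"model": "qwen3.6-35b-a3b",
		"messages": [{"role": "user", "content": "Say hello in one sentence."}],
		"max_tokens": 64
	}`)

	resp, err := http.Post(
		"http://127.0.0.1:11435/v1/chat/completions",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
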
─ Live monitor · idle ─────────────────── refresh every 2s ─
reuse:1024 · KV:q8_0 · 256K ctx · ub128 · mlock
   Speed         VRAM          RAM          GPU         Temp
  — tok/s      5.5/8 GB    47.0/64 GB       2%          58°C
[..........] [======....] [=======...] [..........] [=====.....]
─────────────────────────────────────────────────────────
Context [....................] 0.0K / 256K   256.0K free
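
The monitor header (reuse:1024 · KV:q8_0 · 256K ctx · ub128 · mlock) plus the moe_offload mode from step [2/6] presumably map onto llama-server flags roughly like the launcher sketch below. This is my guess, not kaiwu's actual command line: the -ngl value, the expert-tensor regex for -ot, and reading reuse:1024 as --cache-reuse are all assumptions, and the Flash Attention flag syntax has changed between llama.cpp builds, so check llama-server --help before copying any of it.

// Rough guess at what kaiwu's "moe_offload" profile might pass to llama-server.
// Every flag value below is an assumption inferred from the transcript.
package main

import (
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("llama-server-cuda.exe",
		"-m", `E:\model\Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf`,
		"--port", "11434",
		"-c", "262144", // 256K context window
		"-ub", "128", // ubatch size tuned by the warmup benchmark
		"-ctk", "q8_0", "-ctv", "q8_0", // quantized KV cache
		"--mlock", // pin model pages in RAM
		"-ngl", "99", // assumption: dense/attention layers on GPU
		"-ot", ".ffn_.*_exps.=CPU", // assumption: keep MoE expert tensors in system RAM
		"--cache-reuse", "1024", // assumption: what "reuse:1024" refers to
	)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
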
Stopping services...
✓ llama-server stopped
✓ Kaiwu proxy stopped
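
For anyone wondering about the second port: kaiwu fronts llama-server (:11434) with its own proxy (:11435), which is presumably where the live monitor stats come from. Conceptually it is just a reverse proxy; a minimal stand-in using Go's standard library might look like the sketch below. Kaiwu's real proxy no doubt does more (request accounting for the panel, for instance), none of which is reproduced here.

// Minimal reverse proxy in the spirit of the kaiwu proxy: forward :11435 to
// llama-server on :11434. Illustration only, not kaiwu's implementation.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	target, err := url.Parse("http://127.0.0.1:11434")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	log.Println("proxy listening on :11435 -> llama-server :11434")
	log.Fatal(http.ListenAndServe(":11435", proxy))
}
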