How OpenClaw Stays Sharp
Without Burning Tokens
A technical playbook: real failures, exact fixes, measured results. Every technique here encodes a real incident: the failure mode, the config change, the measured outcome. No theory, only what worked in production.
SECTION A — Prompt Caching
The single highest-ROI change in any OpenClaw deployment
One key, half the bill
What breaks: Every health monitor poll (default 5 min) triggers a cold cache write.
```json
"agents": { "list": [ {
  "id": "your-agent",
  "params": { "cacheRetention": "long" }
} ] }
```

Critical: set `cacheRetention` in `params`, not at the root level.
| Metric | Result |
|---|---|
| Hit rate | 65.8% (from 39%) |
| Write tokens | −56% |
15 min, not the 5 min default
What breaks: 5-min checks equal default cache TTL — every poll is a full cold cache write. Writes dominate spend.
```json
"gateway": {
  "channelHealthCheckMinutes": 15
}
```

Why 15 min? Keeps Telegram responsive while staying inside the 1 hr cache window.
Kill unauthenticated polls
What breaks: channels with `enabled: true` but no auth still poll, load full context, do nothing, and burn tokens.
| Channel state | Tokens/hr |
|---|---|
| `enabled: false` | 0 |
| `enabled: true` (unlinked) | ~200,000 |
Patch via `docker exec`, never `scp`:

```shell
docker exec <container> node -e "..."
```

Pin Haiku in Lossless Claw
Incident: Omitting this caused Sonnet compaction sessions — producing 58k-token cache writes at 3.75× the cost of Haiku.
```json
"plugins": {
  "lossless-claw": {
    "summaryModel": "claude-haiku-...",
    "summaryProvider": "anthropic"
  }
}
```

Track Cache Hit Rate
Cache hit rate = reads / (reads + writes)

- Below 50%: audit config
- 50–60%: acceptable
- Above 60%: healthy
Compute weekly via export. A rising token baseline without new task volume means cache is cold.
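The weekly check above can be encoded directly; a minimal sketch, where the read/write counts come from your usage export (the function and threshold labels are illustrative, not OpenClaw's schema):

```javascript
// Classify cache health from weekly read/write token counts.
function cacheHealth(reads, writes) {
  const rate = reads / (reads + writes); // hit rate = reads / (reads + writes)
  if (rate < 0.5) return { rate, verdict: "audit config" };
  if (rate < 0.6) return { rate, verdict: "acceptable" };
  return { rate, verdict: "healthy" };
}

console.log(cacheHealth(658, 342).verdict); // "healthy" (rate 0.658)
```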
SECTION B — Model Routing
Never use a $15 model for a $0.15 job
| Task | Model | Input $/1M tokens |
|---|---|---|
| Classification, structured output | Mini / Flash | ~$0.15 |
| Health checks, status reads | Haiku | $0.80 |
| Content, analysis, coding | Sonnet / GPT-5 | ~$3.00 |
| Orchestration, highest-stakes | Opus / GPT-5.2 | $15.00+ |
Rule: if the quality average is 4.5+, trial one tier down for a week. If it holds at ≥3.5, keep the cheaper tier and log the savings.
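The tier table can be encoded as a small routing map; a sketch, where the task-type labels and model identifiers are illustrative stand-ins rather than OpenClaw's actual names:

```javascript
// Route each task type to the cheapest tier that handles it.
const MODEL_TIERS = {
  classification: "mini",    // ~$0.15 / 1M input tokens
  health_check:   "haiku",   // $0.80 / 1M
  content:        "sonnet",  // ~$3.00 / 1M
  orchestration:  "opus",    // $15.00+ / 1M
};

function routeModel(taskType) {
  // Unknown tasks default to the cheapest tier:
  // a misroute should cost $0.15, not $15.
  return MODEL_TIERS[taskType] ?? MODEL_TIERS.classification;
}

console.log(routeModel("health_check")); // "haiku"
```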
End with cheap/safe
Incident: Unverified fallback provider key caused silent 30s timeouts that looked like API errors.
```json
"model": {
  "primary": "gemini-3-flash",
  "fallbacks": ["gpt-5.4-mini"]
}
```

Always verify keys in the container first.
Old sessions carry hardcoded `model` fields and ignore openclaw.json changes. After any upgrade, run a node script to clear `sess.model` from stored sessions.
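A sketch of that clearing step. The session shape (`{ id, model, ... }`) is an assumption about how sessions are persisted, so adapt it to what your deployment actually stores:

```javascript
// Drop hardcoded model pins so sessions follow openclaw.json again.
// The session object shape here is assumed, not OpenClaw's real schema.
function clearModelPins(sessions) {
  return sessions.map(({ model, ...rest }) => rest); // delete sess.model
}

const patched = clearModelPins([{ id: "a1", model: "pinned-sonnet" }, { id: "a2" }]);
console.log(patched.some((s) => "model" in s)); // false
```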
Always set `agents.defaults.maxTokens: 4096`. A single uncapped runaway response can cost a full day's budget.
SECTION C — Prompt & Context Discipline
What you pass in gets re-read on every single turn
Format: What, format, constraints. Nothing else.
- Cut motivational framing.
- Cut rules already in SKILL.md.
- Pass outlines, not full docs.
- Cut unvarying examples.
Audit Monthly
At 50 sessions/month, a 500-word prompt vs a 290-word file wastes 210 words × 50 sessions ≈ 10,500 words of pure overhead before any work begins.
Before: 800 words → After: 290 words.
Result: 63% input reduction per session.
Structure:
- Mission (1 sentence)
- Hard constraints (first)
- What it can do
- How (code blocks, not prose)
- Output schemas
- Routing table & Token rules
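As a skeleton, the structure above looks roughly like this (all contents illustrative):

```markdown
# Mission
Maintain the weekly ops digest. (one sentence)

## Hard constraints
- Never exceed 4096 output tokens.
- Never call external APIs directly; read pre-fetched workspace files.

## Capabilities
- Summarize /workspace/ops/*.md into one digest.

## How
    cat /workspace/ops/*.md | summarize --max-words 200

## Output schema
    {"digest": "...", "files": ["/digest.md"]}

## Routing & token rules
| Task | Model |
|---|---|
| Digest | haiku |
```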
Write tool results to workspace files immediately. Never hold 2k-token raw output in active context through subsequent turns.
Raising the fresh-tail window (`freshTailCount`) prevents mid-conversation operator instructions from falling into the lossy compaction window before execution. Set in `agents.defaults`.
SECTION D — Plugins: Lossless Claw + QMD
The two plugins that change what long-running agents can actually do
Solving the sliding window problem
The problem: Default sliding windows drop old messages permanently. Overnight agents forget yesterday's decisions.
What it does: Writes full text to local SQLite. Builds a DAG summary (nodes = concepts, edges = relationships). Uses lcm_grep for verbatim recall.
Security Note:
Treat SQLite DB as sensitive. chmod 600 database, chmod 700 directory. Review source code before prod deployment.
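A sketch of the lockdown, using a temp directory as a stand-in for the real plugin path (the actual database location and filename depend on your install):

```shell
# Stand-in for the Lossless Claw data directory; substitute the real path.
db_dir="$(mktemp -d)"
db_file="$db_dir/memory.sqlite"   # hypothetical database filename
touch "$db_file"

chmod 600 "$db_file"  # owner read/write only
chmod 700 "$db_dir"   # owner-only directory access
```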
| Approach | Recall Capability |
|---|---|
| Default Window | Lost permanently |
| Flat Summary | Lossy, nuance gone |
| DAG + SQLite | Exact verbatim recall |
Compression ratio: 25:1

Structured Durable Memory
The Query Memory Database (QMD) backend prevents unbounded MEMORY.md growth. Agents retrieve only what they need instead of loading full files.
```json
"memory": {
  "backend": "qmd",
  "workspacePaths": [
    ".../workspace-ops"
  ]
}
```

Patch safely; do not overwrite openclaw.json.
SECTION E — Spawn & Session Discipline
Every spawn is a budget line item with a hard timeout
Context grows linearly
By task 8, the growing context makes subsequent tasks more expensive than the first. Errors in early tasks compound.
Parallel spawns for the same agent cause race conditions. Keep batches sequential and small.
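A sketch of the sequential-batch rule, where `spawnTask` stands in for whatever spawn call your deployment uses:

```javascript
// Run at most 5 tasks for a single agent, strictly one at a time.
async function runBatch(tasks, spawnTask) {
  const results = [];
  for (const task of tasks.slice(0, 5)) {  // cap batch size at 5
    results.push(await spawnTask(task));   // await before starting the next
  }
  return results;
}
```

Awaiting inside the loop is the point: `tasks.map(spawnTask)` would fire every spawn in parallel and reintroduce the race.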
Avoid 600s slot holding
What breaks: 4 agents × 600s = 40 min timeout blocking scheduler.
- 180s for focused APIs/writes.
- 300s for external fetches/chains.
- If >3 min needed, split the task.
Set inside agents.defaults.subagents.
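In config, those timeouts land roughly like this (the `timeoutSeconds` key name is an assumption; check your schema before applying):

```json
"agents": {
  "defaults": {
    "subagents": {
      "timeoutSeconds": 180
    }
  }
}
```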
`every` derives unconfigurable timeouts (e.g. a 10 min interval = a 10 s timeout). Use cron for LLM tasks.
Set maxConcurrentSessions: 2 on 1.5 GB VPS to stop OOM kills.
Pre-fetch 15–60s API calls to workspace before spawn. Session budget is for reasoning only.
SECTION F — Cron Isolation
Scheduled jobs bleed into operator context
Cron jobs run in the main context deposit stale API data, inflating input tokens indefinitely. Add `--sessionTarget isolated` and `--lightContext true` (the latter speeds up the first LLM call 20–30%).
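In a crontab, the isolation flags look roughly like this (the `openclaw` CLI invocation and job name are assumptions; only the two flags come from the text above):

```shell
# Nightly job: isolated session, light context for a faster first LLM call.
0 3 * * * openclaw cron run nightly-digest --sessionTarget isolated --lightContext true
```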
Don't return full conversation histories. One JSON line replaces 30+ lines of narration per cycle.
```json
{"agent":"writer","tasks":3,"quality":4.1,"files":["/draft.md"]}
```

SECTION G — Memory & Quality
Fabrication costs more than the tokens it saved
Don't rely on the agent volunteering memories. Use a dedicated nightly cron with explicit categories (decisions, blockers), and warn the agent in the prompt that anything unsaved is lost permanently.
All sub-workspaces need strategy file copies. Add rule: "Update and push to all workspaces when strategy changes."
The fix is procedural. Rule: "After every write or delete, call GET to verify. If the tool output says FAIL, report FAILED."
If the quality floor stays breached for 2 weeks, escalate. Budget is enforced by a VPS cron (80% spent = downgrade model, 95% = pause jobs), not by agent prompts.
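The enforcement thresholds can be sketched as the cron-side check (how spend and budget are metered is an assumption about your setup):

```javascript
// VPS-cron budget gate: the agent never sees or enforces this.
function budgetAction(spentUsd, budgetUsd) {
  const pct = spentUsd / budgetUsd;
  if (pct >= 0.95) return "pause-jobs";      // 95%: stop scheduled jobs
  if (pct >= 0.80) return "downgrade-model"; // 80%: drop a model tier
  return "ok";
}

console.log(budgetAction(85, 100)); // "downgrade-model"
```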
SECTION H — Config Safety
A bad config patch costs more than a week of operations
27. No SCP Patches
`scp`-ing a template file over the live config wipes real tokens and channels. Use targeted node scripts via `docker exec`: read → patch one key → write.
28. Models Allowlist
An empty `agents.defaults.models` array removes all restrictions. Explicitly list allowed models or risk expensive API calls.
29. agentToAgent
`enabled: true` isn't enough. You must explicitly add `sessions_spawn` and `sessions_send` to `tools.allow`, or the agent falls back to shell commands.
30. Workspace Tools
Tools in the main workspace don't auto-share. Copy scripts into each sub-agent's tools/ directory, or the sub-agent will hallucinate broken tool code.
Pre-Spawn Checklist
- Prompt ≤ 20 lines? (What / format / constraints)
- Passing only needed text, not full files?
- Right model tier for the task?
- Batch size ≤ 5 tasks?
- Side-effects batched?
- Format specific (1 JSON line)?
- One agent at a time?
- Using cron, not `every`?
- Timeout set 180s (not 600)?
- Pre-fetched slow data?
- sessions_spawn (not send) for idle agents?
By the Numbers
| Optimization | Measured Result |
|---|---|
| params.cacheRetention: "long" | Cache hit 39% → 65.8%, write tokens −56% |
| channelHealthCheckMinutes: 15 | −66% cold cache write cycles during idle |
| Disabling unlinked channel | −200,000 tokens/hour |
| SKILL.md audit (800w → 290w) | 63% input reduction per session |
| Tight prompts (≤5 lines) | 30–60% input reduction per spawn |
| Batch strategy (max 5, not 10) | 40% fewer total tokens |
| freshTailCount: 48 vs 32 | 50% more verbatim context preserved |
| Lossless Claw DAG compression | 25:1 context ratio — no information loss |
| Pinning summaryModel to Haiku | Prevents 3.75× overcharge on compaction |
| Compact handoffs (1 JSON line) | Eliminates multi-100-line history re-reads |
| Pre-fetching slow API data | Session timeout budget goes 100% to model |