How OpenClaw Stays Sharp
Without Burning Tokens

A technical playbook: real failures, exact fixes, measured results. Every technique here encodes a real incident. The failure mode, the config change, the measured outcome. No theory — only what worked in production.

SECTION A — Prompt Caching

The single highest-ROI change in any OpenClaw deployment

1. params.cacheRetention

One key, half the bill

What breaks: Every health monitor poll (default 5 min) triggers a cold cache write.

"agents": { "list": [ {
  "id": "your-agent",
  "params": { "cacheRetention": "long" }
} ] }

Critical: Set in params, not at root level.

Metric         After
Hit rate       65.8% (from 39%)
Write tokens   −56%

2. channelHealthCheck

15 min, not the 5 min default

What breaks: 5-min checks equal default cache TTL — every poll is a full cold cache write. Writes dominate spend.

"gateway": {
  "channelHealthCheckMinutes": 15
}

Why 15 min? Keeps Telegram responsive while staying inside the 1 hr cache window.

Cuts cold writes by 2/3.

3. Disable Channels

Kill unauthenticated polls

What breaks: enabled: true channels with no auth still poll, load full context, do nothing, and burn tokens.

State            Tokens/hr
false            0
true (unlinked)  ~200,000

Patch via docker exec, never scp:

docker exec <container> node -e "..."

4. summaryModel

Pin Haiku in Lossless Claw

Incident: Omitting this caused Sonnet compaction sessions — producing 58k-token cache writes at 3.75× the cost of Haiku.

"plugins": {
  "lossless-claw": {
    "summaryModel": "claude-haiku-...",
    "summaryProvider": "anthropic"
  }
}

5. Diagnostics

Track Cache Hit Rate

reads / (reads + writes)

  • Below 50%: Audit config
  • 50–60%: Acceptable
  • Above 60%: Healthy

Compute weekly via export. A rising token baseline without new task volume means cache is cold.
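The ratio above can be computed straight from a weekly export; a minimal node sketch (the token counts are illustrative, the thresholds are the ones listed above):

```javascript
// Classify cache health from exported token counts.
// Thresholds mirror the diagnostics list above; inputs are illustrative.
function cacheHealth(readTokens, writeTokens) {
  const hitRate = readTokens / (readTokens + writeTokens);
  let verdict;
  if (hitRate < 0.5) verdict = "audit config";
  else if (hitRate < 0.6) verdict = "acceptable";
  else verdict = "healthy";
  return { hitRate, verdict };
}

// Example: a weekly export showing 658k cache-read vs 342k cache-write tokens.
console.log(cacheHealth(658_000, 342_000)); // { hitRate: 0.658, verdict: 'healthy' }
```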

SECTION B — Model Routing

Never use a $15 model for a $0.15 job

6. Match Model to Task

Task                               Model           Input/1M
Classification, structured output  Mini / Flash    ~$0.15
Health checks, status reads        Haiku           $0.80
Content, analysis, coding          Sonnet / GPT-5  ~$3.00
Orchestration, highest-stakes      Opus / GPT-5.2  $15.00+

Rule: If average quality is 4.5+, trial one tier down for a week. If the cheaper tier holds ≥3.5, keep it and log the savings.
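The rule can be encoded as a weekly review step. A hypothetical sketch; the tier order comes from the table above, and the function name and trial mechanics are illustrative:

```javascript
// Weekly tier-review sketch. Tiers ordered cheap -> expensive, per the table above.
const TIERS = ["mini/flash", "haiku", "sonnet", "opus"];

// Given the current tier and the week's average quality score (1-5):
// 4.5+ on the current tier -> trial one tier down; a trial holding 3.5+ -> keep it.
function reviewTier(current, avgQuality, onTrial = false) {
  const i = TIERS.indexOf(current);
  if (onTrial) return avgQuality >= 3.5 ? current : TIERS[i + 1]; // keep or revert
  if (avgQuality >= 4.5 && i > 0) return TIERS[i - 1];            // start a trial
  return current;
}

console.log(reviewTier("sonnet", 4.7));      // "haiku": start a one-week trial
console.log(reviewTier("haiku", 3.8, true)); // "haiku": trial holds, log savings
console.log(reviewTier("haiku", 3.1, true)); // "sonnet": trial failed, revert
```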

7. Fallback Chains

End with cheap/safe

Incident: Unverified fallback provider key caused silent 30s timeouts that looked like API errors.

"model": {
  "primary": "gemini-3-flash",
  "fallbacks": ["gpt-5.4-mini"]
}

Always verify keys in container first.
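A minimal shape check, runnable via docker exec like the other snippets in this playbook; the env var names are illustrative. It catches the two silent killers: a missing key and a key with stray whitespace:

```javascript
// Sanity-check provider keys inside the container before trusting fallbacks.
// A missing or whitespace-padded key fails silently at request time, so fail loudly here.
// Env var names are illustrative.
function checkKey(name, value) {
  if (!value) return `${name}: MISSING`;
  if (value.trim() !== value) return `${name}: HAS WHITESPACE`;
  return `${name}: ok (${value.length} chars)`;
}

for (const name of ["ANTHROPIC_API_KEY", "GEMINI_API_KEY"]) {
  console.log(checkKey(name, process.env[name]));
}
```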

8. Clear sessions.json

Old sessions carry hardcoded model fields and ignore openclaw.json changes. Run a node script to clear sess.model after any upgrade.
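A sketch of the clearing step. The sessions.json shape is assumed from the incident: an object of session records, each possibly carrying a hardcoded model field:

```javascript
// Strip hardcoded model pins from parsed sessions.json so sessions
// follow openclaw.json again after an upgrade. The shape is assumed.
function clearModelPins(sessions) {
  for (const sess of Object.values(sessions)) {
    delete sess.model;
  }
  return sessions;
}

const sample = {
  "sess-1": { model: "claude-old", history: [] },
  "sess-2": { history: [] },
};
console.log(clearModelPins(sample)); // no session carries a model pin afterwards
```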

9. Set max_tokens

Always set agents.defaults.maxTokens: 4096. Uncapped runaway responses can cost a full day's budget.

SECTION C — Prompt & Context Discipline

What you pass in gets re-read on every single turn

10. Prompts ≤ 5 Lines

Format: What, format, constraints. Nothing else.

  • Cut motivational framing.
  • Cut rules already in SKILL.md.
  • Pass outlines, not full docs.
  • Cut unvarying examples.

30–60% reduction per spawn.

11. SKILL.md 300-word cap

Audit Monthly

At 50 sessions/month, a 500-word file vs a 290-word file costs 210 extra words × 50 sessions in pure overhead before any work begins.

Before: 800w → After: 290w.
Result: 63% reduction/session.

Structure:

  1. Mission (1 sentence)
  2. Hard constraints (first)
  3. What it can do
  4. How (code blocks, not prose)
  5. Output schemas
  6. Routing table & Token rules

12. File Writing

Write tool results to workspace files immediately. Never hold 2k-token raw output in active context through subsequent turns.

13. freshTailCount: 48

Prevents mid-conversation operator instructions from falling into the lossy compaction window before execution. Set in agents.defaults.

SECTION D — Plugins: Lossless Claw + QMD

The two plugins that change what long-running agents can actually do

14. Lossless Claw

Solving the sliding window problem

The problem: Default sliding windows drop old messages permanently. Overnight agents forget yesterday's decisions.

What it does: Writes full text to local SQLite. Builds a DAG summary (nodes = concepts, edges = relationships). Uses lcm_grep for verbatim recall.

Security Note:

Treat SQLite DB as sensitive. chmod 600 database, chmod 700 directory. Review source code before prod deployment.

Approach        Recall Capability
Default Window  Lost permanently
Flat Summary    Lossy, nuance gone
DAG + SQLite    Exact verbatim recall

Compression ratio: 25:1.

15. QMD

Structured Durable Memory

Query Memory Database backend prevents unbounded MEMORY.md growth. Agents retrieve only what they need instead of loading full files.

"memory": {
  "backend": "qmd",
  "workspacePaths": [
    ".../workspace-ops"
  ]
}

Patch safely; do not overwrite openclaw.json.

SECTION E — Spawn & Session Discipline

Every spawn is a budget line item with a hard timeout

16. Max 5 Tasks / Spawn

Context grows linearly

By task 8, the growing context makes subsequent tasks more expensive than the first. Errors in early tasks compound.

Parallel spawns for the same agent cause race conditions. Keep batches sequential and small.

Result: 40% fewer total tokens.

17. Timeout: 180s

Avoid 600s slot holding

What breaks: 4 agents × 600s = 40 min timeout blocking scheduler.

  • 180s for focused APIs/writes.
  • 300s for external fetches/chains.
  • If >3 min needed, split the task.

Set inside agents.defaults.subagents.
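In config form this might look like the fragment below. Only the placement under agents.defaults.subagents comes from the text above; the exact key name is an assumption:

```json
"agents": {
  "defaults": {
    "subagents": { "timeoutSeconds": 180 }
  }
}
```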

18. No `every` for LLMs

`every` derives unconfigurable timeouts from its interval (e.g. a 10 min interval yields a 10 s timeout). Use cron for LLM tasks.

19. Concurrency Limits

Set maxConcurrentSessions: 2 on 1.5 GB VPS to stop OOM kills.

20. Pre-fetch Data

Pre-fetch 15–60s API calls to workspace before spawn. Session budget is for reasoning only.

SECTION F — Cron Isolation

Scheduled jobs bleed into operator context

21. sessionTarget: isolated

Cron jobs in main context deposit stale API data, inflating input tokens forever. Add --sessionTarget isolated and --lightContext true (speeds first LLM call 20–30%).

22. Compact Handoffs

Don't return full conversation histories. One JSON line replaces 30+ lines of narration per cycle.

{"agent":"writer","tasks":3,"quality":4.1,"files":["/draft.md"]}

SECTION G — Memory & Quality

Fabrication costs more than the tokens it saved

23. Force Daily Writes

Don't rely on volunteering. Use a dedicated nightly cron with explicit categories (decisions, blockers). Warn agent of permanent loss in prompt.

24. Shared Context

All sub-workspaces need strategy file copies. Add rule: "Update and push to all workspaces when strategy changes."

25. Zero-fabrication

The fix is procedural. Rule: "After write/delete, call GET. If the tool output says FAIL, report FAILED."
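The rule translates directly into the tool loop; a sketch with a generic store standing in for whatever backend the agent writes to:

```javascript
// Procedural zero-fabrication: after every write/delete, read back and
// report from the tool output, never from the agent's expectation.
// The store interface here is a generic stand-in.
function verifiedWrite(store, key, value) {
  store.set(key, value);
  const readBack = store.get(key);          // the mandatory follow-up GET
  if (readBack !== value) return "FAILED";  // report exactly what the tool said
  return "OK (verified by read-back)";
}

const store = new Map();
console.log(verifiedWrite(store, "draft", "v1")); // OK (verified by read-back)
```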

26. Quality Floors & Budget Guards

If the quality floor is breached for 2 weeks, escalate. Budget is enforced by VPS cron (80% = downgrade model, 95% = pause jobs), not by agent prompts.
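The budget thresholds are mechanical, which is exactly why they belong in cron rather than in a prompt. A sketch of the cron-side check; the action names are illustrative:

```javascript
// Budget guard meant for a VPS cron job, outside the agent's prompts.
// Thresholds from the rule above; action names are illustrative.
function budgetAction(spentPct) {
  if (spentPct >= 95) return "pause-jobs";
  if (spentPct >= 80) return "downgrade-model";
  return "none";
}

console.log(budgetAction(72)); // none
console.log(budgetAction(83)); // downgrade-model
console.log(budgetAction(97)); // pause-jobs
```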

SECTION H — Config Safety

A bad config patch costs more than a week of operations

27. No SCP Patches

Template files wipe real tokens and channels. Use targeted node scripts via docker exec. Read → patch one key → write.

28. Models Allowlist

An empty agents.defaults.models array removes all restrictions. Explicitly list allowed models or risk expensive API calls.

29. agentToAgent

enabled: true isn't enough. Must explicitly add sessions_spawn, sessions_send to tools.allow or it falls back to shell commands.
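Sketched in the config style used above; the tool names come from the text, but the exact nesting is an assumption:

```json
"agentToAgent": { "enabled": true },
"tools": {
  "allow": ["sessions_spawn", "sessions_send"]
}
```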

30. Workspace Tools

Tools in main workspace don't auto-share. Sub-agents will hallucinate broken tool code if you don't copy scripts to their tools/ directory.

Pre-Spawn Checklist

  • Prompt ≤ 5 lines? (What / format / constraints)
  • Passing only needed text, not full files?
  • Right model tier for the task?
  • Batch size ≤ 5 tasks?
  • Side-effects batched?
  • Format specific (1 JSON line)?
  • One agent at a time?
  • Using cron, not `every`?
  • Timeout set 180s (not 600)?
  • Pre-fetched slow data?
  • sessions_spawn (not send) for idle agents?

By the Numbers

Optimization                       Measured Result
params.cacheRetention: "long"      Cache hit 39% → 65.8%, write tokens −56%
channelHealthCheckMinutes: 15      −66% cold cache write cycles during idle
Disabling unlinked channel         −200,000 tokens/hour
SKILL.md audit (800w → 290w)       63% input reduction per session
Tight prompts (≤5 lines)           30–60% input reduction per spawn
Batch strategy (max 5, not 10)     40% fewer total tokens
freshTailCount: 48 vs 32           50% more verbatim context preserved
Lossless Claw DAG compression      25:1 context ratio, no information loss
Pinning summaryModel to Haiku      Prevents 3.75× overcharge on compaction
Compact handoffs (1 JSON line)     Eliminates multi-100-line history re-reads
Pre-fetching slow API data         Session timeout budget goes 100% to model