OpenClaw Model Routing: Cheap Models for Simple Tasks, Smart Models When Needed
OpenClaw’s default config points every request at one model. Every heartbeat, every file rename, every “is this JSON valid?” goes to the same place. If that place is Claude Opus, you’re paying $15 per million input tokens for work that a free local model handles identically.
One user loaded $25 onto Anthropic and watched it drain to $5 in a day with the agent doing nothing. Heartbeats were pinging Opus every 30 minutes, loading full context each time. That’s roughly $2-5/day in idle costs before you ask the agent to do a single useful thing.
Model routing fixes this. You tell OpenClaw which model handles which type of work, and the agent routes accordingly. The hard tasks go to Opus. Everything else goes to something cheaper. Your agent doesn’t get dumber. Your bill gets smaller.
The One-Model Problem
Here’s what running everything through Opus looks like:
| Activity | Frequency | Tokens Per Call | Daily Cost (Opus) |
|---|---|---|---|
| Heartbeats | 48/day (every 30 min) | 50-100K each | $3-5 |
| File operations | 10-20/day | 5-10K each | $0.50-1 |
| Simple lookups | 5-10/day | 2-5K each | $0.25-0.50 |
| Actual reasoning tasks | 3-5/day | 10-50K each | $1-3 |
| **Daily total** | | | $5-10 |
That’s $150-300/month. And 60-80% of it is overhead that never needed a frontier model.
The math is harsh. A heartbeat loads your context files, checks task state, and reports “system OK.” That’s it. It doesn’t need the model that solved the restaurant phone call problem. It needs the model equivalent of checking your pulse.
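The heartbeat math in the table can be checked directly. A sketch, assuming the heartbeat's repeated context is mostly served from Anthropic's prompt cache at roughly 10% of Opus's $15/M base input price (the cache-hit assumption is ours, not stated in the table):

```python
# Reproduce the heartbeat row of the table above.
# Assumption: heartbeat context is largely a prompt-cache hit, billed at
# ~10% of Opus's $15/M base input price (~$1.50/M for cached reads).
OPUS_CACHED_INPUT_PER_M = 1.50    # dollars per million cached input tokens
HEARTBEATS_PER_DAY = 48           # one every 30 minutes

for tokens_per_beat in (50_000, 100_000):      # the 50-100K range above
    daily_tokens = HEARTBEATS_PER_DAY * tokens_per_beat
    cost = daily_tokens / 1_000_000 * OPUS_CACHED_INPUT_PER_M
    print(f"{tokens_per_beat // 1000}K/beat -> "
          f"{daily_tokens / 1e6:.1f}M tokens/day, about ${cost:.2f}/day")
```

The low end of the token range lands at about $3.60/day and the high end at about $7.20/day, which brackets the table's $3-5 estimate. Without the cache assumption, the same token volume at full Opus input pricing would run roughly 10x higher.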
How OpenClaw’s Model Routing Works
OpenClaw’s config supports multiple models at the provider level and routes different types of work to different models. The routing happens through three mechanisms:
1. Primary + Fallback Chain
Every agent has a primary model and an ordered list of fallbacks. If the primary fails (rate limit, outage, auth error), OpenClaw walks down the fallback chain until something works.
Request arrives → Try primary model
→ Success: Done
→ Failure: Try fallback[0]
→ Failure: Try fallback[1]
→ Failure: Try fallback[2]
→ All failed: Report error
Cooldown between retries follows a backoff pattern: 1 minute, then 5 minutes, then 25 minutes, capping at 1 hour. This prevents hammering a rate-limited API.
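The documented cooldown pattern is a simple multiply-by-5 schedule with a cap. A minimal sketch of that pattern (not OpenClaw's actual implementation):

```python
# The cooldown schedule described above: each retry waits 5x longer
# than the last, capped at one hour. Illustrative only.
def cooldown_minutes(attempt: int) -> int:
    """attempt 0 -> 1 min, 1 -> 5, 2 -> 25, then capped at 60."""
    return min(5 ** attempt, 60)

print([cooldown_minutes(a) for a in range(5)])  # -> [1, 5, 25, 60, 60]
```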
2. Heartbeat Routing
Heartbeats get their own model field, separate from the primary. This is the biggest routing decision you can make. Pointing heartbeats at a free local model eliminates idle costs.
3. Sub-Agent Routing
When your agent spawns sub-agents for parallel tasks, those sub-agents can use a different model than the parent. Research scouts get Haiku, the final report writer gets Sonnet, and the file organizer gets Ollama.
Which Tasks Need Which Model
Not all agent work is equal. Here’s a concrete breakdown:
Free Tier: Local (Ollama)
| Task | Why Local Works | Tokens Saved vs Opus |
|---|---|---|
| Heartbeats (48/day) | Status check, no reasoning | 2-5M tokens/day |
| File moves and renames | Shell commands, no intelligence needed | 50-100K/day |
| CSV compilation | Data formatting, mechanical work | 100-500K/day |
| Log parsing | Pattern matching, structured input | 50-200K/day |
| Data cleanup | Removing duplicates, fixing headers | 100-300K/day |
Cost: $0 (electricity only)
Cheap Tier: Haiku ($1/M input, $5/M output)
| Task | Why Haiku Works | When to Escalate |
|---|---|---|
| Web research | Reading and extracting, not reasoning | If synthesis across 10+ sources needed |
| Email triage | Classification task | If crafting a nuanced reply |
| Data extraction | Structured parsing | If ambiguous data formats |
| Basic Q&A | Factual recall | If answer requires inference |
| Simple code tasks | Pattern application | If debugging a subtle bug |
Cost: ~$0.006 per typical request
Capable Tier: Sonnet ($3/M input, $15/M output)
| Task | Why Sonnet | When to Escalate |
|---|---|---|
| Cold outreach writing | Needs personality and persuasion | If writing for C-suite audiences |
| Code generation | Complex logic, multiple files | If architecting a new system |
| Analysis and reports | Multi-source synthesis | If the analysis drives a major decision |
| Debugging | Multi-step reasoning | If the bug is in security-critical code |
| Skill creation | Building new agent capabilities | If the skill is complex or safety-relevant |
Cost: ~$0.03-0.05 per typical request
Frontier Tier: Opus ($15/M input, $75/M output)
| Task | Why Only Opus |
|---|---|
| Architecture decisions | Wrong foundation = expensive rework |
| Security-sensitive operations | Missing a vulnerability costs more than the API call |
| Novel problem-solving | No established pattern to follow |
| Long multi-step chains (10+) | Cheaper models lose track of state |
| Critical business logic | Getting it wrong has real consequences |
Cost: ~$0.20-1.00 per typical request
The rule: If fixing a mistake from a cheaper model costs more than the difference between Haiku and Opus, use Opus.
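That rule is expected-value arithmetic. A sketch, where the failure probability and rework cost are illustrative assumptions you'd estimate for your own tasks:

```python
# The escalation rule above as expected-value arithmetic.
# p_cheap_fails and rework_cost are your estimates, not measurements.
def should_use_opus(opus_cost: float, cheap_cost: float,
                    p_cheap_fails: float, rework_cost: float) -> bool:
    """Use Opus when expected rework exceeds the price difference."""
    expected_rework = p_cheap_fails * rework_cost
    return expected_rework > (opus_cost - cheap_cost)

# A $0.50 Opus call vs. a $0.006 Haiku call: worth it if a 10% failure
# chance carries, say, $20 of cleanup (retries, your time, bad output).
print(should_use_opus(0.50, 0.006, 0.10, 20.0))   # -> True
# If cleanup is cheap ($2), Haiku's expected cost stays lower.
print(should_use_opus(0.50, 0.006, 0.10, 2.0))    # -> False
```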
Setting Up Tiered Routing in openclaw.json
Starter Config: Two Tiers (Local + One API)
This is the minimum viable routing setup. Heartbeats go local, everything else goes to your API model of choice.
{
"models": {
"providers": {
"ollama": {
"baseUrl": "http://127.0.0.1:11434/v1",
"apiKey": "ollama-local",
"api": "openai-completions",
"models": [
{ "id": "llama3.1:8b" }
]
},
"anthropic": {
"apiKey": "sk-ant-your-key-here",
"models": [
{ "id": "claude-sonnet-4-20250514" }
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-sonnet-4-20250514"
},
"heartbeat": {
"every": "30m",
"model": "ollama/llama3.1:8b"
}
}
}
}
What this saves: All idle costs eliminated ($2-5/day), but active tasks still run at Sonnet pricing.
Full Four-Tier Configuration
This is the setup that turned a $150 overnight task into $6.
{
"models": {
"providers": {
"ollama": {
"baseUrl": "http://127.0.0.1:11434/v1",
"apiKey": "ollama-local",
"api": "openai-completions",
"models": [
{ "id": "llama3.1:8b" }
]
},
"anthropic": {
"apiKey": "sk-ant-your-key-here",
"models": [
{ "id": "claude-3-5-haiku-latest" },
{ "id": "claude-sonnet-4-20250514" },
{ "id": "claude-opus-4" }
]
}
},
"aliases": {
"haiku": "anthropic/claude-3-5-haiku-latest",
"sonnet": "anthropic/claude-sonnet-4-20250514",
"opus": "anthropic/claude-opus-4"
}
},
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-3-5-haiku-latest",
"fallbacks": [
"anthropic/claude-sonnet-4-20250514",
"anthropic/claude-opus-4"
]
},
"heartbeat": {
"every": "30m",
"model": "ollama/llama3.1:8b"
},
"subagents": {
"model": "anthropic/claude-3-5-haiku-latest",
"maxConcurrent": 3
}
}
}
}
Notice three things:
- Primary is Haiku, not Sonnet. This is deliberate. Haiku handles 75% of agent work competently. Making it the default means most requests go cheap.
- Fallbacks escalate upward. Haiku → Sonnet → Opus. One caveat: fallbacks fire on failures (rate limits, outages, auth errors), not on answer quality. OpenClaw won't notice a weak Haiku response and retry with Sonnet; escalating for difficulty is your call, via /model or a skill-level override.
- Sub-agents default to Haiku. When your agent spawns research scouts, they don’t need Opus. Haiku reads blogs and extracts data just fine.
Model Aliases
The aliases block lets you reference models by short names instead of full provider/model strings. Useful when switching models via the /model command during a session:
/model opus ← switches to Claude Opus
/model haiku ← switches back to Haiku
Skill-Level Routing
OpenClaw skills can spawn sub-agents with specific model overrides. A coding skill might route to Sonnet while a file-organization skill routes to Ollama:
Coding skill → spawns sub-agent with model: "anthropic/claude-sonnet-4-20250514"
Research skill → spawns sub-agent with model: "anthropic/claude-3-5-haiku-latest"
File ops skill → spawns sub-agent with model: "ollama/llama3.1:8b"
You configure this in each skill’s definition, not in the global config. The parent agent’s model handles orchestration; the sub-agent’s model handles execution.
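The resolution logic amounts to a lookup with a default. A minimal sketch; the skill names and the `model_for_skill` helper are hypothetical, and OpenClaw's actual skill-definition schema may differ:

```python
# Hypothetical per-skill model overrides as a lookup table.
# Skill names and this helper are illustrative, not OpenClaw's schema.
SKILL_MODELS = {
    "coding": "anthropic/claude-sonnet-4-20250514",
    "research": "anthropic/claude-3-5-haiku-latest",
    "file-ops": "ollama/llama3.1:8b",
}

def model_for_skill(skill: str,
                    default: str = "anthropic/claude-3-5-haiku-latest") -> str:
    """Resolve a sub-agent model: skill override first, else the default."""
    return SKILL_MODELS.get(skill, default)

print(model_for_skill("coding"))    # -> anthropic/claude-sonnet-4-20250514
print(model_for_skill("unknown"))   # no override, falls back to Haiku
```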
ClawRouter: Automatic Task Classification
Setting up manual routing works, but you’re still choosing which model handles each task. ClawRouter automates that decision.
What It Does
ClawRouter is an open-source routing layer by BlockRun AI that sits between OpenClaw and your model providers. It analyzes each request using a 15-dimension weighted scoring system and routes to the cheapest model that can handle the task. The classification happens locally in under 1 millisecond with zero API calls.
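To make "weighted scoring" concrete, here is a toy classifier with three made-up features and hand-picked weights. This is NOT ClawRouter's actual 15-dimension system; it only illustrates the shape of the decision:

```python
# Toy illustration of weighted task scoring. Features, weights, and
# thresholds are invented for this sketch, not ClawRouter's internals.
WEIGHTS = {"prompt_length": 0.3, "code_present": 0.4, "multi_step": 0.3}

def tier(features: dict) -> str:
    """Score features in [0, 1], then bucket the weighted sum into a tier."""
    score = sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    if score < 0.25:
        return "SIMPLE"
    if score < 0.5:
        return "MEDIUM"
    if score < 0.75:
        return "COMPLEX"
    return "REASONING"

print(tier({"prompt_length": 0.1}))                                     # -> SIMPLE
print(tier({"code_present": 1}))                                        # -> MEDIUM
print(tier({"prompt_length": 0.8, "code_present": 1, "multi_step": 1})) # -> REASONING
```

Because classification like this is pure local arithmetic, it adds no API calls and negligible latency, which matches the sub-millisecond claim.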
The Four Routing Tiers
| Tier | Task Type | Model Examples | Approx. Cost |
|---|---|---|---|
| SIMPLE | File ops, formatting, status checks | Free/ultra-cheap models | ~$0.001/M tokens |
| MEDIUM | Research, extraction, basic coding | Mid-tier models | ~$1.50/M tokens |
| COMPLEX | Analysis, complex coding, writing | Gemini 2.5 Pro, Sonnet | ~$10/M tokens |
| REASONING | Architecture, debugging, novel problems | Grok, Opus | ~$0.50-15/M tokens |
Routing Profiles
ClawRouter ships with four profiles:
- auto — Balanced routing (default). Good starting point.
- eco — Maximizes savings (78-99% cost reduction). Routes aggressively to cheap models.
- premium — Routes to the best model for each task type. Still cheaper than single-model.
- free — Uses only free models. Falls back to gpt-oss-120b when wallet is empty.
Installation
curl -fsSL https://blockrun.ai/ClawRouter-update | bash
openclaw gateway restart
ClawRouter uses USDC micropayments on Base for pay-per-request billing. $5 in your wallet covers thousands of requests at typical usage. When the balance hits zero, it falls back to free models automatically.
Is It Worth It?
For users spending $50+/month on API costs, yes. ClawRouter’s blended average across all tiers is roughly $2/M tokens compared to $25/M for Opus. That’s a 92% reduction with no manual routing decisions.
For users already running the four-tier config above, the incremental savings are smaller. ClawRouter’s main value is convenience: it classifies tasks automatically instead of you defining routing rules.
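The 92% figure is plain arithmetic on the two blended rates quoted above:

```python
# Check the 92% claim: ~$2/M blended routed cost vs. ~$25/M for
# single-model Opus (both figures are the article's estimates).
blended_routed = 2.0    # dollars per million tokens, routed
blended_opus = 25.0     # dollars per million tokens, Opus only
reduction = 1 - blended_routed / blended_opus
print(f"{reduction:.0%} reduction")   # -> 92% reduction
```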
Fallback Configuration
What Happens When a Model Fails
Three common failure modes and how OpenClaw handles each:
| Failure | What OpenClaw Does | Your Config Controls |
|---|---|---|
| Rate limit (429) | Tries next model in fallback chain | fallbacks array order |
| Auth error | Rotates auth profiles within same provider first, then falls back | Provider auth config |
| Model outage | Moves to next fallback after cooldown | Cooldown is automatic (1min → 5min → 25min → 1hr cap) |
Setting Up an Escalation Chain
The fallbacks array is ordered. Put cheaper models first:
"model": {
"primary": "anthropic/claude-3-5-haiku-latest",
"fallbacks": [
"anthropic/claude-sonnet-4-20250514",
"openai/gpt-4o",
"anthropic/claude-opus-4"
]
}
This means: try Haiku first. If rate-limited, try Sonnet. If Sonnet is also down, try GPT-4o (different provider, different rate limits). Last resort: Opus.
Cross-provider fallbacks matter. When Anthropic hits a rate limit, OpenAI probably hasn’t. Having providers from both means your agent rarely stalls completely.
Rate Limit Pacing for New Accounts
New Anthropic accounts get roughly 30,000 tokens per minute. That’s tight. A single request with bloated context can blow through it.
Add pacing to your agent’s operating instructions:
Space API calls at least 5 seconds apart. If you receive a 429 error, wait 60 seconds before retrying. Do not fire multiple parallel requests unless explicitly instructed.
This prevents cascading retries that waste tokens and compound the rate limit problem. Once your account matures and limits increase, you can relax the pacing.
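If you implement the pacing yourself rather than relying on the agent's instructions, the logic is small. A sketch, where `make_request` is a placeholder for your real API call and is assumed to raise an exception containing "429" on a rate limit:

```python
# Sketch of the pacing rules above: calls at least 5s apart, and a
# 60s cooldown plus one retry after a 429. make_request is a placeholder.
import time

MIN_SPACING_S = 5        # space API calls at least 5 seconds apart
BACKOFF_429_S = 60       # wait a full minute after a rate-limit error

def required_wait(now: float, last_call_at: float) -> float:
    """Seconds to sleep so consecutive calls stay MIN_SPACING_S apart."""
    return max(0.0, MIN_SPACING_S - (now - last_call_at))

def call_with_pacing(make_request, last_call_at: float) -> float:
    """Run one paced request; return the time it completed."""
    time.sleep(required_wait(time.monotonic(), last_call_at))
    try:
        make_request()
    except Exception as e:
        if "429" in str(e):
            time.sleep(BACKOFF_429_S)   # cool off, then a single retry
            make_request()
        else:
            raise
    return time.monotonic()
```

Note the single retry: on a 30K tokens/minute account, firing repeated retries is exactly the cascade the pacing is meant to prevent.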
Before and After: Real Cost Comparisons
Example 1: Overnight Research Task (6 Hours, 14 Sub-Agents)
A B2B lead research job: find distressed businesses, gather contact info, write personalized outreach.
| Setup | What Happens | Cost |
|---|---|---|
| All Opus | Every sub-agent runs Opus. 14 agents × 6 hours × full context loading | ~$150-200 |
| All Sonnet | Better, but still expensive for research scouts doing simple lookups | ~$40-60 |
| Tiered routing | Haiku scouts, Sonnet writer, Ollama file organizer | $6 |
The tiered version used Haiku for the bulk of the work (reading blogs, finding LinkedIn profiles, gathering data). Sonnet stepped in only for crafting the outreach emails. Ollama handled file compilation. 95% of tokens were served from Anthropic’s prompt cache, which cut costs further.
Example 2: Daily Personal Assistant
Checking messages, organizing files, answering questions, light scheduling.
| Setup | Daily Cost | Monthly Cost |
|---|---|---|
| All Opus (default) | $5-10 | $150-300 |
| All Sonnet | $2-4 | $60-120 |
| Tiered (Haiku primary, Ollama heartbeats) | $0.50-1 | $15-30 |
Example 3: Coding Workflow
Agent assists with code reviews, debugging, feature implementation.
| Setup | Daily Cost | Monthly Cost |
|---|---|---|
| All Opus | $15-30 | $450-900 |
| All Sonnet | $5-10 | $150-300 |
| Tiered (Haiku + Sonnet for code, Opus for architecture) | $3-5 | $90-150 |
The Monthly Breakdown
| Strategy | Idle Cost | Light Use | Active Use | Heavy Use |
|---|---|---|---|---|
| Single model (Opus) | $60-150 | $150-300 | $450-900 | $1,500+ |
| Single model (Sonnet) | $30-60 | $60-120 | $150-300 | $500-900 |
| Tiered routing | $0 | $15-30 | $90-150 | $300-500 |
| 100% local (Ollama only) | $0 | ~$2 (electricity) | ~$5 | ~$10 |
Tiered routing doesn’t match the $0/month of running fully local, but you keep the capability of frontier models when you actually need them. For most users, that tradeoff makes sense.
Common Routing Mistakes
Mistake 1: Routing Everything Cheap
Making Ollama or Haiku handle tasks they can’t. Your agent tries three times, fails, retries with more context, fails again. You’ve now spent more tokens on retries than Sonnet would have cost in one pass.
Fix: Be honest about what cheap models can do. Simple reasoning and data extraction, yes. Complex debugging and novel problem-solving, no.
Mistake 2: No Fallback Chain
Running a single primary model with no fallbacks. When Anthropic rate-limits you at 2 AM during an overnight task, the agent stalls until the cooldown expires.
Fix: Always have at least one cross-provider fallback. If your primary is Anthropic, add an OpenAI model as backup.
Mistake 3: Opus for Heartbeats
The default config doesn’t separate heartbeat routing. If you set your primary to Opus and forget to set a heartbeat model, you’re paying frontier prices 48 times a day for a status check.
Fix: First thing you configure, always: "heartbeat": { "model": "ollama/llama3.1:8b" }. See our token optimization guide for the full idle cost breakdown.
Mistake 4: Ignoring Context Bloat
Routing to Haiku saves money per token, but if you’re sending 100KB of session history with every request, the savings are smaller than they should be. Context bloat is a multiplier on whatever model you’re using.
Fix: Run a “new session” purge before major tasks. Trim context files to under 20KB total.
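The multiplier effect is easy to put numbers on. A sketch at Haiku's $1/M input price, assuming roughly 4 characters per token and a nominal 50 calls/day (both assumptions are ours):

```python
# Context bloat as a cost multiplier at Haiku's $1/M input pricing.
# Assumptions: ~4 characters per token, 50 calls/day.
HAIKU_INPUT_PER_M = 1.00
CHARS_PER_TOKEN = 4

def daily_context_cost(context_kb: float, calls_per_day: int = 50) -> float:
    """Input-token cost of resending context_kb of context on every call."""
    tokens = context_kb * 1024 / CHARS_PER_TOKEN
    return tokens / 1_000_000 * HAIKU_INPUT_PER_M * calls_per_day

bloated = daily_context_cost(100)   # 100KB of session history per request
trimmed = daily_context_cost(20)    # context trimmed under 20KB
print(f"${bloated:.2f}/day vs ${trimmed:.2f}/day -> {bloated / trimmed:.0f}x")
```

A 5x difference on context alone, before the model does any work. The same multiplier applies at Opus prices, just with a larger absolute cost.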
Getting Started: The 10-Minute Setup
If you’re currently running a single model and want to start routing:
Step 1: Install Ollama (2 minutes)
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
Step 2: Add Ollama to your config (2 minutes)
Add the Ollama provider to the models.providers section of ~/.openclaw/openclaw.json.
Step 3: Route heartbeats to Ollama (1 minute)
Add the heartbeat block to agents.defaults. This alone saves $2-5/day.
Step 4: Switch primary to Haiku (1 minute)
Change agents.defaults.model.primary from Sonnet/Opus to Haiku. Add your current model as the first entry in fallbacks.
Step 5: Verify (4 minutes)
Watch your agent for a few hours. Check the Anthropic dashboard. Heartbeats should show zero API calls. Most tasks should run on Haiku. Fallbacks should fire only when Haiku gets stuck.
Start here. Add ClawRouter later if you want automatic classification. Add skill-level routing once you know which skills need more intelligence. The two-tier setup (Ollama heartbeats + Haiku primary) captures 80% of the savings with minimal config changes.
Bottom Line
Default OpenClaw wastes 60-80% of your API budget on work that doesn’t need frontier intelligence. Heartbeats, file operations, research lookups, data extraction. All of it runs fine on cheaper models or locally.
Three config changes fix this:
- Heartbeats → Ollama. Idle cost drops from $2-5/day to $0.
- Primary → Haiku. 75% of agent work runs at a fraction of Sonnet/Opus pricing.
- Fallbacks → Sonnet → Opus. When Haiku fails (rate limit, outage, auth error), the agent escalates automatically; for tasks you know are hard, escalate yourself with /model.
The overnight task that cost $150-200 on Opus costs $6 with tiered routing. The daily assistant that drained $10/day drops to under $1. And if you want to skip manual configuration entirely, ClawRouter automates the routing decisions for a 78-99% cost reduction.
Start with the heartbeat fix. It takes one minute and gives you the biggest single savings. Then switch your primary to Haiku. Then add fallbacks. Check your Anthropic dashboard after each change. You’ll see the difference immediately.
Related Guides
- OpenClaw Token Optimization — the full 97% cost reduction playbook
- Best Local Models for OpenClaw — which Ollama models handle agent tasks
- Running OpenClaw 100% Local — zero API costs with tradeoffs
- Stop Using Frontier AI for Everything — the general case for tiered routing
- OpenClaw Setup Guide — installation from scratch
- OpenClaw Security Guide — lock down before connecting real accounts