Most OpenClaw guides assume you’re running Claude or GPT-4 behind the scenes. That means API keys, monthly bills, and the nagging anxiety of watching your Anthropic balance drain while the agent runs overnight.

There’s another path. OpenClaw’s architecture doesn’t care where the intelligence comes from. It speaks the OpenAI API format, and Ollama speaks it too. Point the config at localhost, pull a capable model, and the entire system runs on your hardware. No API keys, no cloud calls, no monthly bills.

The cost is capability. A local 32B model isn’t Claude Opus. But for a surprising amount of agent work (file management, code generation, research compilation, scheduled automations), it’s enough.


What You’re Replacing (And What It Costs)

Before configuring anything, understand what running OpenClaw on cloud APIs actually costs. These numbers come from real user reports and our token optimization guide.

The Default API Bill

| Usage Pattern | Monthly Cost (Sonnet) | Monthly Cost (Opus) |
| --- | --- | --- |
| Idle only (heartbeats) | $60-150 | $150-450 |
| Light daily use (1-2 tasks) | $90-200 | $250-600 |
| Active daily use (5-10 tasks) | $200-400 | $500-1,500 |
| Overnight batch jobs | $6-150 per run | $50-500 per run |

Even optimized setups spend $15-30/month. A fully local setup costs $0/month in recurring fees — you pay once for the hardware and electricity.

The Real Math

One OpenClaw user loaded $25 onto Anthropic and watched it drain to $5 in a single day with the agent sitting idle. Another woke up to $500 gone overnight from a runaway task. These aren’t edge cases. OpenClaw’s default config loads your full context, session history, and memory on every API call — including the heartbeats that fire every 30 minutes.

Running locally eliminates this problem. The agent can heartbeat all day and your electricity bill goes up by pennies.


What You Need

Hardware Requirements

Local OpenClaw needs a GPU. The model running your agent determines what tasks succeed and what tasks fail, and bigger models succeed more often.

| VRAM | Best Model | Agent Capability | Success Rate (Routine Tasks) |
| --- | --- | --- | --- |
| 8GB | Qwen 3 8B (Q4) | Basic — single-step tasks only | ~40-50% |
| 12GB | Qwen 3 14B (Q4) or DeepSeek-R1-Distill 14B | Moderate — simple chains work | ~50-60% |
| 16GB | DeepSeek-R1-Distill 14B (Q8) | Better quality on moderate tasks | ~60-70% |
| 24GB | Qwen 3 32B (Q4_K_M) | Practical — most routine tasks succeed | ~80-90% |
| 48GB+ | Llama 3.3 70B (Q4) | Strong — approaches API quality | ~90%+ |

24GB is the realistic minimum for useful local agent work. That means an RTX 3090 ($700-900 used on eBay or Amazon) or an RTX 4090. For a deeper breakdown of which models work at each tier, see our model selection guide.
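
Not sure which tier you fall into? The card will tell you (NVIDIA GPUs; uses the driver's bundled nvidia-smi tool):

# Report the GPU name and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader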

Software Requirements

  • Ollama — Manages model downloads, inference, and the OpenAI-compatible API
  • OpenClaw — The agent platform itself
  • A model — Downloaded through Ollama (3-40GB depending on choice)

Step 1: Install and Configure Ollama

If you already have Ollama running, skip to Step 2.

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Verify it's running
ollama --version
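
If the installer set everything up, the server should already be listening on port 11434. A quick check — the root endpoint simply reports that the server is up:

curl -s http://localhost:11434
# Expected response: "Ollama is running"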

Pull Your Agent Model

Choose based on your VRAM. Pull the model before configuring OpenClaw — the download takes time and you want to verify it works first.

24GB VRAM (recommended):

ollama pull qwen3:32b

12-16GB VRAM:

ollama pull deepseek-r1:14b

8GB VRAM (limited):

ollama pull qwen3:8b

Test the Model

Run a quick sanity check:

ollama run qwen3:32b "List three files in a typical Linux home directory and explain what each contains. Be concise."

If you get a coherent response, Ollama is working. If it errors out or runs painfully slow, you probably have a VRAM issue. Check nvidia-smi to confirm GPU memory usage.
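
For example, watching the card from a second terminal while the test prompt runs shows whether the model is actually resident in VRAM or has spilled into system RAM:

# Refresh every 2 seconds while the prompt is generating
nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv -l 2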

Verify the API Endpoint

OpenClaw connects through Ollama’s OpenAI-compatible API, not the chat interface:

curl http://localhost:11434/v1/models

You should see your pulled model listed. This is the endpoint OpenClaw will use.
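
You can also push one request through the OpenAI-compatible chat endpoint, which is closer to what OpenClaw will actually do than the interactive ollama run test (swap in whatever model you pulled):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:32b",
    "messages": [{"role": "user", "content": "Reply with the single word: ready"}]
  }'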


Step 2: Configure OpenClaw for Local-Only

The Config File

Edit ~/.openclaw/openclaw.json. If you’re starting fresh, this is the entire config needed for local-only operation:

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          { "id": "qwen3:32b" }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/qwen3:32b",
        "heartbeat": "ollama/qwen3:32b"
      }
    }
  }
}

Replace qwen3:32b with whatever model you pulled in Step 1.
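
Because the config is plain JSON, a stray comma or missing brace will keep it from loading at all. A minimal sanity check, assuming python3 is on the machine (jq . works just as well):

python3 -m json.tool ~/.openclaw/openclaw.json > /dev/null && echo "config parses"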

What Each Field Does

| Field | Purpose |
| --- | --- |
| baseUrl | Points to Ollama’s local API. Always http://127.0.0.1:11434/v1 unless you changed Ollama’s port. |
| apiKey | Ollama doesn’t need authentication. Use any non-empty string — "ollama-local" is convention. |
| api | Must be "openai-completions" for Ollama’s compatibility layer. |
| models | List the model IDs exactly as they appear in ollama list. |
| primary | The model that handles all agent tasks. |
| heartbeat | The model that handles keep-alive pings. Same model is fine for local — it’s free either way. |

No API Keys Needed

That’s the point. No Anthropic key, no OpenAI key, no environment variables to set. The apiKey field exists because the OpenAI API format requires it, but Ollama ignores it.


Step 3: Optimize for Agent Workloads

Default Ollama settings are tuned for chat, not agent work. Agents need larger context windows and more predictable output.

Create a Custom Modelfile

cat << 'EOF' > ~/openclaw-agent.Modelfile
FROM qwen3:32b

PARAMETER temperature 0.7
PARAMETER num_ctx 16384
PARAMETER num_predict 4096
PARAMETER repeat_penalty 1.1
EOF

ollama create openclaw-agent -f ~/openclaw-agent.Modelfile

Then update your openclaw.json to use openclaw-agent instead of qwen3:32b.
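
One blunt way to make that swap, assuming the old model id doesn’t appear anywhere else you want to keep (the .bak copy is your undo):

sed -i.bak 's|qwen3:32b|openclaw-agent|g' ~/.openclaw/openclaw.json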

Why These Settings

| Parameter | Default | Agent Setting | Reason |
| --- | --- | --- | --- |
| temperature | 0.8 | 0.7 | Lower = more consistent tool calls and structured output |
| num_ctx | 2048-4096 | 16384 | Agent conversations accumulate fast: task descriptions, file contents, error logs |
| num_predict | 128-2048 | 4096 | Agents sometimes generate long code blocks or detailed plans |
| repeat_penalty | 1.1 | 1.1 | Prevents the model from getting stuck in loops, which matters for agents |

Context Window Tradeoffs

Increasing num_ctx eats VRAM. On 24GB with Qwen 3 32B at Q4:

| Context Length | VRAM Used | Remaining Headroom |
| --- | --- | --- |
| 4096 | ~19GB | ~5GB |
| 8192 | ~20GB | ~4GB |
| 16384 | ~22GB | ~2GB |
| 32768 | ~25GB+ | OOM on 24GB |

16384 is the practical maximum on 24GB. If you need longer context, drop to a smaller model or get more VRAM.
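
To see what a given context setting actually costs on your card, load the custom model once and read the memory counters while it’s still resident (rough numbers; exact usage varies with quantization and driver version):

# Trigger a load of the agent model, then inspect VRAM
ollama run openclaw-agent "ping" > /dev/null
nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader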


Step 4: Multi-Model Local Setup (Optional)

If you have the VRAM or patience for model swapping, you can run multiple local models for different task types. It’s the same tiered routing concept from our token optimization guide, but entirely free.

Two-Model Config

{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama-local",
        "api": "openai-completions",
        "models": [
          { "id": "qwen3:32b" },
          { "id": "qwen3:8b" }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/qwen3:32b",
        "heartbeat": "ollama/qwen3:8b"
      }
    }
  }
}

This uses the 8B model for heartbeats and reserves 32B for real work. On 24GB VRAM, Ollama swaps models automatically. Only one is loaded at a time, so VRAM isn’t doubled.
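
You can watch the swap happen during a session; Ollama reports what is currently resident and whether it’s fully on the GPU:

# Shows loaded models, their size, and the GPU/CPU split
ollama ps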

Specialist Models

Pull different models for different strengths:

# General agent tasks
ollama pull qwen3:32b

# Code-heavy skills
ollama pull qwen2.5-coder:32b

# Complex reasoning and planning
ollama pull deepseek-r1:32b

Swap between them manually or configure OpenClaw skills to specify which model they prefer. The coding skill routes to Qwen Coder, the planning skill routes to DeepSeek-R1.


What Works and What Doesn’t

Tasks That Work Well Locally

| Task Type | Example | Why It Works |
| --- | --- | --- |
| File management | Organize downloads by type, batch-rename files | Simple logic, clear instructions |
| Code generation | Write a Python script, create a bash automation | 32B models are decent coders |
| Data extraction | Parse CSVs, extract info from text files | Structured input/output |
| Scheduled routines | Daily backup, log rotation, report generation | Repeatable, well-defined |
| Local research | Summarize documents, compare files, search notes | No external API needed |

Tasks That Struggle Locally

| Task Type | Example | Why It Fails |
| --- | --- | --- |
| Complex multi-step chains | 10+ step workflows with dependencies | Models lose track of state |
| Web interaction | Browse sites, fill forms, extract dynamic content | Needs real browser automation + strong reasoning |
| Novel problem-solving | “Figure out why X isn’t working” | Requires broad knowledge + creative inference |
| Long-running orchestration | Overnight batch with 14 sub-agents | Context accumulates, errors compound |
| Ambiguous instructions | “Make this better” | Needs inference about intent |

The Honest Assessment

Running OpenClaw 100% local on a 32B model is like having a reliable but junior employee. Give it clear instructions and well-defined tasks, and it performs. Give it vague direction or complex problems, and it flounders.

The power users running Claude Opus get the “wow” demos — agents that call restaurants, build entire websites, and orchestrate 14 parallel sub-agents overnight. Local models aren’t there yet. They might never be. The gap between a 32B parameter model and a frontier API model is real.

But “not Claude Opus” isn’t the same as “useless.” A locally-running agent that handles file organization, code generation, and scheduled tasks is useful, especially when it costs nothing to run.


Troubleshooting

“Model not found” Error

OpenClaw can’t find your Ollama model. Check:

# List available models
ollama list

# Verify the model ID in openclaw.json matches exactly
# Wrong: "qwen3-32b"
# Right: "qwen3:32b"
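
A quick way to compare the ids Ollama actually serves against what openclaw.json references (assumes jq is installed):

curl -s http://localhost:11434/v1/models | jq -r '.data[].id'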

Slow Responses

Agent tasks can take 30-60 seconds per response on 32B models. This is normal for local inference. If responses take minutes:

  • Check nvidia-smi. Is the model actually on the GPU?
  • Reduce num_ctx. Large context windows slow inference.
  • Try a smaller model. 14B responds 2-3x faster than 32B.
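
To put a number on it, Ollama’s verbose mode prints timing stats after the response, including the eval rate in tokens per second:

ollama run qwen3:32b --verbose "Summarize what a cron job is in two sentences."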

Out of Memory (OOM)

The model is too large for your VRAM, or the context window is too high:

# Check VRAM usage
nvidia-smi

# Reduce context in your modelfile, then rebuild it with ollama create
PARAMETER num_ctx 8192

# Or switch to a smaller model
ollama pull qwen3:14b

Agent Gets Stuck in Loops

Local models sometimes repeat themselves or get stuck. Two fixes:

  1. Increase repeat_penalty to 1.2-1.3 in your modelfile
  2. Add explicit instructions in your agent’s memory: “If a task fails twice with the same approach, stop and report the failure instead of retrying.”
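
For the first fix, you can bump the value in the existing modelfile from Step 3 and rebuild; 1.25 is an arbitrary midpoint in that range:

# Raise the penalty in the existing modelfile, then rebuild the model
sed -i 's/repeat_penalty 1.1/repeat_penalty 1.25/' ~/openclaw-agent.Modelfile
ollama create openclaw-agent -f ~/openclaw-agent.Modelfile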

Ollama Crashes Under Load

Long agent sessions can exhaust system resources:

# Monitor during agent runs
watch -n 1 nvidia-smi

# Set Ollama to release VRAM when idle
export OLLAMA_KEEP_ALIVE=5m

The KEEP_ALIVE setting unloads the model after 5 minutes of inactivity, freeing VRAM. The next request takes a few seconds longer while the model reloads.
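
Note that the export only affects a server launched from that same shell. If Ollama runs as a systemd service (the default for the Linux install script), set the variable on the service instead, roughly:

sudo systemctl edit ollama
# In the override file, add:
#   [Service]
#   Environment="OLLAMA_KEEP_ALIVE=5m"
sudo systemctl restart ollama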


Local vs API: When to Switch

Running local isn’t always the right call. Here’s a framework:

| Factor | Go Local | Use API |
| --- | --- | --- |
| Budget | Can’t afford $15-30/month | $30/month is nothing |
| Privacy | Data can’t leave your machine | Cloud providers are fine |
| Task complexity | Routine, well-defined tasks | Complex, novel problems |
| Reliability | Acceptable if some tasks fail | Must succeed first try |
| Speed | Can wait 30-60s per response | Need 2-5s responses |
| Hardware | Have 24GB+ VRAM | No GPU or under 16GB |

The hybrid approach from our token optimization guide uses local models for heartbeats and simple tasks and the API for complex work. That’s the practical sweet spot for most users. Going 100% local is for people who either can’t or won’t use cloud APIs.


Cost Comparison: One Year

| Setup | Year 1 Cost | Year 2 Cost | 3-Year Total |
| --- | --- | --- | --- |
| OpenClaw + Opus (default) | $1,080-1,800 | $1,080-1,800 | $3,240-5,400 |
| OpenClaw + Sonnet (optimized) | $180-360 | $180-360 | $540-1,080 |
| OpenClaw + Ollama (100% local) | $700-900 (GPU) + ~$30 electricity | ~$30 electricity | $790-990 |
| OpenClaw + Ollama (already have GPU) | ~$30 electricity | ~$30 electricity | ~$90 |

If you already own a 24GB GPU, going local is obviously cheaper from day one. If you need to buy one, a used RTX 3090 pays for itself in 4-8 months against typical unoptimized Sonnet usage (against a tightly optimized $15-30/month setup the payback stretches to a few years), and the GPU has resale value when you’re done.


Bottom Line

Running OpenClaw 100% local works. Not as impressively as Claude Opus, but well enough for a real daily-use agent that handles file management, code generation, and scheduled tasks. The setup takes 15-20 minutes, costs nothing after the initial hardware purchase, and kills the entire “my agent drained my API balance overnight” problem.

The recipe:

  1. Install Ollama, pull Qwen 3 32B (24GB VRAM) or the best model your hardware supports
  2. Point openclaw.json at http://127.0.0.1:11434/v1
  3. Create a custom modelfile with agent-optimized settings
  4. Accept the capability tradeoffs: clear instructions, well-defined tasks, patience with slower responses

If you have 24GB VRAM and want an always-on agent that doesn’t bill you, this is the setup. If you need Claude-tier reasoning, keep the API key and use our tiered routing approach instead.