Home Assistant + Local LLM: Voice Control Your Smart Home Without the Cloud
Every time you say “Hey Alexa, turn off the lights,” that audio goes to Amazon’s servers, gets processed, and comes back. Same with Google Home. Same with Siri. Your smart home runs through someone else’s computer.
Home Assistant has been the escape hatch from cloud-dependent smart homes for years. It controls your lights, locks, climate, and media players from a box on your own network. The missing piece was natural language – you could automate anything, but you had to speak in rigid command syntax or tap through dashboards.
That changed when Home Assistant added native Ollama integration in 2024.4 and built the Wyoming voice pipeline. You can now say “turn off the living room lights and set the bedroom to 68 degrees” to a local LLM running on hardware in your house. No cloud. No subscription. No recordings sent anywhere.
What you need (the stack)
The full local voice pipeline has four layers, and each one runs on your network:
| Layer | Component | What it does |
|---|---|---|
| Wake word | openWakeWord / microWakeWord | Listens for “Hey Nabu” (or custom trigger) |
| Speech-to-text | Whisper or Speech-to-Phrase | Converts your voice to text |
| Conversation agent | Ollama (local LLM) | Understands intent, decides what to do |
| Text-to-speech | Piper | Speaks the response back |
Home Assistant’s Wyoming protocol ties these together. It’s a peer-to-peer protocol that standardizes communication between STT, TTS, and wake word engines. Each component can run on the same machine or a separate one on your network.
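Under the hood, a Wyoming event is a single JSON header line followed by an optional binary payload (audio, usually). Here's a minimal sketch of that framing in Python, simplified from the real protocol and not the official `wyoming` library:

```python
import json

def encode_event(event_type: str, data: dict, payload: bytes = b"") -> bytes:
    """Frame a Wyoming-style event: one JSON header line, then raw payload bytes."""
    header = {"type": event_type, "data": data}
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload

def decode_event(raw: bytes) -> tuple[dict, bytes]:
    """Split a framed event back into its header dict and payload."""
    header_line, _, rest = raw.partition(b"\n")
    header = json.loads(header_line)
    return header, rest[: header.get("payload_length", 0)]

# A 16 kHz, 16-bit mono audio chunk, as an STT engine would receive it
msg = encode_event("audio-chunk", {"rate": 16000, "width": 2, "channels": 1}, b"\x00\x01" * 160)
header, audio = decode_event(msg)
```

Because the wire format is this simple, any process on your network that speaks it can plug into the pipeline, which is why the components can live on different machines.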
The two STT options have different tradeoffs:
- Speech-to-Phrase is a closed-ended model that transcribes only the commands it knows. Under one second on a Raspberry Pi 4. Fast, but limited to predefined sentence structures.
- Whisper is open-ended. It’ll transcribe anything you say. But it takes ~8 seconds on a Pi 4 and under 1 second on an Intel NUC or GPU. If you’re using an LLM as the conversation agent, Whisper is what you want – the LLM handles intent parsing anyway.
Piper generates speech locally. It produces about 1.6 seconds of voice per second of processing on a Pi 4 with medium-quality voices, and since Voice Chapter 10 (June 2025), it streams audio as soon as the LLM produces the first few words. That brought response times down roughly 9.5x for local TTS – from 5.3 seconds to about 0.56 seconds.
Connecting Ollama to Home Assistant
The official Ollama integration is built into Home Assistant. No custom components needed.
Setup:
- Run Ollama on any machine on your network (or the same box as HA)
- In Home Assistant: Settings > Devices & Services > Add Integration > Ollama
- Enter your Ollama server URL (e.g., `http://192.168.1.50:11434`)
- Pick a model – it downloads automatically
- Enable “Control Home Assistant” (experimental) to give the LLM access to the Assist API
Once connected, the LLM is your conversation agent. You can assign it to your voice pipeline under Settings > Voice Assistants, replacing the default rule-based Assist agent.
The “Control Home Assistant” toggle is the important part. When enabled, the LLM gets access to your exposed entities through the Assist API. It can turn on lights, set thermostats, check sensor states, trigger scenes, and run scripts. You control which entities are exposed from the Settings > Voice Assistants > Expose page.
Home Assistant’s AI architecture uses a two-tier approach: the native Assist agent handles commands it recognizes first, then passes anything unrecognized to the LLM. This means simple commands like “turn off the lights” resolve instantly through pattern matching, while “what’s the temperature in every room?” hits the LLM. Since September 2025, the LLM can also initiate conversations: if your garage door has been open for an hour, it can ask whether you want to close it.
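Conceptually, the two-tier routing is just a fallback. This is an illustrative sketch, not Home Assistant's actual implementation:

```python
def route_command(utterance, match_sentence, llm_agent):
    """Tier 1: rule-based sentence matcher; tier 2: the LLM conversation agent."""
    intent = match_sentence(utterance)   # fast, local pattern match
    if intent is not None:
        return intent                    # recognized command resolves instantly
    return llm_agent(utterance)          # anything the rules don't cover

# Stub matcher that only knows one sentence pattern
known = {"turn off the lights": "light.turn_off"}
result = route_command("turn off the lights", known.get, lambda u: f"LLM: {u}")
fallback = route_command("what's the temperature in every room?", known.get, lambda u: f"LLM: {u}")
```

The practical upside: an LLM that takes a few seconds to respond doesn't slow down the commands you use most.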
One limitation: only models that support tool calling can control Home Assistant. The model needs to output structured function calls, not just natural language. This narrows your model options.
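To see why, here is roughly what a tool-capable request looks like against Ollama's documented `/api/chat` endpoint. The tool schema below is illustrative (Home Assistant generates its own schemas internally from your exposed entities), loosely mirroring the `light.turn_off` service:

```python
import json

# Illustrative tool definition -- HA builds the real schemas for you
tools = [{
    "type": "function",
    "function": {
        "name": "light_turn_off",
        "description": "Turn off one or more lights",
        "parameters": {
            "type": "object",
            "properties": {
                "entity_id": {"type": "string", "description": "e.g. light.kitchen"}
            },
            "required": ["entity_id"],
        },
    },
}]

request = {
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Turn off the kitchen lights"}],
    "tools": tools,
    "stream": False,
}
body = json.dumps(request)
# POST body to http://<ollama-host>:11434/api/chat. A tool-capable model answers
# with a structured call in message.tool_calls; a chat-only model just replies
# in prose, which Home Assistant cannot act on.
```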
Which models work
This is where most guides stop. “Just connect Ollama!” doesn’t help if your model can’t reliably parse “set the thermostat to 72” into a structured API call. I tested several models for Home Assistant tool calling.
Recommended models
| Model | Size | Tool calling | Speed (RTX 3060 12GB) | Verdict |
|---|---|---|---|---|
| Qwen 3.5 9B Q4 | 6.6GB VRAM | Reliable | ~15 tok/s | Best balance of speed and reliability |
| Qwen 2.5 7B Q4 | 5.5GB VRAM | Reliable | ~18 tok/s | Proven, slightly less capable |
| Llama 3.1 8B Q4 | 5.5GB VRAM | Good | ~17 tok/s | Solid fallback |
| home-3b-v3 | ~2GB VRAM | Purpose-built | ~25 tok/s | Trained specifically for HA, works on Pi |
| Phi-4 Mini 3.8B Q4 | ~3GB VRAM | Decent | ~22 tok/s | Fast, sometimes misparses complex commands |
Qwen 3.5 9B is my pick for anyone with a GPU or 16GB+ Apple Silicon. It handles multi-entity commands (“turn off all the lights downstairs except the hallway”) reliably, and its tool calling accuracy is high. The 6.6GB VRAM footprint leaves room for Whisper to run on the same GPU.
home-3b-v3 deserves special mention. The Home LLM project trained small models specifically for smart home control. These models are fine-tuned on Home Assistant service calls – they know the difference between light.turn_on and climate.set_temperature because that’s what they were trained on. Version 0.4+ supports proper tool calling with an agentic loop. If you’re running on a Raspberry Pi or any CPU-only setup, this is your best option.
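An “agentic loop” just means the model can call a tool, see the result, and decide whether to call another before answering. A minimal stubbed sketch of the pattern (not the Home LLM project's actual code):

```python
import json

def agentic_loop(chat, messages, call_tool, max_turns=5):
    """Alternate between the model and tool execution until it answers in text."""
    for _ in range(max_turns):
        reply = chat(messages)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]          # final answer, handed to TTS
        for call in calls:
            result = call_tool(call["name"], call["arguments"])
            # Feed the tool result back so the model can continue reasoning
            messages.append({"role": "tool", "content": json.dumps(result)})
    return "Sorry, I couldn't finish that request."

# Stub model: first turn calls a tool, second turn answers in plain text
replies = iter([
    {"tool_calls": [{"name": "light_turn_off", "arguments": {"entity_id": "light.kitchen"}}]},
    {"tool_calls": None, "content": "Done, kitchen lights are off."},
])
answer = agentic_loop(lambda msgs: next(replies), [], lambda name, args: {"ok": True})
```

The loop is what lets a model check a sensor state before deciding which service to call, instead of guessing in one shot.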
Models to avoid for Home Assistant
General-purpose chat models without tool calling support won’t work for device control. They’ll happily tell you about smart home protocols but can’t actually call light.turn_off. Check the Ollama model page for “Tools” support before downloading.
Reasoning models (DeepSeek-R1 distills, QwQ) are overkill. They’ll think for 30 seconds about whether to turn on a light. You want fast, reliable function calling, not philosophical deliberation.
Hardware options
You don’t need much. The LLM is the heaviest component, and it doesn’t need to be fast – 10 tok/s feels instant when the response is “OK, turning off the kitchen lights.”
Option 1: Dedicated mini PC ($50-150)
A refurbished Lenovo M710Q or Dell Optiplex Micro from eBay for $50-80. Install Ubuntu, run Ollama, and point Home Assistant at it. The home-3b-v3 model runs on CPU at ~5-8 tok/s on these machines. Add Whisper and Piper on the same box.
This is the cheapest dedicated path. The M710Q draws 10-15W at idle, so running it 24/7 costs about $15/year in electricity.
Option 2: Raspberry Pi 5 ($80-100)
If you’re already running Home Assistant on a Pi, you can run everything on the same device. Speech-to-Phrase transcribes in under a second. Piper generates voice. The home-3b model handles basic commands. Whisper is too slow on a Pi for comfortable use (8+ seconds) – use Speech-to-Phrase instead.
The Pi path works for simple commands: lights, switches, scenes. It struggles with multi-entity requests or anything requiring the model to reason about device states.
Option 3: Existing desktop or server
If you have a GPU-equipped machine on your network, run Ollama there and point Home Assistant at it. Qwen 3.5 9B on a GPU gives you 2-3 second total response time from voice command to spoken confirmation. This is the setup that actually feels like a smart speaker replacement.
A Mac Mini with 16GB+ unified memory also works well here. Qwen 3.5 9B runs at ~25-30 tok/s on M4 hardware, and the Mac handles Whisper and Piper simultaneously.
Hardware summary
| Setup | Cost | Best model | Response time | Good enough for |
|---|---|---|---|---|
| Refurb mini PC (CPU) | $50-80 | home-3b-v3 | 4-8 sec | Lights, switches, simple scenes |
| Raspberry Pi 5 | $80-100 | home-3b-v3 | 5-10 sec | Basic commands only |
| Desktop with GPU | $0 (existing) | Qwen 3.5 9B | 2-4 sec | Everything including multi-entity |
| Mac Mini M4 16GB | $600 | Qwen 3.5 9B | 2-4 sec | Everything, low power draw |
Setting up the voice pipeline
Here’s the quickest path to a working local voice stack, assuming you already have Home Assistant running.
1. Install Ollama on your chosen hardware:
```shell
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3.5:9b-q4_K_M
```
If using a Pi or CPU-only box:
```shell
ollama pull fixt/home-3b-v3
```
2. Install Whisper and Piper as Home Assistant add-ons:
- Go to Settings > Add-ons > Add-on Store
- Install “Whisper” (or “Speech-to-Phrase” for Pi)
- Install “Piper”
- Start both add-ons
3. Add the Ollama integration:
- Settings > Devices & Services > Add Integration > Ollama
- Enter your Ollama URL
- Select your model
- Enable “Control Home Assistant”
4. Configure the voice pipeline:
- Settings > Voice Assistants > Add Assistant
- Conversation agent: your Ollama integration
- Speech-to-text: Whisper (or Speech-to-Phrase)
- Text-to-speech: Piper
- Wake word: openWakeWord (optional)
5. Expose entities you want the LLM to control:
- Settings > Voice Assistants > Expose tab
- Select the lights, switches, climate devices, and scenes you want controllable
6. Add a voice device (optional but recommended):
- A $13 ATOM Echo flashed with ESPHome firmware works as a room speaker
- Or use the Home Assistant app’s microphone on your phone
- The Home Assistant Voice Preview Edition ($59) is purpose-built for this
What works and what doesn’t
After running this setup for a few weeks, here’s an honest breakdown.
Works well:
- Single-entity commands: “Turn off the kitchen lights” – instant, reliable
- Temperature control: “Set the bedroom to 68” – works consistently
- Scene activation: “Movie time” (triggers a scene) – fast and clean
- State queries: “Is the garage door open?” – reads entity states accurately
- Follow-up conversations: “Turn on the lights.” “Which ones?” “Living room.” – since Voice Chapter 10, the LLM keeps context without re-triggering the wake word
Works, but rough around the edges:
- Multi-entity commands: “Turn off everything downstairs except the hallway” – larger models (9B+) handle this, 3B models sometimes miss entities
- Time-based requests: “Turn on the porch light in 20 minutes” – requires script or automation creation, which not all models handle reliably
- Device discovery: “What lights are on?” when you have 50+ exposed entities – context fills up fast
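One workaround for timed requests is to expose a parameterized script that the model can call as a tool, so the delay logic lives in Home Assistant rather than the LLM. A sketch, where the script name and `light.porch` entity are assumptions to adapt to your setup:

```yaml
# configuration.yaml -- hypothetical delayed-light script, exposed to Assist
script:
  porch_light_delayed:
    alias: "Turn on the porch light after a delay"
    fields:
      minutes:
        description: "Delay before turning on, in minutes"
        example: 20
    sequence:
      - delay:
          minutes: "{{ minutes | int }}"
      - service: light.turn_on
        target:
          entity_id: light.porch  # assumed entity
```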
Don’t bother yet:
- Complex automation creation via voice: “Every weekday at 7am, gradually raise the bedroom lights over 15 minutes” – the model can’t write YAML automations reliably
- Multi-room audio routing: “Play jazz in the kitchen” – media player control via LLM is flaky
- Security-critical actions: Don’t let the LLM unlock your front door. Use HA’s native confirmed actions for locks and alarms.
The honest take
It works. For lights, climate, and scenes, a local LLM with Home Assistant is a genuine Alexa replacement. The response time with a GPU is 2-4 seconds, which is comparable to cloud assistants. And nothing leaves your network.
But don’t rip out your Echo Dots yet. Cloud assistants still win on wake word accuracy (openWakeWord has more false positives), multi-room audio coordination, and handling ambiguous requests gracefully. Run them side by side. Use the local assistant for the stuff you care about keeping private, and let Alexa handle the music and timers until the local stack catches up.
The local stack improves monthly. Streaming TTS cut latency by 9.5x in a single release. Speech-to-Phrase went from 6 to 21 languages. If you have the hardware sitting around, set it up on a Saturday afternoon. A year ago this was a novelty. Now it’s a usable voice assistant.