Home Assistant + Local LLM: Voice Control Your Smart Home Without the Cloud
Every time you say “Hey Alexa, turn off the lights,” that audio goes to Amazon’s servers, gets processed, and comes back. Same with Google Home. Same with Siri. Your smart home runs through someone else’s computer.
Home Assistant has been the escape hatch from cloud-dependent smart homes for years. It controls your lights, locks, climate, and media players from a box on your own network. The missing piece was natural language – you could automate anything, but you had to speak in rigid command syntax or tap through dashboards.
That changed when Home Assistant added native Ollama integration in 2024.4 and built the Wyoming voice pipeline. You can now say “turn off the living room lights and set the bedroom to 68 degrees” to a local LLM running on hardware in your house. No cloud. No subscription. No recordings sent anywhere.
What you need (the stack)
The full local voice pipeline has four layers, and each one runs on your network:
| Layer | Component | What it does |
|---|---|---|
| Wake word | openWakeWord / microWakeWord | Listens for “Hey Nabu” (or custom trigger) |
| Speech-to-text | Whisper or Speech-to-Phrase | Converts your voice to text |
| Conversation agent | Ollama (local LLM) | Understands intent, decides what to do |
| Text-to-speech | Piper | Speaks the response back |
Home Assistant’s Wyoming protocol ties these together. It’s a peer-to-peer protocol that standardizes communication between STT, TTS, and wake word engines. Each component can run on the same machine or a separate one on your network.
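Under the hood, a Wyoming event is a single JSON header line followed by an optional binary payload (audio, usually). Here's a minimal sketch of that framing in Python, simplified from the real protocol and not the official `wyoming` library:

```python
import json

def encode_event(event_type: str, data: dict, payload: bytes = b"") -> bytes:
    """Frame a Wyoming-style event: one JSON header line, then raw payload bytes."""
    header = {"type": event_type, "data": data}
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + payload

def decode_event(raw: bytes) -> tuple[dict, bytes]:
    """Split a framed event back into its header dict and payload."""
    header_line, _, rest = raw.partition(b"\n")
    header = json.loads(header_line)
    return header, rest[: header.get("payload_length", 0)]

# A 16 kHz, 16-bit mono audio chunk, as an STT engine would receive it
msg = encode_event("audio-chunk", {"rate": 16000, "width": 2, "channels": 1}, b"\x00\x01" * 160)
header, audio = decode_event(msg)
```

Because the wire format is this simple, any process on your network that speaks it can plug into the pipeline, which is why the components can live on different machines.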
The two STT options have different tradeoffs:
- Speech-to-Phrase is a closed-ended model that transcribes only the commands it knows. Under one second on a Raspberry Pi 4. Fast, but limited to predefined sentence structures.
- Whisper is open-ended. It’ll transcribe anything you say. But it takes ~8 seconds on a Pi 4 and under 1 second on an Intel NUC or GPU. If you’re using an LLM as the conversation agent, Whisper is what you want – the LLM handles intent parsing anyway.
Piper generates speech locally. It produces about 1.6 seconds of voice per second of processing on a Pi 4 with medium-quality voices, and since Voice Chapter 10 (June 2025), it streams audio as soon as the LLM produces the first few words. That brought response times down roughly 9.5x for local TTS – from 5.3 seconds to about 0.56 seconds.
Connecting Ollama to Home Assistant
The official Ollama integration is built into Home Assistant. No custom components needed.
Setup:
- Run Ollama on any machine on your network (or the same box as HA)
- In Home Assistant: Settings > Devices & Services > Add Integration > Ollama
- Enter your Ollama server URL (e.g., `http://192.168.1.50:11434`)
- Pick a model – it downloads automatically
- Enable “Control Home Assistant” (experimental) to give the LLM access to the Assist API
Once connected, the LLM is your conversation agent. You can assign it to your voice pipeline under Settings > Voice Assistants, replacing the default rule-based Assist agent.
The “Control Home Assistant” toggle is the important part. When enabled, the LLM gets access to your exposed entities through the Assist API. It can turn on lights, set thermostats, check sensor states, trigger scenes, and run scripts. You control which entities are exposed from the Settings > Voice Assistants > Expose page.
Home Assistant’s AI architecture uses a two-tier approach: the native Assist agent handles commands it recognizes first, then passes anything unrecognized to the LLM. This means simple commands like “turn off the lights” resolve instantly through pattern matching, while “what’s the temperature in every room?” hits the LLM. Since September 2025, the LLM can also initiate conversations: if your garage door has been open for an hour, it can ask whether you want to close it.
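Conceptually, the two-tier routing is just a fallback. This is an illustrative sketch, not Home Assistant's actual implementation:

```python
def route_command(utterance, match_sentence, llm_agent):
    """Tier 1: rule-based sentence matcher; tier 2: the LLM conversation agent."""
    intent = match_sentence(utterance)   # fast, local pattern match
    if intent is not None:
        return intent                    # recognized command resolves instantly
    return llm_agent(utterance)          # anything the rules don't cover

# Stub matcher that only knows one sentence pattern
known = {"turn off the lights": "light.turn_off"}
result = route_command("turn off the lights", known.get, lambda u: f"LLM: {u}")
fallback = route_command("what's the temperature in every room?", known.get, lambda u: f"LLM: {u}")
```

The practical upside: an LLM that takes a few seconds to respond doesn't slow down the commands you use most.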
One limitation: only models that support tool calling can control Home Assistant. The model needs to output structured function calls, not just natural language. This narrows your model options.
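To see why, here is roughly what a tool-capable request looks like against Ollama's documented `/api/chat` endpoint. The tool schema below is illustrative (Home Assistant generates its own schemas internally from your exposed entities), loosely mirroring the `light.turn_off` service:

```python
import json

# Illustrative tool definition -- HA builds the real schemas for you
tools = [{
    "type": "function",
    "function": {
        "name": "light_turn_off",
        "description": "Turn off one or more lights",
        "parameters": {
            "type": "object",
            "properties": {
                "entity_id": {"type": "string", "description": "e.g. light.kitchen"}
            },
            "required": ["entity_id"],
        },
    },
}]

request = {
    "model": "qwen2.5:7b",
    "messages": [{"role": "user", "content": "Turn off the kitchen lights"}],
    "tools": tools,
    "stream": False,
}
body = json.dumps(request)
# POST body to http://<ollama-host>:11434/api/chat. A tool-capable model answers
# with a structured call in message.tool_calls; a chat-only model just replies
# in prose, which Home Assistant cannot act on.
```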
Which models work
This is where most guides stop. “Just connect Ollama!” doesn’t help if your model can’t reliably parse “set the thermostat to 72” into a structured API call. I tested several models for Home Assistant tool calling.
Recommended models
| Model | Size | Tool calling | Speed (RTX 3060 12GB) | Verdict |
|---|---|---|---|---|
| Qwen 3.5 9B Q4 | 6.6GB VRAM | Reliable | ~15 tok/s | Best balance of speed and reliability |
| Qwen 2.5 7B Q4 | 5.5GB VRAM | Reliable | ~18 tok/s | Proven, slightly less capable |
| Llama 3.1 8B Q4 | 5.5GB VRAM | Good | ~17 tok/s | Solid fallback |
| home-3b-v3 | ~2GB VRAM | Purpose-built | ~25 tok/s | Trained specifically for HA, works on Pi |
| Phi-4 Mini 3.8B Q4 | ~3GB VRAM | Decent | ~22 tok/s | Fast, sometimes misparses complex commands |
Qwen 3.5 9B is my pick for anyone with a GPU or 16GB+ Apple Silicon. It handles multi-entity commands (“turn off all the lights downstairs except the hallway”) reliably, and its tool calling accuracy is high. The 6.6GB VRAM footprint leaves room for Whisper to run on the same GPU.
home-3b-v3 deserves special mention. The Home LLM project trained small models specifically for smart home control. These models are fine-tuned on Home Assistant service calls – they know the difference between light.turn_on and climate.set_temperature because that’s what they were trained on. Version 0.4+ supports proper tool calling with an agentic loop. If you’re running on a Raspberry Pi or any CPU-only setup, this is your best option.
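An “agentic loop” just means the model can call a tool, see the result, and decide whether to call another before answering. A minimal stubbed sketch of the pattern (not the Home LLM project's actual code):

```python
import json

def agentic_loop(chat, messages, call_tool, max_turns=5):
    """Alternate between the model and tool execution until it answers in text."""
    for _ in range(max_turns):
        reply = chat(messages)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]          # final answer, handed to TTS
        for call in calls:
            result = call_tool(call["name"], call["arguments"])
            # Feed the tool result back so the model can continue reasoning
            messages.append({"role": "tool", "content": json.dumps(result)})
    return "Sorry, I couldn't finish that request."

# Stub model: first turn calls a tool, second turn answers in plain text
replies = iter([
    {"tool_calls": [{"name": "light_turn_off", "arguments": {"entity_id": "light.kitchen"}}]},
    {"tool_calls": None, "content": "Done, kitchen lights are off."},
])
answer = agentic_loop(lambda msgs: next(replies), [], lambda name, args: {"ok": True})
```

The loop is what lets a model check a sensor state before deciding which service to call, instead of guessing in one shot.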
Models to avoid for Home Assistant
General-purpose chat models without tool calling support won’t work for device control. They’ll happily tell you about smart home protocols but can’t actually call light.turn_off. Check the Ollama model page for “Tools” support before downloading.
Reasoning models (DeepSeek-R1 distills, QwQ) are overkill. They’ll think for 30 seconds about whether to turn on a light. You want fast, reliable function calling, not philosophical deliberation.
Hardware options
You don’t need much. The LLM is the heaviest component, and it doesn’t need to be fast – 10 tok/s feels instant when the response is “OK, turning off the kitchen lights.”
Option 1: Dedicated mini PC ($50-150)
A refurbished Lenovo M710Q or Dell Optiplex Micro from eBay for $50-80. Install Ubuntu, run Ollama, and point Home Assistant at it. The home-3b-v3 model runs on CPU at ~5-8 tok/s on these machines. Add Whisper and Piper on the same box.
This is the cheapest dedicated path. The M710Q draws 10-15W at idle, so running it 24/7 costs about $15/year in electricity.
Option 2: Raspberry Pi 5 ($80-100)
If you’re already running Home Assistant on a Pi, you can run everything on the same device. Speech-to-Phrase transcribes in under a second. Piper generates voice. The home-3b model handles basic commands. Whisper is too slow on a Pi for comfortable use (8+ seconds) – use Speech-to-Phrase instead.
The Pi path works for simple commands: lights, switches, scenes. It struggles with multi-entity requests or anything requiring the model to reason about device states.
Option 3: Existing desktop or server
If you have a GPU-equipped machine on your network, run Ollama there and point Home Assistant at it. Qwen 3.5 9B on a GPU gives you 2-3 second total response time from voice command to spoken confirmation. This is the setup that actually feels like a smart speaker replacement.
A Mac Mini with 16GB+ unified memory also works well here. Qwen 3.5 9B runs at ~25-30 tok/s on M4 hardware, and the Mac handles Whisper and Piper simultaneously.
Hardware summary
| Setup | Cost | Best model | Response time | Good enough for |
|---|---|---|---|---|
| Refurb mini PC (CPU) | $50-80 | home-3b-v3 | 4-8 sec | Lights, switches, simple scenes |
| Raspberry Pi 5 | $80-100 | home-3b-v3 | 5-10 sec | Basic commands only |
| Desktop with GPU | $0 (existing) | Qwen 3.5 9B | 2-4 sec | Everything including multi-entity |
| Mac Mini M4 16GB | $600 | Qwen 3.5 9B | 2-4 sec | Everything, low power draw |
Setting up the voice pipeline
Here’s the quickest path to a working local voice stack, assuming you already have Home Assistant running.
1. Install Ollama on your chosen hardware:
```shell
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3.5:9b-q4_K_M
```
If using a Pi or CPU-only box:
```shell
ollama pull fixt/home-3b-v3
```
2. Install Whisper and Piper as Home Assistant add-ons:
- Go to Settings > Add-ons > Add-on Store
- Install “Whisper” (or “Speech-to-Phrase” for Pi)
- Install “Piper”
- Start both add-ons
3. Add the Ollama integration:
- Settings > Devices & Services > Add Integration > Ollama
- Enter your Ollama URL
- Select your model
- Enable “Control Home Assistant”
4. Configure the voice pipeline:
- Settings > Voice Assistants > Add Assistant
- Conversation agent: your Ollama integration
- Speech-to-text: Whisper (or Speech-to-Phrase)
- Text-to-speech: Piper
- Wake word: openWakeWord (optional)
5. Expose entities you want the LLM to control:
- Settings > Voice Assistants > Expose tab
- Select the lights, switches, climate devices, and scenes you want controllable
6. Add a voice device (optional but recommended):
- A $13 ATOM Echo flashed with ESPHome firmware works as a room speaker
- Or use the Home Assistant app’s microphone on your phone
- The Home Assistant Voice Preview Edition ($59) is purpose-built for this
What works and what doesn’t
After running this setup for a few weeks, here’s an honest breakdown.
Works well:
- Single-entity commands: “Turn off the kitchen lights” – instant, reliable
- Temperature control: “Set the bedroom to 68” – works consistently
- Scene activation: “Movie time” (triggers a scene) – fast and clean
- State queries: “Is the garage door open?” – reads entity states accurately
- Follow-up conversations: “Turn on the lights.” “Which ones?” “Living room.” – since Voice Chapter 10, the LLM keeps context without re-triggering the wake word
Works, but rough around the edges:
- Multi-entity commands: “Turn off everything downstairs except the hallway” – larger models (9B+) handle this, 3B models sometimes miss entities
- Time-based requests: “Turn on the porch light in 20 minutes” – requires script or automation creation, which not all models handle reliably
- Device discovery: “What lights are on?” when you have 50+ exposed entities – context fills up fast
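One workaround for timed requests is to expose a parameterized script that the model can call as a tool, so the delay logic lives in Home Assistant rather than the LLM. A sketch, where the script name and `light.porch` entity are assumptions to adapt to your setup:

```yaml
# configuration.yaml -- hypothetical delayed-light script, exposed to Assist
script:
  porch_light_delayed:
    alias: "Turn on the porch light after a delay"
    fields:
      minutes:
        description: "Delay before turning on, in minutes"
        example: 20
    sequence:
      - delay:
          minutes: "{{ minutes | int }}"
      - service: light.turn_on
        target:
          entity_id: light.porch  # assumed entity
```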
Don’t bother yet:
- Complex automation creation via voice: “Every weekday at 7am, gradually raise the bedroom lights over 15 minutes” – the model can’t write YAML automations reliably
- Multi-room audio routing: “Play jazz in the kitchen” – media player control via LLM is flaky
- Security-critical actions: Don’t let the LLM unlock your front door. Use HA’s native confirmed actions for locks and alarms.
The honest take
It works. For lights, climate, and scenes, a local LLM with Home Assistant is a genuine Alexa replacement. The response time with a GPU is 2-4 seconds, which is comparable to cloud assistants. And nothing leaves your network.
But don’t rip out your Echo Dots yet. Cloud assistants still win on wake word accuracy (openWakeWord has more false positives), multi-room audio coordination, and handling ambiguous requests gracefully. Run them side by side. Use the local assistant for the stuff you care about keeping private, and let Alexa handle the music and timers until the local stack catches up.
The local stack improves monthly. Streaming TTS cut latency by 9.5x in a single release. Speech-to-Phrase went from 6 to 21 languages. If you have the hardware sitting around, set it up on a Saturday afternoon. A year ago this was a novelty. Now it’s a usable voice assistant.