TTS
Mistral Voxtral TTS: Open-Weight Voice AI You Can Run Locally
Voxtral TTS is a 4B-parameter open-weight text-to-speech model that beats ElevenLabs Flash v2.5 in blind tests. 70ms latency, 9 languages, and voice cloning from 3 seconds of audio. Here's how to run it.
Crane + Qwen3-TTS: Run Voice Cloning Locally with Rust
Clone any voice with 3 seconds of audio using Qwen3-TTS through Crane's pure Rust inference engine. ~4GB VRAM, faster than real-time, Apache 2.0.
Building a Local AI Assistant: Your Private Jarvis
Build a private AI assistant with Ollama, Open WebUI, Whisper, and Kokoro TTS. Voice chat, document Q&A, and home automation: all local, no cloud, no subscriptions.
Talk to Your Local LLM: Voice Chat Setup
Get response times under 1 second with Whisper, Kokoro TTS, and your local model. Full setup guide for Open WebUI voice chat and standalone options. Needs 2-4GB VRAM.