Local AI for Privacy: What's Actually Private
📚 Related: Running AI Offline · Ollama vs LM Studio · LM Studio Tips · OpenClaw Security Guide
“Run it locally and your data stays private.”
You’ve seen this on every Reddit thread, every Hacker News comment, every local AI tutorial. And it’s mostly true. But “mostly” is doing a lot of work when the reason you went local was to keep sensitive documents away from corporate servers.
Here’s the actual picture: what’s genuinely private, what still leaks, and how to lock it all down.
What’s Genuinely Private
When you run a model locally with Ollama, LM Studio, or llama.cpp, these things stay on your machine with zero exceptions:
Your prompts. Every question, every document you paste, every conversation — none of it leaves your hardware. No API call, no round trip to a datacenter, no server-side log.
Your responses. The model generates output using your CPU or GPU. No copy exists on anyone else’s infrastructure.
Your documents. When you use local RAG to search your files, those files stay on your disk. Embeddings are generated locally. The vector database lives on your machine.
Your usage patterns. No one knows what models you run, when you run them, how often, or what topics you explore. There’s no analytics dashboard on someone else’s server tracking your behavior.
No account required. Ollama doesn’t need a login. LM Studio doesn’t need a login. llama.cpp definitely doesn’t need a login. You can run inference without ever giving anyone your email address.
Works offline. Once you’ve downloaded a model, disconnect from the internet entirely and everything still works. That’s air-gapped privacy — the kind you can’t get from any cloud service.
Compare that to what happens when you use cloud AI:
| | Local AI | ChatGPT (Free) | Claude (Free/Pro) | Gemini (Free) |
|---|---|---|---|---|
| Prompts sent to servers | No | Yes | Yes | Yes |
| Trains on your data (default) | No | Yes | Yes (since Sep 2025) | Yes |
| Can opt out of training | N/A | Yes | Yes | Yes |
| Human reviewers can see chats | No | Yes | Possible | Yes |
| Data retention | You control | 30 days | Up to 5 years | Up to 3 years |
| Account required | No | Yes | Yes | Yes |
| Works offline | Yes | No | No | No |
The bottom line: local AI eliminates the single biggest privacy risk — sending your data to a third party who stores it, trains on it, or lets employees read it.
What’s NOT Automatically Private
Here’s where the “just run it locally” advice falls short. Running locally solves the biggest problem, but these gaps still exist.
Model Downloads Expose Your IP
Every model you pull contacts a server. ollama pull llama3.2 hits registry.ollama.ai. Downloading from Hugging Face hits huggingface.co. The hosting provider sees your IP address and which model you’re downloading.
For most people this doesn’t matter. But if you’re trying to keep your AI usage invisible, plan your downloads separately from your inference work — use a VPN, download on a different network, or transfer models via USB from another machine.
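If you take the manual route, here is a hedged sketch: fetch the GGUF over a VPN or on another network, move the file yourself, then register it with Ollama so the inference machine never contacts registry.ollama.ai. The filename and model name below are placeholders.

# Register a manually downloaded GGUF with Ollama (no registry call)
# Filename and model name are placeholders
echo "FROM ./qwen2.5-14b-instruct-q4_k_m.gguf" > Modelfile
ollama create qwen2.5-local -f Modelfile
ollama run qwen2.5-local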
Telemetry and Update Checks
Ollama contacts registry.ollama.ai on startup for update checks. The core runtime has no telemetry — your prompts never leave your machine. But the update check reveals your IP and that you’re running Ollama. There’s no built-in flag to disable this yet (it’s an open feature request). The workaround is to block registry.ollama.ai in your firewall or /etc/hosts.
LM Studio contacts its servers when you search for models, download models, or check for software updates. Outside of those three actions, it makes no outbound connections. No analytics, no tracking, no login required.
ComfyUI’s desktop app had a telemetry problem. Version 0.4.41 fixed “rogue remote telemetry” that was sending activity data even when users had toggled the setting off. The core open-source ComfyUI code has no telemetry. If you use the desktop app, update to the latest version and verify the toggle is off.
llama.cpp makes zero outbound connections. Period. No update checks, no telemetry, no network code in the inference engine. If you want the gold standard for verified-private inference, this is it.
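As a sketch of what that looks like in practice (recent builds name the binary llama-cli, older ones call it main; the model path is a placeholder):

# Fully offline chat with a GGUF that's already on disk
./llama-cli -m ./models/qwen2.5-14b-instruct-q4_k_m.gguf -p "Summarize my notes:" -n 256
# Or serve an OpenAI-compatible API bound strictly to localhost
./llama-server -m ./models/qwen2.5-14b-instruct-q4_k_m.gguf --host 127.0.0.1 --port 8080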
Web Search Features Phone Home
Some local AI front-ends include web search — Open WebUI can search the web to augment responses, and various RAG tools include web retrieval. The moment web search activates, your queries leave your machine and hit search APIs. This is by design, but it breaks the air-gap.
If privacy is the goal, disable web search features or use them only for non-sensitive queries.
VS Code AI Extensions: The Real Threat
This one is worse than telemetry. In January 2026, researchers discovered two VS Code extensions masquerading as AI coding assistants — “ChatGPT – 中文版” and “ChatMoss (CodeMoss)” — that had been installed by 1.5 million developers. These extensions:
- Captured every file you opened in VS Code, not just files you interacted with
- Base64-encoded the contents and sent them to servers in China
- Could remotely trigger harvesting of up to 50 files from your workspace
- Embedded hidden tracking iframes with four Chinese analytics SDKs
- Exfiltrated .env files, API keys, SSH keys, credentials, and source code
The extensions actually worked as AI assistants, which is why the malicious behavior went undetected for months.
This wasn’t a one-off. Malicious VS Code extension detections grew from 27 in all of 2024 to 105 in the first ten months of 2025. The VS Code marketplace has a real supply chain security problem.
The lesson: Your local LLM can be perfectly private, but if you’ve installed a sketchy VS Code extension alongside it, your code is leaking anyway. Audit your extensions. Check publishers. Remove anything you don’t actively use.
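A quick way to start that audit, assuming you use the standard code CLI that ships with VS Code (the extension ID below is a placeholder):

# List every installed extension with its version, then remove anything you don't recognize or use
code --list-extensions --show-versions
# Uninstall by extension ID
code --uninstall-extension publisher.extension-name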
“Local” Tools with Cloud Fallbacks
Some tools that market themselves as “local” include cloud API fallbacks. If the local model fails or takes too long, they silently route your prompt to a cloud endpoint. Read the docs. Check the settings. If a tool supports multiple “providers” including cloud APIs, make sure the cloud option is disabled — not just not-selected, but actually disabled.
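One way to verify, sketched for Linux: watch what the tool actually opens on the network while you use it. Replace sometool with the process name of the app you are auditing.

# Live sockets opened by the tool
sudo lsof -i -P -n -c sometool
# Or log every new non-loopback TCP connection attempt while you test it
sudo tcpdump -n -i any 'tcp[tcpflags] & tcp-syn != 0 and not dst net 127.0.0.0/8'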
Tools That Respect Privacy
Here’s what you can trust and what to watch for:
| Tool | Telemetry | Update Checks | Network During Inference | Privacy Verdict |
|---|---|---|---|---|
| llama.cpp | None | None | None | Best possible |
| Ollama | None | Yes (startup) | None | Great, block update checks for max privacy |
| LM Studio | None | Yes (launch) | None | Great, download models manually for max privacy |
| ComfyUI (core) | None | None | None | Great for local image gen |
| ComfyUI (desktop) | Opt-in (had a bug) | Yes | None | Good, verify telemetry toggle |
| Open WebUI | None | None | Only if web search enabled | Good, disable web search for privacy |
For maximum privacy: llama.cpp with a pre-downloaded GGUF model is unbeatable. Zero network code means zero possible data leakage during inference. The tradeoff is a command-line interface and manual setup.
For practical privacy: Ollama with update checks blocked gives you an easy-to-use tool with no data leakage. Add a front-end like Open WebUI bound to localhost and you get a ChatGPT-like interface that’s genuinely private.
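A hedged example of that localhost binding, using the Docker image and ports the Open WebUI project documents (adjust the tag and host port to taste):

# Only reachable from this machine, thanks to the 127.0.0.1 in the port mapping
docker run -d --name open-webui \
  -p 127.0.0.1:3000:8080 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  ghcr.io/open-webui/open-webui:main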
Tools to Audit
Before trusting any tool with sensitive work, check for these:
Cloud sync features. If a note-taking app or RAG tool offers “sync across devices,” your data is hitting a server. Turn it off or use a tool that doesn’t have it.
Update checks on launch. Most GUI tools ping a server on startup. This reveals your IP and that you’re using the tool. Usually harmless, but block it if you need to.
RAG tools with built-in web search. AnythingLLM, Open WebUI, and others can search the web during retrieval. That’s useful, but it means your questions leave your machine. Disable it for sensitive queries.
VS Code extensions. After the malicious extension campaign described above, treat every AI extension as suspect until proven otherwise. Check:
- Is the publisher verified?
- Is the source code available?
- What permissions does it request, and does it need network access?
- When was it last updated?
GitHub Copilot. On the free tier, your code may be used for model improvement. Business and Enterprise plans don’t train on your data, but your code still goes to GitHub’s servers for inference. If the code can’t leave your machine, Copilot isn’t an option — use a local coding model instead.
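A minimal sketch of the local alternative, assuming Ollama and a coding model that fits your hardware (the tag below is one example):

# Pull a local coding model
ollama pull qwen2.5-coder:7b
# Point your editor's local provider at Ollama's OpenAI-compatible endpoint, or test it directly
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b", "messages": [{"role": "user", "content": "Write a function that parses ISO 8601 dates."}]}'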
Your Threat Model Matters
Not everyone needs the same level of privacy. What are you actually protecting against?
Hiding from Big Tech Data Collection
Threat: You don’t want OpenAI, Google, or Anthropic training on your prompts, storing your conversations, or letting employees read them.
Solution: Any local setup works. Even Ollama with default settings keeps your prompts completely off corporate servers. This is the easiest threat to address and the most common reason people go local.
Hiding from Network Snoopers
Threat: Someone on your local network (IT department, shared WiFi, ISP) can see your traffic patterns.
Solution: They can see that you downloaded models from Hugging Face, but they can’t see your prompts or responses (those never leave your machine). For model downloads, use a VPN. For inference, there’s nothing to intercept — it’s all local computation. Enable full disk encryption so your data is protected if the machine is physically accessed.
Hiding from Nation-State Adversaries
Threat: Advanced persistent threats, hardware implants, supply chain compromises.
Solution: Different conversation entirely. Air-gapped hardware, verified firmware, hardware security modules, TEMPEST shielding. If this is your threat model, you’re not reading a blog post for advice — you have a security team. But the fundamentals still apply: local inference on air-gapped hardware with pre-loaded models is as good as it gets for the AI layer.
Practical Privacy Setup
Five steps to lock down a local AI setup. Takes about 15 minutes.
1. Use Ollama + Terminal (Minimal Attack Surface)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model while online
ollama pull qwen2.5:14b
# Bind to localhost only
export OLLAMA_HOST=127.0.0.1:11434
The terminal has the smallest attack surface of any interface. No browser extensions, no Electron app, no hidden iframes.
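To confirm the binding took effect, a quick check assuming the default port 11434:

# Should show 127.0.0.1:11434, not 0.0.0.0:11434
ss -tlnp | grep 11434
# And the API should answer locally
curl http://127.0.0.1:11434/api/version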
2. Pre-Download Models, Then Work Offline
# Download everything you need while connected
ollama pull qwen2.5:14b
ollama pull llama3.2:3b
ollama pull nomic-embed-text
# Disconnect from the internet
# On Linux:
nmcli networking off
# Now run inference — everything works
ollama run qwen2.5:14b
See our complete offline guide for portable setups and model transfer via USB.
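As a hedged sketch of the USB route, assuming the default per-user model path ~/.ollama/models (the Linux systemd install keeps models under /usr/share/ollama/.ollama/models instead) and a drive mounted at /media/usb:

# On the connected machine: copy the model store to the drive
rsync -a ~/.ollama/models/ /media/usb/ollama-models/
# On the air-gapped machine: restore it and verify
rsync -a /media/usb/ollama-models/ ~/.ollama/models/
ollama list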
3. Disable Telemetry in GUI Tools
LM Studio: Download models manually from Hugging Face and load from disk. This avoids the model search network call entirely.
ComfyUI Desktop: Settings → verify “Send anonymous usage metrics” is toggled off. Update to 0.4.41+ to fix the rogue telemetry bug.
Open WebUI: Disable web search in settings if you don’t need it.
4. Firewall Outbound Connections During Use
# Block Ollama's update checks
echo "127.0.0.1 registry.ollama.ai" | sudo tee -a /etc/hosts
# Or use ufw to block all outbound during sensitive work
sudo ufw default deny outgoing
sudo ufw enable
# When you need internet again
sudo ufw default allow outgoing
For a more surgical approach, block outbound traffic per application with iptables. The ollama system user below is created by the Linux install script for its systemd service; substitute your own username if you run Ollama under your account:
# Block outbound for a specific user running Ollama
sudo iptables -A OUTPUT -m owner --uid-owner ollama -d 127.0.0.1 -j ACCEPT
sudo iptables -A OUTPUT -m owner --uid-owner ollama -j DROP
5. Full Disk Encryption
If someone gets physical access to your machine, all the local inference in the world doesn’t help if your disk is unencrypted.
- Linux: LUKS (set up during OS install, or use cryptsetup)
- macOS: FileVault (System Preferences → Security & Privacy)
- Windows: BitLocker (Pro/Enterprise) or VeraCrypt (Home)
This protects your model files, conversation logs, RAG databases, and any documents you’ve been working with.
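Quick ways to verify encryption is actually on; run the line that matches your OS, and expect the exact output wording to vary by version:

# Linux: the root filesystem should sit on a crypto_LUKS device
lsblk -o NAME,FSTYPE,MOUNTPOINT
# macOS: should report "FileVault is On."
fdesetup status
# Windows (admin prompt): look for "Percentage Encrypted: 100%"
manage-bde -status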
When Local Is Non-Negotiable
Some data should never touch a cloud AI service, regardless of their privacy policy:
Legal documents. Client communications, contracts, case files. Attorney-client privilege doesn’t survive sending documents to OpenAI’s servers, even with training opt-out.
Medical notes. Patient records, clinical notes, diagnostic discussions. HIPAA doesn’t care that you opted out of training — the data still left your control.
Proprietary code. Trade secrets, unreleased features, security-sensitive implementations. Use a local coding model or self-host behind your firewall.
Personal journals and private writing. Anything you’d be uncomfortable seeing in a data breach notification. Cloud providers get breached. Local machines only get breached if someone targets you specifically.
Financial data. Tax documents, bank statements, investment strategies. Cloud AI terms of service are not a substitute for financial data security.
When Cloud Is Fine
Not everything needs a privacy fortress. Use cloud AI freely for:
- General knowledge questions. “How does TCP handshaking work?” isn’t sensitive.
- Public code help. Debugging open-source code that’s already on GitHub.
- Non-sensitive chat. Brainstorming blog post ideas, getting recipe suggestions.
- Anything you’d post publicly. If you’d put it on Stack Overflow or Reddit, it’s fine to put in ChatGPT.
The right approach is to match the tool to the sensitivity. Use ChatGPT or Claude for casual questions where convenience matters. Switch to local for anything sensitive. The tiered AI model strategy covers this in detail.
The Bottom Line
Local AI gives you genuine privacy that no cloud service can match, no matter what their terms of service say. Your prompts stay local. Your documents stay local. Nobody trains on your data.
But “local” isn’t a magic word. Update checks phone home. Model downloads expose your IP. VS Code extensions have exfiltrated code from 1.5 million developers. ComfyUI shipped telemetry that ignored its own off switch.
The fix is straightforward: know what your tools do on the network, block what you don’t need, and match your setup to your actual threat model. For most people, Ollama with update checks blocked is plenty. For sensitive professional work, go offline during inference. For regulated data, air-gap the machine.
Start with Ollama, pick a model that fits your hardware, and you’re already in a fundamentally better privacy position than 99% of AI users.