WSL2 + Ollama on Windows: Complete Setup Guide (GPU Passthrough Included)
📚 Related: WSL2 for Local AI (Full Guide) · Ollama Troubleshooting · Ollama vs LM Studio · Open WebUI Setup · Planning Tool
Windows has a native Ollama installer. It works. So why bother with WSL2?
Because the moment you want Docker Compose, Open WebUI, Python scripts that call the Ollama API, or a dev environment that matches your deployment server, you’re going to want Linux. WSL2 gives you that without dual-booting, and GPU inference runs at the same speed as native Windows.
Here’s everything: from wsl --install to GPU-accelerated Ollama, Open WebUI in your browser, Docker Compose managing the stack, and the gotchas that will eat your afternoon if nobody warns you.
Install WSL2
Open an Administrator PowerShell:
wsl --install
This installs WSL2, the Linux kernel, and Ubuntu. It will prompt for a UNIX username and password on first launch. Restart your machine if prompted.
For a specific version:
wsl --install -d Ubuntu-24.04
Verify WSL2 is active (not WSL1):
wsl --list --verbose
If your distro shows VERSION 1, convert it:
wsl --set-version Ubuntu-24.04 2
Enable systemd
Ollama runs as a systemd service. Without systemd, you’ll get "System has not been booted with systemd as init system" when trying to manage the service.
Inside WSL, edit /etc/wsl.conf:
[boot]
systemd=true
Restart from PowerShell: wsl --shutdown, then reopen your WSL terminal.
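After relaunching, you can confirm that systemd actually took over as the init process:

```shell
# If wsl.conf was picked up, PID 1 is systemd; "init" means it wasn't
ps -p 1 -o comm=
```

If it still prints init, double-check that the file lives at /etc/wsl.conf inside the distro (not on the Windows side) and that you ran wsl --shutdown rather than just closing the terminal window.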
Configure memory
WSL2 defaults to 50% of your system RAM. On a 32GB system, that’s 16GB. Fine for 7B models, tight for anything larger.
Create or edit C:\Users\<YourUsername>\.wslconfig:
[wsl2]
memory=24GB
swap=8GB
processors=8
localhostForwarding=true
networkingMode=mirrored
[experimental]
autoMemoryReclaim=dropcache
sparseVhd=true
Restart WSL: wsl --shutdown
| Setting | What It Does |
|---|---|
| memory | Leave 4-8GB for Windows, give the rest to WSL2 |
| swap | Prevents OOM kills during model loading |
| sparseVhd | Stops your virtual disk from ballooning when you download and delete models |
| autoMemoryReclaim | dropcache releases RAM when WSL is idle. Do not use gradual; it conflicts with systemd and can freeze your shell |
| networkingMode | mirrored makes services accessible from your LAN. Requires Windows 11 22H2+. Windows 10 users: remove this line |
GPU VRAM is not affected by .wslconfig. Your models get the full GPU memory minus ~200-500MB for the Windows desktop compositor.
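To see how much of that VRAM is actually free before pulling a large model, you can query the driver from inside WSL2. The if-guard is only there so the snippet degrades gracefully on a machine without an NVIDIA GPU:

```shell
# Report total/used/free VRAM; skips cleanly if nvidia-smi is absent
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total,memory.used,memory.free --format=csv
else
  echo "nvidia-smi not found - see the GPU passthrough section below"
fi
```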
GPU passthrough
This is simpler than most guides make it. Two rules:
- Install the NVIDIA driver on Windows only. Your standard GeForce Game Ready or Studio driver (535+ minimum, 560+ recommended).
- Do NOT install any NVIDIA Linux GPU driver inside WSL2. The Windows driver is automatically “stubbed” into WSL2 as libcuda.so.
Verify GPU access
Inside WSL2:
nvidia-smi
You should see your GPU model, driver version, and CUDA version. If this fails:
- Update your Windows NVIDIA driver
- Run wsl --update from PowerShell
- Run wsl --shutdown and relaunch
Install CUDA toolkit (optional)
Ollama doesn’t need the CUDA Toolkit because it bundles its own CUDA runtime. But if you’re also building llama.cpp or running PyTorch, install the toolkit with this rule: install cuda-toolkit-12-x only, never the cuda or cuda-drivers meta-packages. Those meta-packages install a Linux driver that overwrites the WSL2 GPU stub and breaks everything.
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6
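After installing, it’s worth confirming that no Linux NVIDIA driver package slipped in as a dependency (the dpkg guard just makes this safe to paste on non-Debian systems):

```shell
# List any Linux NVIDIA driver packages; an empty result is what you want in WSL2
if command -v dpkg >/dev/null 2>&1; then
  dpkg -l | grep -E 'nvidia-driver|cuda-drivers' || echo "no Linux NVIDIA driver packages installed (good)"
else
  echo "dpkg not available on this system"
fi
```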
For details on CUDA setup, llama.cpp, and PyTorch in WSL2, see our full WSL2 local AI guide.
Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Same one-liner as native Linux. The install script auto-detects your GPU through the WSL2 CUDA stub.
Verify GPU detection
Pull a model and check:
ollama pull qwen2.5:14b-instruct-q4_K_M
ollama run qwen2.5:14b-instruct-q4_K_M "hello"
While the model is running, open a second WSL terminal:
ollama ps
The PROCESSOR column tells you where inference is happening. You want to see 100% GPU or a GPU percentage. If it says 100% CPU, something is wrong.
More ways to verify:
# Check Ollama's logs for GPU detection
journalctl -u ollama --no-pager | grep -i gpu
# Watch GPU utilization in real time
watch -n 1 nvidia-smi
You should see ollama_llama_server in the nvidia-smi process list with VRAM allocated. A 14B model at Q4 should use roughly 8-9GB of VRAM and generate 30-80+ tokens per second depending on your GPU.
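You can also ask the Ollama API directly: GET /api/ps returns a size_vram field per loaded model, and a value close to the model size means it is resident in GPU memory. The fallback message here is just for when the server isn’t up yet:

```shell
# Query the running-models endpoint on Ollama's default port
if command -v curl >/dev/null 2>&1; then
  curl -fsS http://localhost:11434/api/ps 2>/dev/null || echo "Ollama API not reachable on 11434"
else
  echo "curl not installed"
fi
```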
If the GPU isn’t detected
Check these in order:
| Symptom | Fix |
|---|---|
| nvidia-smi fails entirely | Update Windows NVIDIA driver, run wsl --update, restart WSL |
| nvidia-smi works but Ollama says CPU | Restart Ollama: sudo systemctl restart ollama |
| "CUDA error 100" in logs | You probably installed the cuda meta-package. Purge it: sudo apt remove --purge cuda cuda-drivers && sudo apt autoremove |
| Very slow (5-15 tok/s on a modern GPU) | Model is running on CPU. Check ollama ps; PROCESSOR should show GPU |
See our Ollama troubleshooting guide for more fixes.
Set up Open WebUI
Open WebUI gives you a ChatGPT-like browser interface for Ollama. The easiest way to run it in WSL2 is Docker.
Install Docker in WSL2
sudo apt update && sudo apt install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER
Log out and back in (or run newgrp docker) for the group change to take effect.
Run it
The simplest approach is host networking, so Open WebUI can reach Ollama at localhost:
docker run -d \
--network=host \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Open http://localhost:8080 in your Windows browser. Create an account (local only, not sent anywhere), and you’ll see your Ollama models.
If host networking doesn’t work (some Docker Desktop configurations):
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Then access at http://localhost:3000.
If Open WebUI can’t connect to Ollama
Make sure Ollama is listening on all interfaces, not just loopback:
sudo systemctl edit ollama.service
Add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Then:
sudo systemctl daemon-reload
sudo systemctl restart ollama
Verify it’s listening:
curl http://localhost:11434
You should see Ollama is running.
Docker Compose setup
If you want Ollama and Open WebUI managed together, Docker Compose handles that.
Install Docker Compose plugin
sudo apt install -y docker-compose-v2
GPU support: install NVIDIA Container Toolkit
This lets Docker containers access your GPU:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Test GPU access in Docker:
docker run --rm --gpus all nvidia/cuda:12.6.0-runtime-ubuntu24.04 nvidia-smi
The docker-compose.yml
Create ~/ollama-stack/docker-compose.yml:
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
environment:
- OLLAMA_HOST=0.0.0.0:11434
- OLLAMA_FLASH_ATTENTION=1
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
restart: unless-stopped
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
ports:
- "8080:8080"
volumes:
- open_webui_data:/app/backend/data
environment:
- OLLAMA_BASE_URL=http://ollama:11434
depends_on:
- ollama
restart: unless-stopped
volumes:
ollama_data:
open_webui_data:
Start it:
cd ~/ollama-stack
docker compose up -d
Pull a model:
docker exec ollama ollama pull qwen2.5:14b-instruct-q4_K_M
Open http://localhost:8080 in your Windows browser.
Docker Compose vs bare-metal Ollama in WSL2
| | Bare-metal Ollama + Docker Open WebUI | Full Docker Compose |
|---|---|---|
| Performance | Slightly faster (no container overhead on Ollama) | ~1-2% slower for Ollama |
| Management | Two things to manage separately | docker compose up -d starts everything |
| Updates | Re-run the install script + docker pull separately | docker compose pull && docker compose up -d |
| Model storage | Linux filesystem (/usr/share/ollama/.ollama/models/ for the systemd service) | Docker volume (harder to inspect) |
| GPU access | Automatic | Requires nvidia-container-toolkit |
Bare-metal Ollama + Docker Open WebUI is the simplest path for most people. Docker Compose is better if you want one command to start everything or you’re adding more services later.
The gotchas
1. Port conflict: Windows Ollama vs WSL2 Ollama
If you have Ollama installed on both Windows and WSL2, both try to bind port 11434. You’ll get "bind: address already in use", or the Windows app will quietly fail with "another instance of ollama is running".
Fix: Pick one. If you’re running Ollama in WSL2, uninstall the Windows version (or at least quit the tray icon). Alternatively, run the WSL2 instance on a different port:
export OLLAMA_HOST=0.0.0.0:11435
ollama serve
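Before uninstalling anything, it helps to see who currently owns the port from the WSL2 side. Note that under the default NAT networking a Windows-side Ollama won’t show up here; check Task Manager for that:

```shell
# Show any listener on 11434 inside WSL2
ss -ltnp 2>/dev/null | grep 11434 || echo "nothing is listening on 11434 inside WSL2"
```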
2. File system performance
Accessing files through /mnt/c/ (your Windows drives) is 3-5x slower than WSL2’s native filesystem. This affects model loading if your models are stored on the Windows side.
Rule: Keep everything inside WSL2’s filesystem. With the systemd service, models land in the ollama user’s home (/usr/share/ollama/.ollama/models/) by default, which is already on the Linux filesystem. Don’t move them to /mnt/c/.
Access WSL2 files from Windows Explorer via \\wsl.localhost\Ubuntu-24.04\home\youruser\.
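If you want to see the gap on your own machine, a rough sequential-write test makes it visible. The 128MB size and the /mnt/c path here are just examples, and the throughput numbers will vary with your hardware:

```shell
# Rough write-throughput comparison; skips any location that doesn't exist
bench() {
  # write 128 MiB, force a flush, and keep only dd's throughput summary line
  dd if=/dev/zero of="$1/wsl-bench.tmp" bs=1M count=128 conv=fsync 2>&1 | tail -n 1
  rm -f "$1/wsl-bench.tmp"
}
for dir in "$HOME" /mnt/c/Users/Public; do
  [ -d "$dir" ] || { echo "skipping $dir (not present)"; continue; }
  printf '%s: ' "$dir"; bench "$dir"
done
```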
3. VPN kills WSL2 networking
Corporate VPNs (Cisco AnyConnect, GlobalProtect, Pulse Secure) commonly break WSL2 internet access. The VPN changes the routing table and DNS in ways that disconnect the WSL2 VM.
Fix for Windows 11: Set networkingMode=mirrored in .wslconfig. This makes WSL2 share the host network stack, including VPN connections.
Fix for Windows 10 (or if mirrored doesn’t work): Adjust the VPN interface metric so WSL2 traffic routes correctly:
# Run in PowerShell after connecting to VPN
Get-NetAdapter | Where-Object {$_.InterfaceDescription -Match "Cisco AnyConnect"} | Set-NetIPInterface -InterfaceMetric 6000
You may also need to manually set DNS in WSL2:
sudo bash -c 'echo "nameserver 8.8.8.8" > /etc/resolv.conf'
This needs to be re-run each time you connect to the VPN.
4. Disk bloat
WSL2 stores its filesystem in a VHDX file that grows when you download models but doesn’t shrink when you delete them. After cycling through several large models, you can lose tens of GB of disk space.
Prevention: Set sparseVhd=true in .wslconfig (stops future bloat).
Fix existing bloat:
# Inside WSL2, release deleted blocks
sudo fstrim /
# From PowerShell, after wsl --shutdown: mark the VHDX sparse so freed blocks are returned to Windows
wsl --manage Ubuntu-24.04 --set-sparse true
5. OOM kills during model loading
If WSL2 runs out of its allocated memory, the Linux OOM killer terminates processes. Usually mid-model-load, with no useful error message. Ollama just crashes.
Fix: Increase memory and swap in .wslconfig. A 70B model at Q4 needs ~40GB of RAM for initial loading even though it runs in VRAM afterward.
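A quick way to confirm the .wslconfig limits actually applied after the restart:

```shell
# Report what the Linux side sees; should roughly match memory= and swap= in .wslconfig
awk '/MemTotal/  {printf "RAM:  %.1f GiB\n", $2/1048576}' /proc/meminfo
awk '/SwapTotal/ {printf "Swap: %.1f GiB\n", $2/1048576}' /proc/meminfo
```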
6. systemd + autoMemoryReclaim=gradual = frozen shell
Using autoMemoryReclaim=gradual with systemd enabled can cause shell commands like ls and apt update to hang indefinitely. This is a known WSL2 bug.
Fix: Use autoMemoryReclaim=dropcache instead. It releases memory immediately rather than waiting for idle CPU, and doesn’t conflict with systemd.
Performance: Windows native vs WSL2
Windows Central tested Ollama across multiple models (deepseek-r1:14b, gemma3:27b, and others) on an RTX 5080 and found the tokens-per-second numbers were “as near as makes no difference, identical” between native Windows and WSL2.
| Platform | Performance | Notes |
|---|---|---|
| Native Linux | Baseline (fastest) | No virtualization overhead |
| WSL2 | 95-100% of native | WSL2 overhead is negligible for GPU-bound inference |
| Windows native | 95-100% of native | WDDM driver overhead; varies by Ollama version |
Windows native vs WSL2 Ollama: under 5% difference for GPU inference. The GPU is the bottleneck, not the OS layer.
Where you will see a gap: CPU-only inference (Linux is consistently faster), model loading from /mnt/c/ (3-5x slower than loading from WSL2’s native filesystem), and context window size, which matters far more than platform choice. One test showed 86 tok/s at 4K context vs 9 tok/s at 64K on the same model.
When to use Windows native Ollama
- You just want to chat with models and don’t need Docker, Python, or Linux tools
- You’re on Windows 10 and don’t want to deal with WSL2 networking
- You want the simplest possible setup
When to use WSL2 Ollama
- You need Docker Compose (Ollama + Open WebUI + other services)
- You develop AI apps and need Python, CUDA, and Linux tooling
- You want a setup that matches Linux deployment servers
- You’re already using WSL2 for development
Network access from other devices
If you want to access Ollama from your phone, tablet, or another computer on your network:
With mirrored networking (Windows 11 only)
If you set networkingMode=mirrored in .wslconfig, Ollama is already accessible on your LAN at your computer’s IP address. You just need:
- Ollama listening on all interfaces: OLLAMA_HOST=0.0.0.0:11434 (the systemd override shown earlier)
- A Windows Firewall rule, from an Administrator PowerShell: New-NetFirewallRule -DisplayName "Ollama" -Direction Inbound -LocalPort 11434 -Protocol TCP -Action Allow
Then access Ollama from any device at http://<your-pc-ip>:11434.
For Open WebUI, add a rule for port 8080 too.
Without mirrored networking (Windows 10 or NAT mode)
You need manual port forwarding. WSL2’s IP changes on every reboot, so use a script:
# Save as forward-ollama.ps1, run as Administrator
$wslIp = (wsl hostname -I).Trim().Split(" ")[0]
netsh interface portproxy delete v4tov4 listenport=11434 listenaddress=0.0.0.0
netsh interface portproxy add v4tov4 listenport=11434 listenaddress=0.0.0.0 connectport=11434 connectaddress=$wslIp
netsh interface portproxy delete v4tov4 listenport=8080 listenaddress=0.0.0.0
netsh interface portproxy add v4tov4 listenport=8080 listenaddress=0.0.0.0 connectport=8080 connectaddress=$wslIp
Write-Host "Forwarding to WSL2 at $wslIp"
Add a firewall rule for both ports, and run this script after each reboot.
Quick start checklist
- wsl --install from an Admin PowerShell, restart
- Enable systemd in /etc/wsl.conf
- Create .wslconfig with adequate memory, sparseVhd=true, and autoMemoryReclaim=dropcache
- wsl --shutdown and relaunch
- Verify GPU: nvidia-smi inside WSL2
- Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
- Set OLLAMA_HOST=0.0.0.0:11434 in the systemd service
- Pull a model: ollama pull qwen2.5:14b-instruct-q4_K_M
- Run it: ollama run qwen2.5:14b-instruct-q4_K_M
- (Optional) Install Docker and Open WebUI for a browser interface
Total time from fresh Windows install to chatting with a local model: about 20 minutes, plus model download time.