GGUF File Won't Load: Format and Compatibility Fixes
More on this topic: Model Formats Explained: GGUF, GPTQ, AWQ, EXL2 · Quantization Explained · Run Your First Local LLM
You downloaded a GGUF file. It should just load. It doesn’t. Here’s every reason why and how to fix each one.
Quick Diagnostic
| Error Message | Jump To |
|---|---|
| invalid magic number or not a GGUF file | Wrong File Format |
| unsupported GGUF version | Version Mismatch |
| failed to load model with no details | Corrupted Download |
| unexpected EOF or truncated data | Corrupted Download or Split Files |
| imatrix in the error | imatrix Quants |
| Loads partway, then crashes or OOM | Too Big for Memory |
| Ollama says invalid model on import | Ollama Import |
1. GGUF Version Mismatch
Error:
error: unsupported GGUF version: 3
Cause: GGUF is a versioned format. Newer models use GGUF v3, but older builds of llama.cpp or Ollama only understand v2 or earlier. Quantizers publishing to HuggingFace generally use the latest version, so a freshly downloaded model can easily outpace an old local build.
Fix: Update your tools to latest:
# llama.cpp
cd llama.cpp && git pull
cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j $(nproc)
# Ollama
curl -fsSL https://ollama.com/install.sh | sh
This is the most common GGUF loading failure. If models loaded fine a month ago but a newly downloaded one won't, your tools are behind.
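If you want to confirm which version a file declares before updating anything, the GGUF header is simple to inspect: the first four bytes are the ASCII magic GGUF and the next four are the version as a little-endian integer. A minimal check with xxd (substitute your own filename):
# First 8 bytes: magic ("GGUF") followed by the version as a little-endian uint32
xxd -l 8 model-Q4_K_M.gguf
# A v3 file shows: 4747 5546 0300 0000  GGUF....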
2. Corrupted Download
Error: failed to load model, segfault on load, or gibberish output after loading.
Cause: GGUF files are 2-50GB. Large downloads fail silently: your browser shows “complete” but the file is truncated. Network interruptions, disk space running out mid-download, and CDN issues all cause this.
Fix:
- Check the file size. Compare your local file against what HuggingFace lists:
ls -lh model-Q4_K_M.gguf
# Compare with the size shown on the HuggingFace repo page
If your file is smaller than listed, it’s truncated. Re-download.
- Verify the sha256 hash if the repo provides one:
sha256sum model-Q4_K_M.gguf
# Compare with the hash in the repo's README or .sha256 file
- Use a download manager for large files.
wget -c supports resumable downloads:
wget -c https://huggingface.co/user/repo/resolve/main/model-Q4_K_M.gguf
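If you have the huggingface_hub CLI installed, it's another solid option for pulling a single file from a repo (the repo and filename below are placeholders):
# Install the CLI, then download one file from a repo into the current directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download user/repo model-Q4_K_M.gguf --local-dir .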
In Ollama, a corrupted pull is rare but possible. Fix: ollama rm model:tag then ollama pull model:tag again.
3. Split Files: Missing Parts
Error: unexpected EOF, file seems too small, or model loads with missing layers.
Cause: Large GGUF files are sometimes split into multiple parts for hosting. You downloaded one part but not all of them.
Split files look like:
model-Q4_K_M.gguf-split-a-of-b
model-Q4_K_M.gguf-split-b-of-b
Or sometimes:
model-Q4_K_M-00001-of-00003.gguf
model-Q4_K_M-00002-of-00003.gguf
model-Q4_K_M-00003-of-00003.gguf
Fix: Download every split file into the same directory. llama.cpp reads them automatically when you point it at the first file:
llama-cli -m model-Q4_K_M-00001-of-00003.gguf -p "Hello"
Some repos also have a non-split version. Check if there’s a single file option before downloading 3+ parts.
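If you'd rather keep a single file on disk, recent llama.cpp builds also ship a llama-gguf-split tool that can merge the parts back together; a rough sketch (output filename is your choice):
# Merge split parts back into one GGUF; point it at the first part
llama-gguf-split --merge model-Q4_K_M-00001-of-00003.gguf model-Q4_K_M.gguf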
4. Wrong File Format
Error:
error: invalid magic number: 0x46545347 (expected 0x46475547)
Or: not a GGUF file.
Cause: You downloaded a file that isn’t GGUF. Common mistakes:
| What You Downloaded | How to Tell |
|---|---|
| GPTQ model | File ends in .safetensors, repo says “GPTQ” |
| AWQ model | Repo says “AWQ” in the name |
| Raw safetensors | Multiple .safetensors files, no .gguf |
| EXL2 model | Repo says “EXL2”, needs ExLlamaV2 |
| Older GGML format | File ends in .bin (pre-GGUF legacy format) |
Fix: Find the GGUF version. Look for repos by these quantizers:
- bartowski: high-quality GGUF quants for most popular models
- mradermacher: wide coverage, imatrix quants
- TheBloke: older models (mostly Llama 2 era, still valid)
- Mungert: Unsloth GGUF quants
Search HuggingFace for [model name] GGUF and filter by the quantization level you want (Q4_K_M is the sweet spot for most users).
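A quick sanity check before you dig through repo pages: a real GGUF file literally begins with the ASCII bytes GGUF, so you can look at the first few bytes directly (the filename below is a placeholder):
# A valid GGUF prints "GGUF"; a safetensors file starts with a binary JSON-header length instead
head -c 4 model-Q4_K_M.gguf; echo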
5. Model Too Big for Memory
Error: Loading starts, progress bar moves, then crash: OOM, segfault, or the process gets killed.
Cause: The model needs more RAM + VRAM than you have. A 70B Q4_K_M file is ~40GB; you need that much memory available just to load it, plus overhead for the KV cache during inference.
Fix:
Use a smaller quantization. Dropping from Q4_K_M to Q3_K_M saves ~25% memory. Q2_K saves more, but quality drops noticeably. See the quantization guide for the full tradeoff table.
Use a smaller model. If 70B won’t fit, try 32B. If 32B won’t fit, try 14B. The quality gap between sizes is real, but a model that loads beats one that doesn’t.
Partial GPU offloading. Load some layers on GPU, the rest on CPU:
# llama.cpp: offload 20 layers to GPU, rest stays in RAM
llama-cli -m model.gguf -ngl 20
# Ollama: set in the Modelfile
PARAMETER num_gpu 20
Check our VRAM requirements guide for exact memory needs by model and quant level.
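Before picking a quant, check what you actually have free; the GGUF file size is a rough lower bound on the combined RAM + VRAM you need. A quick check on Linux with an NVIDIA GPU (adjust for your platform):
# Free system RAM
free -h
# Free VRAM per GPU
nvidia-smi --query-gpu=memory.free --format=csv
# The file itself must fit in RAM + VRAM, with headroom left for the KV cache
ls -lh model.gguf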
6. Ollama Can’t Import Custom GGUF
Error: invalid model format or Ollama ignores your GGUF file.
Cause: Ollama doesn’t load raw GGUF files directly. You need a Modelfile that tells Ollama where the file is and how to format prompts.
Fix: Create a Modelfile:
FROM ./model-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_end|>"
Then import:
ollama create mymodel -f Modelfile
ollama run mymodel
The TEMPLATE must match the model’s chat format. The example above is ChatML (works for Qwen, many others). Llama 3 models need a different template. If the model loads but gives incoherent output, the template is wrong.
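For reference, a Llama 3-style Modelfile looks roughly like the sketch below. It's based on the published Llama 3 prompt format, not copied from any particular repo, so verify against the model card or an existing model's ollama show --template output before relying on it:
FROM ./llama3-model-Q4_K_M.gguf
TEMPLATE """<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER stop "<|eot_id|>"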
7. imatrix Quants
Error: Mentions imatrix, importance matrix, or fails on a quant level that should be standard (like Q4_K_M).
Cause: imatrix (importance matrix) quantization is a newer method that produces better quality at low bit rates. It requires a recent version of llama.cpp to load. Older builds don’t recognize the format.
Fix: same as section 1. Update llama.cpp or Ollama to the latest version. imatrix support has been in llama.cpp since early 2024, so anything reasonably current handles it.
imatrix quants are identified by imatrix or im in the filename (e.g., model-Q4_K_M-imatrix.gguf). They load and run identically to standard quants once your tools are current.
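If you're not sure how old your build is, llama.cpp binaries can report their build number and commit, which makes it easy to compare against the current release (flag from recent builds; older ones may differ):
# Print build number and commit for your llama.cpp binary
llama-cli --version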
Bottom Line
GGUF loading failures come down to three things: your tools are outdated, your file is damaged, or you grabbed the wrong format. Update first, check file size second, verify you actually have a GGUF third. That catches 90% of issues.
For the remaining 10%: the model is too big (use a smaller quant), you’re missing split file parts, or your Ollama Modelfile needs the right chat template.