AnythingLLM Setup Guide: Chat With Your Documents Locally
📚 Related: Local RAG Guide · Open WebUI Setup · Ollama Setup · Best LLMs for RAG
You have a stack of PDFs — research papers, contracts, technical docs, meeting notes. You want to ask questions and get answers that reference the actual content. Not hallucinated summaries. Actual quotes from actual pages.
That’s RAG (Retrieval Augmented Generation), and normally it requires Python, an embedding pipeline, a vector database, and enough LangChain boilerplate to make you question your career choices.
AnythingLLM skips all of that. It’s a desktop app — 54K+ GitHub stars, MIT license — that connects to your local Ollama instance and gives you point-and-click document chat. Upload files, ask questions, get answers grounded in your documents. No coding required.
Here’s how to set it up and get useful results.
What You Need
Hardware
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB |
| VRAM | None (CPU mode) | 8 GB+ |
| Disk | 10 GB free | 20 GB+ (models + embeddings) |
| CPU | Any x86_64 or ARM64 | 4+ cores |
AnythingLLM itself is lightweight. The hardware requirements come from running the LLM and embedding model alongside it. With 8GB VRAM, you can run a 7B chat model and an embedding model simultaneously. On CPU only, expect 5-10 second response times for a 7B model.
Software
- Ollama (or LM Studio) — your local inference engine
- A chat model — Qwen 2.5 7B or Llama 3.1 8B recommended
- An embedding model — nomic-embed-text (pulled automatically or manually)
Step 1: Install Ollama and Pull Models
If you already have Ollama running, skip to Step 2.
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a chat model (pick one)
ollama pull qwen2.5:7b # Best general-purpose at this size
ollama pull llama3.1:8b # Alternative
# Pull an embedding model
ollama pull nomic-embed-text # ~275 MB, fast, good quality
Verify Ollama is running:
curl http://localhost:11434/api/tags
You should see your pulled models listed. If you get a connection error, run ollama serve in a separate terminal.
For detailed Ollama setup, see the beginner’s guide.
Step 2: Install AnythingLLM
Desktop App (Recommended)
Download from anythingllm.com — available for Windows, macOS, and Linux.
- Windows: Run the .exe installer
- macOS: Drag to Applications (requires macOS 13.6+ for Apple Silicon)
- Linux: AppImage — make it executable and run
The desktop app includes a built-in embedding model (all-MiniLM-L6-v2), so you can start without pulling nomic-embed-text. But the Ollama embedding model is better — configure it in the next step.
Docker (Alternative)
docker pull mintplexlabs/anythingllm
docker run -d -p 3001:3001 \
--add-host=host.docker.internal:host-gateway \
-v anythingllm_data:/app/server/storage \
--name anythingllm \
mintplexlabs/anythingllm
Then open http://localhost:3001 in your browser.
The --add-host flag maps host.docker.internal to your host machine so the container can reach Ollama. Inside the container, localhost refers to the container itself, not your host — which is why the Ollama URL in the next step must be http://host.docker.internal:11434 rather than http://localhost:11434.
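Before moving on, you can optionally confirm the container actually reaches Ollama. A minimal check, assuming the image ships with curl and that you kept the container name anythingllm from the command above:
# Should return the same model list you saw from the host
docker exec anythingllm curl -s http://host.docker.internal:11434/api/tags
If this errors out, double-check the --add-host flag and that Ollama is running on the host.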
Step 3: Connect to Ollama
On first launch, AnythingLLM walks you through setup:
- LLM Provider: Select “Ollama” → set URL to http://localhost:11434 (or http://host.docker.internal:11434 if running Docker)
- Chat Model: Pick your model from the dropdown (qwen2.5:7b or llama3.1:8b)
- Embedding Provider: Select “Ollama” → pick nomic-embed-text
- Vector Database: Leave as default (LanceDB — built-in, no setup)
That’s the core configuration. Everything else can stay at defaults.
Step 4: Create a Workspace and Upload Documents
Create a Workspace
Click “New Workspace” and give it a name — something descriptive like “Research Papers” or “Project Docs.” Each workspace has its own document collection and chat history.
Upload Documents
Click the upload icon in your workspace and drag files in. Supported formats:
| Format | Quality | Notes |
|---|---|---|
| PDF | Good | Text-based PDFs work great; scanned PDFs need OCR first |
| TXT / Markdown | Excellent | Cleanest results |
| DOCX | Good | Formatting stripped, text preserved |
| Web URLs | Good | Paste a URL, AnythingLLM scrapes the content |
| Code files | Good | .py, .js, .ts, .md, etc. |
| CSV | Fair | Parsed as text rows |
After uploading, AnythingLLM automatically:
- Splits your document into chunks (default: 1,000 tokens)
- Generates embeddings for each chunk using your embedding model
- Stores the embeddings in the vector database
Large PDFs (100+ pages) take a minute or two to process. You’ll see a progress indicator.
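You never need to call the embedding API yourself, but if you're curious what “generates embeddings” means in practice, each chunk boils down to a request like this against Ollama (a sketch; the chunk text is a placeholder):
# One request per chunk; AnythingLLM issues these for you during upload
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "text of one ~1,000-token chunk..."}'
The response is a long vector of numbers. Text with similar meaning produces similar vectors, which is what makes retrieval by meaning rather than by keyword possible.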
What Happens Under the Hood
When you ask a question, AnythingLLM:
- Embeds your question using the same embedding model
- Searches the vector database for the most relevant chunks
- Sends those chunks + your question to the chat model
- The model answers based on the retrieved context
This is why answers reference your actual documents instead of making things up — the model sees the relevant text before responding.
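To make that concrete, the final step is roughly equivalent to the request below: the question is embedded the same way as the chunks, the closest chunks are pulled from LanceDB, and everything is handed to the chat model. This is an illustrative sketch, not AnythingLLM's exact prompt format:
# Retrieved chunks are pasted into the prompt ahead of the question
curl -s http://localhost:11434/api/generate \
  -d '{
    "model": "qwen2.5:7b",
    "prompt": "Context:\n<chunk 1 text>\n<chunk 2 text>\n\nQuestion: What are the key findings?\nAnswer using only the context above.",
    "stream": false
  }'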
Step 5: Start Chatting
Type a question in the chat box. Good first questions:
- “Summarize this document”
- “What are the key findings?”
- “What does section 3 say about [topic]?”
- “List all dates and deadlines mentioned”
AnythingLLM shows which document chunks it retrieved for each answer. Check these — if the retrieved chunks are irrelevant, the answer won’t be good regardless of how capable your model is.
Settings That Matter
Most defaults are fine. These are the ones worth adjusting.
Chunk Size
Default: 1,000 tokens. When to change: If your documents have very short sections (emails, notes), reduce to 500. If they have long continuous arguments (legal documents, research papers), increase to 1,500-2,000.
Smaller chunks = more precise retrieval but risk missing context. Larger chunks = more context per retrieval but risk including irrelevant text.
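As a rough worked example: a 40-page report at around 400 tokens per page is about 16,000 tokens, so the default setting splits it into roughly 16 chunks, while a 500-token setting produces about 32 smaller, more targeted ones.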
Top K (Number of Retrieved Chunks)
Default: 4. When to change: Increase to 6-8 if answers seem incomplete or miss relevant information. Decrease to 2-3 if answers include too much off-topic content.
More chunks means more context for the model, but also more noise and slower responses (larger prompt).
Chat Mode
AnythingLLM has two modes:
- Chat: Normal conversation with RAG. The model uses retrieved documents plus conversation history.
- Query: Pure question-answering. No conversation history, just document retrieval + answer. Better for factual lookups, worse for follow-up questions.
Use Chat mode by default. Switch to Query mode when you need precise, standalone answers without conversational drift.
Embedding Model
If you started with the built-in embedder (all-MiniLM-L6-v2), consider switching to nomic-embed-text via Ollama. It produces better retrieval quality, especially for technical documents.
Important: Changing your embedding model requires re-embedding all documents. AnythingLLM will prompt you to do this. It’s not a quick operation on large collections.
Tips for Better Results
Use Descriptive Questions
Bad: “Tell me about the budget”
Good: “What is the total projected budget for Q3 2026 according to the financial report?”
Specific questions produce better embedding matches, which lead to better retrieval and, in turn, better answers.
One Topic Per Workspace
Don’t dump 50 unrelated documents into one workspace. Create separate workspaces for separate topics — “Tax Documents,” “Research Papers,” “Project Specs.” This keeps the vector database focused and retrieval accurate.
Check Your Sources
AnythingLLM shows which chunks were retrieved for each answer. If the citations look wrong, the answer is probably wrong too — even if it sounds confident. Always verify claims against the source chunks.
Pre-Process Messy PDFs
Scanned PDFs, PDFs with complex tables, and PDFs with multi-column layouts produce poor chunks. If you have important documents in these formats, convert them to clean text or markdown first. Tools like pdftotext or Marker (AI-based PDF extraction) produce much cleaner input.
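For example, with poppler-utils installed, converting a text-based PDF takes one command (the filenames here are placeholders):
# Extract plain text, roughly preserving the original layout
pdftotext -layout report.pdf report.txt
For scanned PDFs, run OCR first; pdftotext only works on PDFs that already contain a text layer.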
Model Choice Matters
For RAG specifically, instruction-following ability matters more than raw intelligence. A 7B model that reliably quotes source text beats a 14B model that paraphrases and adds its own interpretation. Qwen 2.5 7B and Llama 3.1 8B are both good choices. See the best models for RAG guide for detailed comparisons.
Limitations
Table Extraction
PDF tables are AnythingLLM’s weakest point. Complex tables with merged cells, multi-line entries, or nested headers often get garbled during text extraction. If table data matters, consider extracting tables separately and uploading them as CSV or markdown.
Very Large Document Sets
With hundreds of documents, embedding and retrieval slow down. The built-in LanceDB handles thousands of documents fine, but processing time during upload scales linearly. Budget 1-2 minutes per 100 pages for initial embedding.
Multi-Document Reasoning
“Compare the conclusions of Paper A and Paper B” requires the retriever to pull relevant chunks from both documents simultaneously. This works sometimes but isn’t reliable — the retriever might grab chunks from only one document. For cross-document analysis, summarize each document separately first, then ask comparison questions in a fresh chat.
Embedding Model Lock-In
Embedding models are set system-wide in AnythingLLM, not per workspace. If you switch embedding models, every workspace needs re-embedding. Pick your embedder once and stick with it.
Fixed Ollama Timeout
AnythingLLM has a built-in 5-minute timeout for Ollama responses. If your model is slow (CPU inference, very large models), long responses may get cut off. Not an issue with 7-8B models on GPU, but can bite you with larger models on limited hardware.
AnythingLLM vs Alternatives
| Feature | AnythingLLM | Open WebUI | PrivateGPT |
|---|---|---|---|
| Primary strength | RAG / documents | Chat interface | Customizable RAG |
| Setup difficulty | Easy (desktop app) | Easy (Docker) | Moderate (Python) |
| RAG quality | Good (built-in) | Basic (add-on) | Good (LlamaIndex) |
| Agent support | Yes (built-in skills) | Yes (tools/functions) | Limited |
| Multi-model switching | Yes | Yes (better) | Yes |
| UI quality | Good | Best | Basic |
| GitHub stars | 54K+ | 120K+ | 20K+ |
| Best for | Document Q&A | General chat | Dev-friendly RAG |
Pick AnythingLLM if your primary goal is chatting with documents. It has the best out-of-the-box RAG experience with zero coding.
Pick Open WebUI if you want a general-purpose chat interface that also does RAG. Better for daily chat use, model switching, and community features.
Pick PrivateGPT if you’re a developer who wants full control over the RAG pipeline and is comfortable with Python.
Best Use Cases
Searching personal notes: Upload your markdown notes, journal entries, or meeting minutes. Ask “When did I last discuss the API redesign?” and get actual dates and context.
Technical documentation: Upload API docs, README files, or internal wikis. “How do I authenticate with the /users endpoint?” returns the actual documented process.
Research paper review: Upload 5-10 papers on a topic. “What methods did these studies use for data collection?” pulls relevant methodology sections from across your collection.
Contract review: Upload a contract and ask “What are the termination clauses?” or “What happens if delivery is late?” Note: for legal decisions, always verify against the source document — don’t rely solely on AI interpretation.
Code documentation: Upload your codebase as text files. “Where is the database connection configured?” finds the relevant files and shows the configuration.
The Bottom Line
AnythingLLM is the path of least resistance for local document chat. No Python, no Docker (unless you want it), no configuration files. Install the desktop app, connect to Ollama, upload your files, start asking questions.
The RAG isn’t perfect — table extraction is weak, multi-document reasoning is inconsistent, and the embedding model choice is global rather than per-workspace. But for the five minutes it takes to set up versus the hours you’d spend building the same thing with LangChain, it’s hard to argue against starting here.
If you outgrow it, you’ll know exactly what you need from a custom solution — because AnythingLLM will have taught you what RAG can and can’t do with your specific documents.
📚 Go deeper: Local RAG Guide · Best LLMs for RAG · Ollama Setup Guide · Open WebUI Setup