Best Local LLMs for Data Analysis
Related: Best Coding Models · Qwen Models Guide · VRAM Requirements · Best LLMs for Math
“Data analysis” covers a lot of ground. Here’s what local LLMs are actually good at:
- Writing pandas/Python code to process CSV and JSON files
- Generating SQL queries from natural language
- Summarizing datasets – distributions, outliers, trends
- Interpreting results – explaining what the numbers mean
- Cleaning data – handling missing values, type conversions, deduplication
What they’re not good at: training ML models (different task entirely), processing datasets larger than their context window (a few thousand rows at most), or replacing a statistician’s judgment on methodology.
The right model depends on whether you’re writing code or interpreting data, and those are different skills.
Top Picks
Qwen 2.5 Coder 32B – Best Overall
The strongest local model for data work. Qwen 2.5 Coder generates clean pandas code, handles complex SQL with joins and window functions, and can reason about what results mean – not just produce them.
What it does well:
- Multi-step pandas transformations (groupby, merge, pivot, rolling windows)
- SQL with CTEs, subqueries, and window functions
- Explaining analysis results in plain English
- Catching data quality issues in code
VRAM: ~20GB at Q4. Needs a 24GB card (RTX 3090 or 4090).
Context: 128K tokens – enough to include substantial schema descriptions and sample data.
```shell
ollama pull qwen2.5-coder:32b
```
If you have 24GB VRAM and data analysis is a primary use case, this is the model to run.
DeepSeek Coder V2 16B – Best Mid-Size
DeepSeek Coder V2 is a MoE (Mixture of Experts) model – roughly 16B total parameters with only about 2.4B active per token, making it faster than a dense 16B. Particularly strong at SQL generation, where it rivals models twice its size.
What it does well:
- SQL query generation from natural language descriptions
- Database schema understanding
- Basic-to-intermediate pandas code
- Data type conversions and cleaning scripts
VRAM: ~9GB at Q4. Comfortable on 12GB cards (RTX 3060).
```shell
ollama pull deepseek-coder-v2:16b
```
The sweet spot for data analysis on mid-range hardware. SQL quality is its standout feature.
Qwen 2.5 14B – Best Balance
Not a coding-specific model, but Qwen 2.5 14B is strong enough at code generation and data reasoning to handle most analysis tasks. The advantage over dedicated coding models: it’s better at interpreting and explaining results, not just producing code.
What it does well:
- Writing analysis code and explaining what it does
- Reasoning about data patterns (“revenue dropped in Q3 because…”)
- Generating both code and narrative reports
- Handling ambiguous requests (“explore this dataset”)
VRAM: ~9GB at Q4. Runs on 12GB cards.
```shell
ollama pull qwen2.5:14b
```
Choose this over DeepSeek Coder when you need the model to think about your data, not just write code for it.
Phi-4 14B – Best for Math-Heavy Analysis
Phi-4 excels at the quantitative reasoning side of data analysis. Statistical calculations, formula derivation, mathematical pattern recognition – anywhere the task is more math than code, Phi-4 outperforms models of its size.
What it does well:
- Statistical calculations and interpretations
- Mathematical pattern recognition
- Structured data processing
- Formula-based data transformations
What it doesn’t: Creative interpretation, long-context analysis (16K limit), multilingual data.
VRAM: ~10GB at Q4.
```shell
ollama pull phi4
```
A specialist pick. If your analysis is heavy on statistics and light on complex code, Phi-4’s math foundation gives it an edge.
By Task: Which Model for What
SQL Generation
Best: Qwen 2.5 Coder 32B > DeepSeek Coder V2 16B > Qwen 2.5 Coder 7B
SQL is where coding models shine. They handle joins, aggregations, window functions, CTEs, and subqueries reliably – as long as you give them the schema.
Critical: Always include your table schema in the prompt. Without it, models guess column names and get them wrong.
```
Given this schema:

CREATE TABLE sales (
    id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    amount DECIMAL(10,2),
    sale_date DATE,
    region VARCHAR(50)
);

Write a SQL query to find the top 5 product categories by total revenue
for each region in Q4 2025, including month-over-month growth rates.
```
This produces accurate SQL. “Write a query to find top products” without schema produces garbage.
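As a sanity check, the shape of query a schema-grounded prompt should produce can be validated against an in-memory SQLite database. This is a sketch with made-up rows; the month-over-month growth part is omitted for brevity, and the window function requires SQLite 3.25+.

```python
import sqlite3

# Build the sales table from the schema above, with a few invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    amount DECIMAL(10,2),
    sale_date DATE,
    region VARCHAR(50)
);
INSERT INTO sales VALUES
    (1, 'Widget A', 'Widgets', 120.50, '2025-10-05', 'North'),
    (2, 'Widget B', 'Widgets',  79.50, '2025-11-12', 'North'),
    (3, 'Gadget X', 'Gadgets', 300.25, '2025-11-20', 'North'),
    (4, 'Widget A', 'Widgets', 150.75, '2025-12-01', 'South');
""")

# Top categories by revenue per region in Q4 2025, ranked within each region.
query = """
WITH q4 AS (
    SELECT region, category, SUM(amount) AS revenue
    FROM sales
    WHERE sale_date BETWEEN '2025-10-01' AND '2025-12-31'
    GROUP BY region, category
),
ranked AS (
    SELECT region, category, revenue,
           RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk
    FROM q4
)
SELECT region, category, revenue, rnk
FROM ranked
WHERE rnk <= 5
ORDER BY region, rnk;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The `WHERE rnk <= 5` filter is what makes it "top 5 per region" rather than top 5 overall; a plain `LIMIT 5` would get that wrong.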
pandas / Python Data Wrangling
Best: Qwen 2.5 Coder 32B > Qwen 2.5 Coder 7B > DeepSeek Coder V2 16B
Include sample data in your prompt – even just 3-5 rows. Models need to see column names, data types, and value formats to write correct code.
```
I have a CSV with these columns and sample data:

date,product,units_sold,price,region
2025-01-15,Widget A,150,29.99,North
2025-01-15,Widget B,89,49.99,South
2025-01-16,Widget A,203,29.99,North
2025-02-01,Widget C,45,99.99,East

Write pandas code to:
1. Load this CSV
2. Add a revenue column (units_sold * price)
3. Create a monthly summary with total revenue per product
4. Find which product had the highest month-over-month growth
```
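Here's a minimal sketch of the code that prompt should elicit. It uses `io.StringIO` in place of a file on disk, and adds one extra February row beyond the original sample so month-over-month growth is actually computable.

```python
import io
import pandas as pd

csv_text = """date,product,units_sold,price,region
2025-01-15,Widget A,150,29.99,North
2025-01-15,Widget B,89,49.99,South
2025-01-16,Widget A,203,29.99,North
2025-02-01,Widget C,45,99.99,East
2025-02-10,Widget A,400,29.99,North
"""  # last row added for the demo; not in the original sample

# 1. Load the CSV.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# 2. Revenue column.
df["revenue"] = df["units_sold"] * df["price"]

# 3. Monthly total revenue per product.
monthly = (
    df.groupby([df["date"].dt.to_period("M"), "product"])["revenue"]
      .sum()
      .rename_axis(["month", "product"])
      .reset_index()
)

# 4. Highest month-over-month growth.
monthly = monthly.sort_values(["product", "month"])
monthly["growth"] = monthly.groupby("product")["revenue"].pct_change()
top = monthly.loc[monthly["growth"].idxmax()]
print(top)
```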
Data Interpretation and Reporting
Best: Qwen 2.5 32B > Qwen 2.5 14B > Phi-4 14B
General models outperform coding models here. When the task is “what does this data mean?” rather than “write code to process this data,” you want a model with strong reasoning and natural language ability.
```
Here are the monthly revenue figures for our SaaS product:

Jan: $45,200
Feb: $43,800
Mar: $51,300
Apr: $49,100
May: $62,400
Jun: $58,900
Jul: $71,200
Aug: $68,500

What patterns do you see? What might explain the Q2 acceleration?
Are there any concerning trends?
```
Coding models will offer to write analysis code. General models will actually analyze.
Data Cleaning
Best: Qwen 2.5 Coder (any size) > DeepSeek Coder V2
Handling missing values, standardizing formats, deduplication, type conversions – these are bread-and-butter coding tasks. Even the 7B coding models handle them well.
```
This CSV has messy data:
- Dates in mixed formats (MM/DD/YYYY, YYYY-MM-DD, "January 5, 2025")
- Phone numbers with inconsistent formatting
- Duplicate rows where name differs by whitespace
- Missing values represented as "", "N/A", "null", and actual NaN

Write pandas code to clean all of these issues.
```
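A sketch of what the generated cleaning code might look like, run on a small hypothetical sample exhibiting each of the issues above (the column names and rows are invented for the demo):

```python
import io
import pandas as pd

raw = """name,signup_date,phone
 Alice  Smith ,01/05/2025,(555) 123-4567
Alice Smith,2025-01-05,555.123.4567
Bob Jones,"January 5, 2025",N/A
Carol Wu,2025-02-10,null
"""

df = pd.read_csv(io.StringIO(raw))

# Missing values: normalise "N/A" and "null" to real NA before touching
# the phone column, so they don't get mangled into empty strings.
# (read_csv already turns "" into NaN.)
df = df.replace({"N/A": pd.NA, "null": pd.NA})

# Mixed date formats: per-element parsing handles each format independently
# (slower than a vectorised call, but robust to the mix).
df["signup_date"] = df["signup_date"].apply(pd.to_datetime)

# Phone numbers: strip everything but digits.
df["phone"] = df["phone"].str.replace(r"\D", "", regex=True)

# Duplicates differing only by whitespace: normalise names, then drop.
df["name"] = df["name"].str.strip().str.replace(r"\s+", " ", regex=True)
df = df.drop_duplicates()
```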
The Practical Workflow
Local LLMs don’t replace Python for data analysis – they accelerate it. Here’s the workflow that actually works:
Step 1: Describe Your Data
Tell the model what you have. Include:
- Column names and types
- 3-5 sample rows
- The size of the dataset
- Any known issues (missing values, mixed types)
Step 2: Ask for Analysis Code
Be specific about what you want. “Analyze this data” produces generic code. “Calculate the rolling 7-day average of daily sales, grouped by region, and flag days where sales dropped more than 20% from the previous week” produces useful code.
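For illustration, here is a minimal sketch of the pandas code that specific prompt should elicit, run on synthetic data (the column names, the single region, and the deliberately slumped day are all made up for the demo; real data would have several regions handled by the same `groupby`):

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for one region.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=28, freq="D"),
    "region": "North",
    "sales": rng.integers(80, 120, size=28).astype(float),
})
df.loc[20, "sales"] = 30.0  # deliberate slump so a flag fires

df = df.sort_values(["region", "date"])
g = df.groupby("region")["sales"]

# Rolling 7-day average per region.
df["rolling_7d"] = g.transform(lambda s: s.rolling(7).mean())

# Compare each day with the same weekday one week earlier.
df["wow_change"] = g.pct_change(periods=7)
df["flagged"] = df["wow_change"] < -0.20

result = df
print(result.loc[result["flagged"], ["date", "sales", "wow_change"]])
```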
Step 3: Run the Code
Copy the generated code into a Python script or Jupyter notebook. Run it on your actual data.
Step 4: Paste Results Back
Take the output (summary statistics, error messages, chart data) and paste it back to the model:
```
I ran your code and got this output:

Region   Avg Revenue   Flagged Days
North    $12,340       3
South    $8,920        7
East     $15,100       1
West     $11,200       5

The South region has the most flagged days. What might explain this?
What additional analysis would you recommend?
```
Step 5: Iterate
The model suggests next steps. You run more code. You paste more results. Each round refines the analysis.
This loop is where local LLMs outperform notebooks-only workflows. Instead of figuring out the right pandas function from documentation, you describe what you want and get working code in seconds.
Prompting Strategies
Include Schema, Always
The single biggest improvement you can make:
| Prompt Quality | What Happens |
|---|---|
| “Write SQL to find top customers” | Model guesses table/column names. 50% chance of wrong schema. |
| “Using the orders table with columns customer_id, amount, order_date, find the top 10 customers by total spend in 2025” | Correct SQL on first try. |
Show Sample Data
For pandas work, paste 3-5 representative rows. The model needs to see:
- Actual column names (not guessed)
- Data types (are prices strings or floats?)
- Value formats (dates as ISO or American? IDs as integers or UUIDs?)
- Edge cases (any nulls, special characters, mixed types?)
Specify Output Format
```
# Vague
"Analyze sales trends"

# Specific
"Write pandas code that outputs a DataFrame with columns:
month, total_revenue, pct_change_from_previous, is_above_average
Print the result as a formatted table."
```
Break Complex Analysis Into Steps
Don’t ask for everything at once. A 7B model that handles three-step analysis reliably will fumble a ten-step request. Break it up:
- “Load and clean the data”
- “Calculate summary statistics by category”
- “Identify outliers using IQR method”
- “Generate the final report”
Each step builds on the previous output. Simpler prompts produce more reliable code.
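As an example of how small each step can be, here's a sketch of the third one, outlier detection with the IQR method, on a hypothetical series with one obvious outlier:

```python
import pandas as pd

# Hypothetical order values; 200 is the planted outlier.
s = pd.Series([12, 14, 13, 15, 11, 14, 13, 200], name="order_value")

# Standard IQR fences: 1.5 * IQR beyond the quartiles.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = s[(s < lower) | (s > upper)]
print(outliers)
```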
Limitations
Context Window vs Dataset Size
A model with 128K context can process roughly 90K words of input – that’s a lot of text but maybe 5,000-10,000 CSV rows. For larger datasets, you can’t paste the whole thing. Instead:
- Provide schema + sample rows
- Ask for code that processes the full file from disk
- Paste summary statistics, not raw data
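The three bullets above can be assembled programmatically. This sketch compacts a 10,000-row frame (synthetic, invented columns) into a prompt-sized context of dtypes, sample rows, and grouped summary statistics:

```python
import io
import numpy as np
import pandas as pd

# Stand-in for a file too large to paste whole.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "region": rng.choice(["North", "South", "East"], size=10_000),
    "revenue": rng.normal(1000, 250, size=10_000).round(2),
})

# What actually goes into the prompt: schema, sample rows, summary stats.
buf = io.StringIO()
buf.write("Schema/dtypes:\n" + df.dtypes.to_string() + "\n\n")
buf.write("Sample rows:\n" + df.head(3).to_csv(index=False) + "\n")
buf.write("Summary:\n" + df.groupby("region")["revenue"].describe().round(1).to_string())
prompt_context = buf.getvalue()
print(prompt_context)
```

The resulting string is a few hundred characters, regardless of how many rows the source file has.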
Hallucinated Column Names
If you don’t provide schema, the model will invent column names that seem reasonable but don’t exist in your data. The generated code runs, throws a KeyError, and you waste time debugging something the model made up. Always provide column names.
Code Quality
Model-generated code usually works but isn’t always optimal. Common issues:
- Inefficient loops where vectorized operations would be faster
- Missing error handling for edge cases
- Hardcoded values that should be parameters
- Deprecated pandas syntax (`.append()` instead of `pd.concat()`)
Review and test all generated code. Use it as a starting point, not a final product.
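The deprecated-syntax point is worth a concrete example: `DataFrame.append()` was removed in pandas 2.0, so code a model learned from older tutorials will crash outright. The current idiom is `pd.concat` (revenue figures reused from the example earlier in this guide):

```python
import pandas as pd

jan = pd.DataFrame({"month": ["Jan"], "revenue": [45200]})
feb = pd.DataFrame({"month": ["Feb"], "revenue": [43800]})

# Old, removed in pandas 2.0:  combined = jan.append(feb)
# Current idiom:
combined = pd.concat([jan, feb], ignore_index=True)
print(combined)
```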
Statistical Rigor
LLMs can calculate means and standard deviations. They should not design your statistical methodology. If your analysis requires hypothesis testing, regression assumptions, or causal inference, consult a statistics reference โ don’t trust the model to choose the right test.
Tools That Help
Open Interpreter
Open Interpreter lets the LLM write and execute Python code directly. Ask “analyze this CSV” and it writes the code, runs it, sees the output, and iterates – all automatically.
```shell
pip install open-interpreter
interpreter --model ollama/qwen2.5-coder:32b
```
Powerful but risky: it executes arbitrary code on your machine. Review what it’s doing.
Jupyter + Ollama
Use Ollama’s API directly in Jupyter notebooks for an interactive analysis loop:
```python
import requests

def ask_llm(prompt, model="qwen2.5-coder:32b"):
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
    )
    return response.json()["response"]

# In a notebook cell:
schema = """Columns: date, product, units, price, region
Sample: 2025-01-15, Widget A, 150, 29.99, North"""

code = ask_llm(f"Given this data:\n{schema}\n\nWrite pandas code to calculate monthly revenue by product.")
print(code)
# Copy to next cell, run, inspect output, iterate
```
VRAM Recommendations
| Your VRAM | Best Model | What You Get |
|---|---|---|
| 8GB | Qwen 2.5 Coder 7B | Basic pandas, simple SQL, data cleaning |
| 12GB | DeepSeek Coder V2 16B | Strong SQL, intermediate pandas, decent reasoning |
| 12GB (alternative) | Qwen 2.5 14B | Better interpretation, solid code generation |
| 16GB | Qwen 2.5 Coder 14B | Good balance of code + reasoning |
| 24GB | Qwen 2.5 Coder 32B | Best overall – complex analysis, multi-step workflows |
If you split time between data analysis and other tasks (chat, writing, general coding), Qwen 2.5 14B or 32B (non-Coder) gives you a versatile single model. If data work is your primary use case, the Coder variants are worth the specialization.
The Bottom Line
Local LLMs won’t replace pandas, SQL, or your analytical judgment. They accelerate the loop: describe what you want → get working code → run it → interpret results → iterate. A task that takes 30 minutes of Stack Overflow browsing and documentation reading takes 5 minutes with a good local model.
Start with the workflow: schema in the prompt, sample data included, specific questions, iterative refinement. The model choice matters less than the prompting strategy โ even a 7B model produces useful analysis code when you tell it exactly what your data looks like.
Model guides: Best Coding Models · Qwen Models Guide · Phi Models Guide · Best LLMs for Math
Hardware: VRAM Requirements · 12GB VRAM Guide · 24GB VRAM Guide