Local AI Upscaling: Make Blurry Images Sharp Without the Cloud
More on this topic: ComfyUI vs A1111 vs Fooocus · Best Used GPUs for Local AI · VRAM Requirements
You’ve got a shoebox of old family photos scanned at 640x480. Or game screenshots you want as wallpaper. Or 200 product images that need to be twice as big for a website redesign. Cloud upscaling services charge $5-10/month and send every image to someone else’s server.
Local upscaling runs on your machine, costs nothing after setup, and finishes faster than uploading. The models are tiny compared to LLMs. Real-ESRGAN, the most popular upscaling model, is 67MB. A GTX 1060 from 2016 handles it fine.
Here’s how to pick the right tool, do your first upscale in two minutes, and batch-process a whole folder overnight.
Why local instead of cloud
Family photos, medical images, client work, anything you wouldn’t email to a stranger. Cloud services process your images on their servers. Some keep copies. Local upscaling never sends data anywhere.
Topaz Gigapixel went subscription-only in September 2025, $29/month. Cloud APIs charge per image. Local tools are free and open source.
No upload, no waiting for a server queue, no download. Real-ESRGAN processes a photo in 2-6 seconds on a mid-range GPU.
Need to upscale 500 photos? Cloud services throttle you or charge extra. Locally, you point the tool at a folder and walk away.
The models: what actually does the upscaling
The software is just a wrapper. The model does the work. Here are the ones worth knowing.
| Model | Scale | Best for | File size |
|---|---|---|---|
| RealESRGAN_x4plus | 4x | General photos, the default choice | 67 MB |
| RealESRGAN_x2plus | 2x | When 4x is overkill (less artifact risk) | 67 MB |
| 4x-UltraSharp | 4x | Sharp edges, text, UI screenshots, digital art | 67 MB |
| RealESRGAN_x4plus_anime_6B | 4x | Anime, illustration, cel-shaded art | 17 MB |
| 4x-Foolhardy-Remacri | 4x | Texture reconstruction, game assets | 67 MB |
| realesr-animevideov3 | 4x | Anime video, frame-by-frame | 8 MB |
RealESRGAN_x4plus handles 80% of what people throw at it. Start there. Switch to 4x-UltraSharp if you need cleaner edges on text or screenshots. Use the anime model for anything with flat colors and hard lines.
All of these are free. You can grab them from OpenModelDB or they come bundled with most upscaling apps.
2x vs 4x: when bigger isn’t better
A 4x upscale takes a 500x500 image to 2000x2000. That's 16x the pixels, most of which the model has to invent. On a clean, high-res source image, 4x can introduce artifacts: fake skin pores, hallucinated fabric textures, grass patterns that weren't there.
If your source image is already decent (phone photos from the last 5 years, for example), try 2x first. The output is cleaner, the file size is smaller, and the processing is faster. Save 4x for genuinely low-res sources, old scans, webcam captures, small thumbnails you need to blow up.
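The 2x-vs-4x trade-off is just pixel arithmetic: an s× upscale multiplies the pixel count by s², so stepping from 2x to 4x quadruples the number of pixels the model has to invent. A quick sketch:

```python
def upscale_pixels(width: int, height: int, scale: int) -> tuple[int, int, int]:
    """Return output dimensions and how many new pixels the model invents."""
    out_w, out_h = width * scale, height * scale
    invented = out_w * out_h - width * height  # pixels with no source data behind them
    return out_w, out_h, invented

# A 4x upscale of 500x500 yields 2000x2000: 16x the pixels,
# 15 of every 16 of which are generated rather than captured.
print(upscale_pixels(500, 500, 4))  # (2000, 2000, 3750000)
print(upscale_pixels(500, 500, 2))  # (1000, 1000, 750000) -- 5x fewer invented pixels
```

That 5x difference in invented pixels is why 2x output tends to look cleaner on decent sources.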
The software: pick one and go
Upscayl (easiest, recommended for most people)
Upscayl is a free, open-source desktop app. Download it, install it, drag in a photo, pick a model, click upscale. The current version (v2.15.0, December 2024) added a High Fidelity model, clipboard paste support, and a lens viewer for comparing before/after.
It uses Real-ESRGAN’s NCNN backend with Vulkan, so it works on NVIDIA, AMD, and Intel GPUs. No CUDA required. 43,000+ stars on GitHub.
The two-minute walkthrough:
- Download Upscayl from upscayl.org (Windows, macOS, Linux)
- Install and open it
- Drag an image into the window (or click “Select Image”)
- Pick “General Photo (Real-ESRGAN)” from the model dropdown
- Choose 4x scale
- Click “Upscale”
- Done. Output saves next to your original by default
For batch processing, switch to the “Batch” tab, point it at a folder, and let it run. On a mid-range GPU, expect 10-20 seconds per image.
Limitation: Requires a Vulkan-compatible GPU. No CPU fallback. If you have a very old laptop with integrated graphics, Upscayl won’t work. Use the Real-ESRGAN CLI instead.
Real-ESRGAN CLI (batch processing, scriptable)
If you need to upscale hundreds of images unattended, or you want to script it into a workflow, the command line is the way.
For NVIDIA GPUs (PyTorch/CUDA):
The PyPI `realesrgan` package installs only the library; the inference script ships in the repo, so clone it first:

```shell
git clone https://github.com/xinntao/Real-ESRGAN && cd Real-ESRGAN
pip install basicsr facexlib gfpgan -r requirements.txt
python setup.py develop

# Single image
python inference_realesrgan.py -i photo.jpg -o results/ -n RealESRGAN_x4plus -s 4

# Entire folder
python inference_realesrgan.py -i /path/to/photos/ -o /path/to/output/ -n RealESRGAN_x4plus
```
For AMD/Intel GPUs (Vulkan, no Python needed):
Download the pre-built binary from Real-ESRGAN-ncnn-vulkan. Unzip and run:
```shell
./realesrgan-ncnn-vulkan -i photo.jpg -o photo_upscaled.png -n realesrgan-x4plus
```
Key flags (for the PyTorch script; the ncnn binary uses `-t` for tile size):
- `--tile 256` — process in tiles to reduce VRAM usage (lets you run on 2 GB GPUs)
- `--face_enhance` — apply GFPGAN face enhancement
- `-s 2` — output at 2x instead of 4x
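Tiling keeps VRAM low because peak memory scales with the crop being processed, not the whole image. A rough sketch of the tile count for a given tile size (ignoring the small overlap padding the tool adds at tile borders):

```python
import math

def tile_count(width: int, height: int, tile: int) -> int:
    """Number of crops processed sequentially when tiling is enabled."""
    return math.ceil(width / tile) * math.ceil(height / tile)

# A 3000x2000 photo with a 256px tile is processed as 96 small crops,
# so peak VRAM is driven by a 256x256 crop, not the full frame.
print(tile_count(3000, 2000, 256))  # 96
```

More tiles means more sequential passes, so tiling trades speed for memory.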
Batch 500 photos overnight: Point the CLI at your folder, pipe the output somewhere, and let it run. At 3-6 seconds per image on an RTX 3060, 500 images take about 30-50 minutes.
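Planning an overnight run is simple arithmetic. A small estimator (the 3-6 seconds per image figure is for Real-ESRGAN on an RTX 3060; substitute your own measured per-image time):

```python
def batch_minutes(n_images: int, sec_low: float = 3.0,
                  sec_high: float = 6.0) -> tuple[float, float]:
    """Estimated wall-clock range in minutes for a batch upscaling job."""
    return (n_images * sec_low / 60, n_images * sec_high / 60)

low, high = batch_minutes(500)
print(f"500 images: {low:.0f}-{high:.0f} minutes")  # 500 images: 25-50 minutes
```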
chaiNNer (most flexible, power users)
chaiNNer (v0.25.1, October 2024) is a node-based image processing app. Think of it like a visual pipeline builder: load image, denoise, upscale with model X, adjust colors, sharpen, save. You connect nodes with wires and the data flows through.
It supports PyTorch, NCNN, ONNX, and TensorRT backends, and can load any model from OpenModelDB. That’s over 160 specialized upscaling models for photos, anime, game textures, manga, old film.
chaiNNer is overkill if you just want to drag and drop a photo. It’s the right tool when you want a repeatable processing pipeline: denoise with one model, upscale with another, color-correct, then batch the whole thing.
ComfyUI (if you’re already generating images)
If you’re using ComfyUI for Stable Diffusion or Flux, you can upscale inside your generation workflow. Four methods, each with different quality/speed/VRAM trade-offs:
- ESRGAN model node — `Load Upscale Model` → `Upscale Image (Using Model)` → `Save Image`. Fast (5-6 seconds), 2-6 GB VRAM. Functionally identical to standalone Real-ESRGAN.
- Latent upscale + KSampler — Generate at base resolution, pass the latent through a Latent Upscale node (2x), then a second KSampler at low denoise (0.3-0.5). Everything stays in latent space. 4-10 GB VRAM depending on model.
- Ultimate SD Upscale — Processes your image tile-by-tile through the diffusion model. First upscales with ESRGAN, then re-renders each tile through img2img. Higher quality than pure ESRGAN. 8-12 GB VRAM.
- ControlNet Tile + Ultimate SD Upscale — The highest quality ComfyUI method short of SUPIR. ControlNet Tile feeds color and structure as conditioning, keeping the diffusion model close to the original. 10-14 GB VRAM.
For Flux users, the Flux.1-dev-Controlnet-Upscaler from Jasper AI is a dedicated ControlNet trained for upscaling with Flux. Set strength to 0.6, GGUF Q4_K_M variant works on 8-12 GB VRAM.
SUPIR (maximum quality, damaged photos)
SUPIR (Scaling Up to Excellence) uses SDXL, a 2.6-billion-parameter diffusion model, as a generative prior. It also integrates LLaVA (a vision-language model) to auto-caption your image and guide restoration. It understands what’s in your image and generates appropriate detail — skin texture looks different from metal, fabric weave differs from concrete.
On a badly degraded photo, the difference between SUPIR and Real-ESRGAN is immediately obvious. SUPIR reconstructs plausible facial features from 20x20 pixel faces, works around heavy JPEG compression damage, and generates individually distinguishable foliage where Real-ESRGAN produces uniform textures.
The trade-offs are real: SUPIR needs 12GB+ VRAM (8GB minimum in fp8 mode), takes 30-60 seconds per image, is NVIDIA-only, and has a non-commercial license. It hallucinates without a good prompt — always provide a descriptive prompt and negative prompt. It’s terrible at text (34.6% OCR accuracy, worse than bicubic interpolation) and adds unwanted grain to anime/illustration. Use it for the 10 photos that matter most, not for batch processing 500 vacation shots.
Run it through kijai/ComfyUI-SUPIR (2,200 stars, actively maintained) or the MonsterMMORPG standalone enhanced fork with one-click installers.
| SUPIR Config | VRAM Needed |
|---|---|
| fp8 UNet, no LLaVA | ~8 GB |
| fp16, no LLaVA | ~12 GB |
| fp16 + LLaVA (auto-captioning) | ~30 GB |
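The table above maps cleanly to a selection rule. A minimal sketch with thresholds taken from the table (treat them as approximate; real headroom depends on resolution and what else is on the GPU):

```python
def supir_config(vram_gb: float) -> str:
    """Pick the heaviest SUPIR configuration that fits in the given VRAM."""
    if vram_gb >= 30:
        return "fp16 + LLaVA auto-captioning"
    if vram_gb >= 12:
        return "fp16, no LLaVA"
    if vram_gb >= 8:
        return "fp8 UNet, no LLaVA"
    return "insufficient VRAM -- use Real-ESRGAN instead"

print(supir_config(24))  # fp16, no LLaVA
print(supir_config(8))   # fp8 UNet, no LLaVA
```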
VRAM requirements at a glance
| Method | Min VRAM | Speed (per image) | GPU Support |
|---|---|---|---|
| Real-ESRGAN (tiled) | 2 GB | 2-10 sec | NVIDIA, AMD, Intel |
| Upscayl | 2 GB | 10-20 sec | Any Vulkan GPU |
| ComfyUI ESRGAN node | 2-4 GB | 5-6 sec | NVIDIA, AMD (ROCm) |
| ComfyUI Ultimate SD Upscale (SDXL) | 8 GB | 3-8 min | NVIDIA, AMD (ROCm) |
| ComfyUI ControlNet Tile (SDXL) | 10 GB | 5-10 min | NVIDIA, AMD (ROCm) |
| SUPIR (fp8, no LLaVA) | 8 GB | 30-60 sec | NVIDIA only |
| SUPIR (fp16, no LLaVA) | 12 GB | 30-60 sec | NVIDIA only |
Quality comparison by content type
| Content Type | Best Method | Avoid |
|---|---|---|
| Clean photos | ControlNet Tile or Real-ESRGAN x4plus | — |
| Damaged/old photos | SUPIR (v0Q) | Real-ESRGAN (amplifies noise) |
| Portraits and faces | SUPIR + CodeFormer | — |
| Anime and illustration | Real-ESRGAN anime_6B | SUPIR (adds grain to flat colors) |
| Text and UI screenshots | 4x-UltraSharp (ESRGAN) | SUPIR (generates wrong characters) |
| Batch processing (500+ images) | Real-ESRGAN / Upscayl | SUPIR or diffusion methods |
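If you script your upscaling, the table above condenses into a lookup. The mapping mirrors the recommendations in this guide; the keys and method strings are illustrative labels, not tool identifiers:

```python
BEST_METHOD = {
    "clean_photo": "Real-ESRGAN x4plus (ControlNet Tile for max quality)",
    "damaged_photo": "SUPIR (v0Q)",
    "portrait": "SUPIR + CodeFormer",
    "anime": "Real-ESRGAN anime_6B",
    "text_screenshot": "4x-UltraSharp",
    "large_batch": "Real-ESRGAN / Upscayl",
}

def pick_method(content_type: str) -> str:
    # RealESRGAN_x4plus is the safe default for anything unclassified
    return BEST_METHOD.get(content_type, "Real-ESRGAN x4plus")

print(pick_method("anime"))  # Real-ESRGAN anime_6B
```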
The free Topaz Gigapixel alternative question
Topaz went subscription-only in September 2025 — $29/month. The perpetual licenses are gone.
What Topaz does that free tools don’t: face recovery from pixelated blobs, combined denoise + sharpen + upscale in one pass, Lightroom/Photoshop plugin integration. What free tools do just as well: clean 2-4x upscaling (Real-ESRGAN), anime/illustration upscaling (Real-ESRGAN anime model), maximum quality photo restoration (SUPIR matches or exceeds Topaz), and full pipeline automation (ComfyUI, chaiNNer).
For clean photos where you just need more pixels, Upscayl gets you 90% of Topaz quality. SUPIR can match or beat Topaz on heavily degraded photos but demands more GPU and setup time. Topaz earns its subscription for professional photographers doing volume work with the Lightroom plugin.
Video upscaling: it works, but set expectations
Upscaling video is frame-by-frame image upscaling with frame interpolation on top. A 10-minute 1080p video at 30fps has 18,000 frames. At 3 seconds per frame, that’s 15 hours of processing.
It works. It’s just slow.
Video2X
Video2X (v6.0.0) is the most popular free option. Version 6 is a complete rewrite in C/C++, which means it’s faster and less painful to install than the old Python version. It supports Real-ESRGAN, Anime4K, Real-CUGAN, and RIFE (for frame interpolation), and requires zero extra disk space during processing.
Good for: anime upscaling (480p to 1080p), old home videos, game footage.
Limitation: Windows and Linux only. No macOS support.
Flowframes
Flowframes handles frame interpolation (turning 30fps into 60fps) more than upscaling, but it can do both. Windows only. Good for making old video footage smoother.
Realistic expectations for video
| Source | Target | 10 min video | RTX 3060 time |
|---|---|---|---|
| 480p | 1080p (2x) | 18,000 frames | ~6-8 hours |
| 720p | 1440p (2x) | 18,000 frames | ~8-12 hours |
| 1080p | 4K (2x) | 18,000 frames | ~15-20 hours |
These are rough estimates. Faster GPUs help, but video upscaling is a patience game. Queue it up before bed.
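The numbers in the table come straight from the frame math. A sketch of the estimate (seconds per frame varies with resolution, model, and GPU; the values here are ballpark):

```python
def video_upscale_hours(minutes: float, fps: float, sec_per_frame: float) -> float:
    """Estimated processing time for frame-by-frame video upscaling."""
    frames = minutes * 60 * fps
    return frames * sec_per_frame / 3600

# 10 minutes of 30fps video = 18,000 frames
print(video_upscale_hours(10, 30, 3.0))  # 15.0 hours at 3 s/frame
print(video_upscale_hours(10, 30, 1.3))  # ~6.5 hours, roughly the 480p->1080p row
```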
Anime upscales better than live-action because the flat colors and hard edges are easier for the models to reconstruct. Live-action footage with lots of motion, grain, and fine detail takes longer and the results are less consistent.
Hardware: you probably already have enough
Upscaling models are tiny. Real-ESRGAN is 67MB with 16.7 million parameters. For comparison, a small LLM in the 7-9B parameter range weighs 4-6GB even after quantization. Your GPU barely notices an upscaling model.
| Your GPU | What works |
|---|---|
| GTX 1060 6GB | Upscayl, Real-ESRGAN (with tiling), chaiNNer. Handles everything in this guide |
| RTX 3060 12GB | Comfortable for all image upscaling, usable for video |
| RTX 3090 24GB | Fast at everything, including SUPIR and diffusion-based upscaling |
| AMD RX 6700 XT | Upscayl and NCNN tools work via Vulkan |
| Intel Arc A770 | Vulkan support, works with Upscayl |
| Apple M1/M2/M3 | Upscayl works on macOS. Real-ESRGAN NCNN works via MoltenVK |
| No GPU (CPU only) | Real-ESRGAN NCNN works on CPU. Slow (30-60 sec per image) but functional |
Any gaming GPU from the last 8 years handles image upscaling. The only exception is integrated graphics without Vulkan support, and even then the CLI tools fall back to CPU.
When upscaling can’t help
AI upscaling invents detail based on patterns it learned during training. It doesn’t recover information that was never captured. A few cases where it falls apart:
Text below ~12px in the source. The model rebuilds letterforms as shapes and gets them wrong. Spacing changes, serifs mutate, characters drift. If you need to upscale a document, use a scanner at higher DPI instead.
Heavily compressed images. JPEG artifacts at quality 10 get sharpened right along with the real image. Real-ESRGAN amplifies the damage. For severely degraded photos, SUPIR is the better option (see the SUPIR section above — needs 12GB+ VRAM, much slower, but it can work around compression damage).
Already-clean photos. If your source is a recent phone photo at full resolution, 4x upscaling may add textures that weren’t there: fake skin pores, invented fabric patterns. Use 2x or skip upscaling entirely.
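The guidance above boils down to a heuristic: derive the scale factor from how far the source falls short of the target, and skip upscaling when it doesn't. The thresholds here are judgment calls, not established rules:

```python
def suggest_scale(src_w: int, src_h: int, target_w: int, target_h: int) -> int:
    """Suggest 1 (skip upscaling), 2, or 4 as the scale factor."""
    need = max(target_w / src_w, target_h / src_h)
    if need <= 1:
        return 1   # source already covers the target -- just resize
    if need <= 2:
        return 2   # cleaner output, less artifact risk
    return 4       # genuinely low-res source

print(suggest_scale(640, 480, 1920, 1080))    # 4 (needs 3x, so round up)
print(suggest_scale(2000, 1500, 3000, 2000))  # 2
```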
The 5-minute version
If you read nothing else:
- Download Upscayl
- Install it
- Drag in your image
- Pick “General Photo” model
- Click “Upscale”
Free. Private. Works on any GPU made in the last decade. If you need more control, more models, or batch automation, the tools above have you covered.
```shell
# Or do it from the terminal with the prebuilt Vulkan binary
./realesrgan-ncnn-vulkan -i photo.jpg -o upscaled.png -n realesrgan-x4plus
```