
Three image models, three different eras. SD 1.5 launched in 2022 and still runs on potato GPUs. SDXL arrived mid-2023 with 4x the resolution. Flux dropped in 2024 and produces images that look like a different technology entirely.

The problem: they all run locally, they all have ecosystems, and picking the wrong one means downloading gigabytes of models you’ll replace in a week. This guide compares them on the numbers that matter and tells you which to install for your GPU and your use case.


The Three Models at a Glance

| | SD 1.5 | SDXL | Flux Dev |
|---|---|---|---|
| Release | August 2022 | July 2023 | August 2024 |
| Parameters | ~860M | ~3.5B (6.6B with refiner) | 12B |
| Native resolution | 512x512 | 1024x1024 | 1024x1024+ |
| Minimum VRAM | 4 GB | 8 GB | 12 GB (quantized) |
| Architecture | UNet | UNet (larger) | DiT (transformer) |
| Text rendering | Poor | Poor | Good (95%+ single-word accuracy) |
| License | CreativeML Open RAIL-M | CreativeML Open RAIL++-M | Non-commercial (Dev) / Apache 2.0 (Schnell) |
| Status | Deprecated, huge ecosystem | Active, maturing | Active, growing fast |

SD 1.5 is the Honda Civic of image gen. Cheap to run, parts everywhere, gets the job done. SDXL is the midrange sedan: better in every measurable way, still affordable. Flux is the sports car: noticeably better output, but you need the hardware to match.


VRAM Requirements

This is usually what decides the question for you.

Minimum VRAM to Generate Images

| Model | Precision | VRAM Used | Minimum GPU |
|---|---|---|---|
| SD 1.5 | FP16 | ~4 GB | GTX 1060 6GB, RTX 3050 |
| SD 1.5 + ControlNet | FP16 | ~6-8 GB | RTX 3060, RTX 4060 |
| SDXL | FP16 | ~7-8 GB | RTX 3060 8GB, RTX 4060 |
| SDXL + Refiner | FP16 | ~12-18 GB | RTX 3060 12GB+ (sequential), 16GB+ (simultaneous) |
| Flux Dev | FP16 | ~24 GB | RTX 3090, RTX 4090 |
| Flux Dev | FP8 | ~12-16 GB | RTX 3060 12GB, RTX 4070 |
| Flux Dev | GGUF Q5 | ~8-10 GB | RTX 3060 8GB (tight) |
| Flux Dev | GGUF Q4 / NF4 | ~6-8 GB | RTX 4060, RTX 3060 |
| Flux Dev | Nunchaku INT4 | ~4-8 GB | RTX 3060 (with CPU offload) |
| Flux Schnell | FP8 | ~6-8 GB | RTX 4060, RTX 3060 |

A few things jump out. SD 1.5 runs on almost anything with a discrete GPU. SDXL needs a modern 8GB card. Flux at full precision is a 24GB-only proposition, but quantized versions bring it within reach of 8-12GB cards with varying quality tradeoffs.

The Nunchaku/SVDQuant INT4 path deserves a mention: it pushes Flux down to 4-8GB VRAM with per-layer CPU offloading and claims 3x faster inference than standard NF4. If you have an 8GB card and want Flux, this is the path to try first.
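
The 24 GB figure for full-precision Flux falls straight out of the parameter count. A quick back-of-envelope check (a sketch: the bits-per-weight values are approximations, and real VRAM use adds activations, text encoders, and the VAE on top of the weights):

```python
# Back-of-envelope VRAM floor: the weights alone take
# (parameters x bits per weight) / 8 bytes. Actual usage is higher --
# activations, text encoders, and the VAE all add on top -- so treat
# these as lower bounds, not predictions.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(weight_gb(0.86, 16))  # SD 1.5 FP16: 1.72 GB weights (vs ~4 GB real usage)
print(weight_gb(12, 16))    # Flux Dev FP16: 24.0 GB -- hence the 24GB-card tier
print(weight_gb(12, 8))     # Flux Dev FP8: 12.0 GB -- fits 12-16GB cards
print(weight_gb(12, 4.5))   # GGUF Q4-ish: 6.75 GB (quantized formats average
                            # a bit over 4 bits/weight due to block scales)
```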

The SDXL refiner trap: You’ll see guides telling you to load both the base and refiner model. That needs 12-18GB. Most people skip the refiner entirely and get good results. Unless you’re chasing the last 5% of detail, just run the base.


Generation Speed

Seconds per image on common GPUs. SD 1.5 at 512x512/20 steps, SDXL at 1024x1024/20 steps, Flux Dev at 1024x1024/20 steps.

| GPU | SD 1.5 | SDXL | Flux Dev (FP16) | Flux Dev (FP8) | Flux Schnell (4 steps) |
|---|---|---|---|---|---|
| RTX 4090 | ~1 sec | ~2-4 sec | ~18 sec | ~11-14 sec | ~3-5 sec |
| RTX 3090 | ~2-3 sec | ~6 sec | ~40 sec | ~26-30 sec | ~6-10 sec |
| RTX 4070 | ~2-3 sec | ~7 sec | — | ~49 sec | ~10-15 sec |
| RTX 4060 | ~4-5 sec | ~16 sec | — | — | ~20-30 sec |
| RTX 3060 12GB | ~5-6 sec | ~13 sec | ~10+ min* | ~400 sec* | ~20-40 sec |

*RTX 3060 running Flux Dev at FP16/FP8 involves heavy CPU offloading and is borderline unusable for iterating. Use GGUF Q4 instead (~2-3 minutes) or Nunchaku INT4 for better speed.

The speed gap is real. SD 1.5 at 512x512 is essentially instant on modern GPUs. SDXL at 1024x1024 is comfortable on anything 8GB+. Flux Dev is where patience enters the equation. On a 3090, 40 seconds per image is fine for final renders but painful for prompt iteration. That’s where Flux Schnell comes in: 4 steps, 6-10 seconds on a 3090, good enough for drafting.

Optimization options that help:

  • xformers: ~5-10% speed gain, mainly saves VRAM
  • TensorRT FP16: ~1.5-2x faster than native PyTorch
  • TensorRT FP8: Up to 2.4x faster (tested on Flux Dev, needs RTX 40/50 series)
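
To put those multipliers in iteration terms, here is rough arithmetic (a sketch using the ~18 s/image Flux Dev FP16 baseline on an RTX 4090 from the table above; the speedup factors are the claimed ones, and real gains vary by workload):

```python
# Projecting per-image time and drafts-per-10-minute-session from a
# claimed speedup. Baseline: ~18 s/image, Flux Dev FP16 on an RTX 4090
# (from the benchmark table above).

BASELINE_SEC = 18.0

def projected(speedup: float, session_sec: float = 600.0) -> tuple[float, int]:
    """Return (seconds per image, whole images per 10-minute session)."""
    per_image = BASELINE_SEC / speedup
    return per_image, int(session_sec // per_image)

print(projected(1.0))   # (18.0, 33)  native PyTorch
print(projected(2.0))   # (9.0, 66)   TensorRT FP16 (~2x claim)
print(projected(2.4))   # (7.5, 80)   TensorRT FP8 (2.4x claim, RTX 40/50 only)
```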

Image Quality

This is where the generational differences actually show.

SD 1.5: Workable, but dated

Native output is 512x512. You almost always need hires fix or an upscaler to get usable images. Anatomy is unreliable: extra fingers, mangled hands, weird limb counts. Complex prompts frequently get ignored or misinterpreted. Negative prompts are mandatory to avoid the worst artifacts.

The quality ceiling is surprisingly high with the right checkpoint, LoRA stack, and prompt engineering. But getting there takes work. You’re fighting the model, not collaborating with it.

SDXL: The solid middle

1024x1024 native resolution means no upscaling just to get something usable. Anatomy improved over SD 1.5 but still inconsistent: correct finger count about 45% of the time in complex hand poses. Prompt adherence is better, negative prompts still help. The quality jump from SD 1.5 to SDXL is immediately obvious.

Community checkpoints like Juggernaut XL and RealVisXL push photorealism further than the base model. If you need reliable output without babysitting every generation, SDXL with a good checkpoint is the practical choice.

Flux: Different league

Correct finger count 85% of the time. Natural hand positioning 90%. Single-word text rendering accuracy 95%+. Multi-word text 85-90%. Complex prompts with spatial relationships (“a red ball on top of a blue box to the left of a green cone”) actually work.

Flux uses a different architecture (DiT transformer vs UNet) and a different training approach (flow matching vs diffusion). The result is images that feel as if the model understood the prompt rather than pattern-matching parts of it. Simple prompts produce excellent results without negative prompt engineering.

The tradeoff: no native negative prompt support. Flux uses flow matching without classifier-free guidance, so the standard negative prompt workflow doesn’t apply. Workarounds exist (Dynamic Thresholding, Perpendicular Negative Guidance) but they’re 2-3x slower. In practice, most Flux users don’t bother with negatives because the baseline output quality is high enough.


LoRA and Checkpoint Ecosystem

LoRAs don’t transfer between model families. An SD 1.5 LoRA won’t work with SDXL. An SDXL LoRA won’t work with Flux. This matters more than most people realize when choosing a model.

| | SD 1.5 | SDXL | Flux |
|---|---|---|---|
| LoRA availability | Largest. 3+ years of community work. Tens of thousands on CivitAI. | Second largest. Growing since mid-2023. Thousands available. | Smallest but growing fast. Training is more expensive. |
| Checkpoint/merge variety | Massive. Hundreds of specialized merges for every style. | Strong. Juggernaut XL, RealVisXL, and many others. | Limited. Base model is already very good. |
| Anime/stylized | Dominant. This is where SD 1.5 still wins. | Good, catching up. | Growing, but less variety. |
| LoRA training VRAM | 8 GB minimum, 12 GB comfortable | 10-12 GB minimum (aggressive optimization), 16-24 GB comfortable | 24 GB minimum (QLoRA, rank 4-8), 48 GB comfortable |
| Training cost (CivitAI) | 500 Buzz | 500 Buzz | 2,000 Buzz (4x more) |

If you need a LoRA for a niche anime style, a specific character, or a particular aesthetic, check CivitAI for your model family first. SD 1.5 almost certainly has it. SDXL probably has it. Flux might not yet.

Flux LoRA training is also more expensive across the board. You need 24GB+ VRAM locally (vs 8GB for SD 1.5), and it costs 4x more on CivitAI’s on-site trainer. The flip side: Flux LoRAs need fewer images (20-30 is often enough) and fewer training steps (500-1,500 vs 3,000-5,000 for SDXL).


ControlNet Support

ControlNet lets you guide image generation with reference images: edge maps, depth maps, poses, scribbles.

| | SD 1.5 | SDXL | Flux |
|---|---|---|---|
| Official ControlNet | Yes. 14 model types in v1.1. | No official models. Community-built. | Partial. BFL released Canny and Depth. |
| Types available | Canny, Depth, OpenPose, Scribble, Segmentation, Tile, LineArt, NormalBAE, HED, MLSD, Shuffle, IP2P, Inpaint, LineArt Anime | Most SD 1.5 types ported by community. Less standardized. | Canny, Depth, HED, Surface Normals, Union (multi-mode). InstantX, XLabs-AI, Jasperai. |
| Model sizes | Small (136 MB LoRA), Medium (723 MB), Large (1.45 GB) | Varies by author | 12B per official model (same as base Flux) |
| Maturity | Mature, well-documented | Functional, some gaps | Catching up. Fewer options, larger downloads. |

SD 1.5 has the most complete ControlNet ecosystem by a wide margin. If your workflow depends on specific ControlNet types (especially niche ones like Segmentation or Shuffle), check whether they exist for your target model before switching.
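
One practical detail when wiring up any of these: the conditioning image has to be resized to match the output resolution exactly, and latent diffusion models want dimensions divisible by 8 (the VAE's downsampling factor). A hypothetical helper sketching that check:

```python
# Hypothetical helper: latent diffusion models (SD 1.5, SDXL, Flux) need
# width/height divisible by 8 because the VAE downsamples by 8x, and a
# ControlNet reference image must be resized to the exact output
# resolution or the control signal lands in the wrong place.

def snap(dim: int, multiple: int = 8) -> int:
    """Round a dimension to the nearest multiple (at least one multiple)."""
    return max(multiple, round(dim / multiple) * multiple)

def conditioning_size(width: int, height: int) -> tuple[int, int]:
    """The size to resize a ControlNet reference image to."""
    return snap(width), snap(height)

print(conditioning_size(1000, 750))  # (1000, 752)
print(conditioning_size(513, 512))   # (512, 512)
```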


Flux Dev vs Flux Schnell

Both use the same 12B architecture. The differences matter.

| | Flux Dev | Flux Schnell |
|---|---|---|
| Steps | 20-30 | 1-4 |
| Speed (RTX 3090) | ~40 sec | ~6-10 sec |
| Quality | Higher detail, better micro-textures | Good. Slightly less fine detail. |
| Text rendering | 95%+ single-word | 80-85% single-word |
| License | Non-commercial (commercial license available from BFL) | Apache 2.0 (fully open) |
| Best for | Final renders, portfolio work | Quick drafts, iteration, commercial projects |

Use Schnell for drafting and iteration: test your prompt, get the composition right at 4 steps in seconds, then switch to Dev for the final generation. If you’re building something commercial (selling prints, using images in a product), Schnell’s Apache 2.0 license avoids the licensing question entirely.
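
The draft-then-final split is easy to quantify with the RTX 3090 timings above (a sketch; 8 s is an assumed midpoint of Schnell's ~6-10 s range):

```python
# Draft on Schnell, finalize on Dev, versus iterating everything on Dev.
# RTX 3090 timings from the table above; 8 s is an assumed Schnell midpoint.

SCHNELL_SEC = 8.0
DEV_SEC = 40.0

def mixed_workflow_sec(drafts: int, finals: int) -> float:
    return drafts * SCHNELL_SEC + finals * DEV_SEC

def dev_only_sec(drafts: int, finals: int) -> float:
    return (drafts + finals) * DEV_SEC

print(mixed_workflow_sec(15, 2))  # 200.0 s for 15 drafts + 2 finals
print(dev_only_sec(15, 2))        # 680.0 s for the same work on Dev alone
```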


What About SD 3.5?

Stability AI released SD 3.5 in late 2024 as the successor to SDXL. It exists in three sizes:

| Variant | Parameters | VRAM | Notes |
|---|---|---|---|
| SD 3.5 Medium | 2.5B | ~3 GB | Surprisingly light. Runs on 12GB cards easily. |
| SD 3.5 Large | 8B | ~18 GB (11 GB with TensorRT FP8) | Mid-range quality. |
| SD 3.5 Large Turbo | 8B (distilled) | ~18 GB | Fewer steps, faster. |

The honest take: the community largely skipped SD 3.5 in favor of Flux. SD 3.0 launched with quality issues and restrictive licensing, and even though 3.5 improved on both, the momentum had already shifted. Most active development on CivitAI and in ComfyUI workflows targets either SDXL or Flux.

SD 3.5 Medium is the one worth knowing about. At 2.5B parameters and ~3GB inference VRAM, it’s the lightest modern image model and runs on GPUs that can’t handle SDXL. If you have a 6GB card and SD 1.5 quality isn’t cutting it, SD 3.5 Medium is an upgrade path.

One licensing note: Stability AI added an explicit content prohibition to SD 3.5 in July 2025, which caught users off guard. Free for commercial use under $1M revenue, but check the current terms.


What Fits Your GPU

8GB VRAM (RTX 4060, RTX 3060 8GB)

| Model | How | Experience |
|---|---|---|
| SD 1.5 | Native FP16 | Fast, full quality. Best experience at this tier. |
| SDXL | Native FP16 (tight) | Works but close to the limit; little headroom for LoRAs. Disable refiner. |
| Flux Dev | GGUF Q4 or Nunchaku INT4 | Possible but slow (2-5 min/image). Noticeable quality loss vs FP16. |
| Flux Schnell | FP8 / GGUF Q4 | Workable at 4 steps. ~20-30 sec/image. |

At 8GB, SD 1.5 and SDXL are the comfortable options. Flux is technically possible with aggressive quantization but the experience is rough for iterating. If you’re generating final images from a known-good prompt, Flux Q4 is fine. For exploring and experimenting, stick with SDXL.

12GB VRAM (RTX 3060 12GB, RTX 4070)

| Model | How | Experience |
|---|---|---|
| SD 1.5 | Native FP16 | Overkill. Runs great with ControlNet and LoRAs stacked. |
| SDXL | Native FP16 | Comfortable. Room for LoRAs and ControlNet. |
| Flux Dev | FP8 or GGUF Q5 | Good quality, manageable speed. The sweet spot for Flux. |
| Flux Schnell | FP8 | Fast, good quality. ~20-40 sec/image. |

12GB is where the choice gets interesting. You can run all three families without offloading. Flux Dev at FP8 or GGUF Q5 produces near-full-quality images. This is the minimum GPU I’d recommend for someone who wants to primarily use Flux.
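
Collapsed into code, the tier guidance is a simple threshold lookup (a hypothetical helper; thresholds follow the VRAM table earlier and err on the conservative side):

```python
# Hypothetical helper condensing the VRAM tables: free VRAM in GB ->
# a sensible Flux Dev precision to start with.

def flux_precision(vram_gb: float) -> str:
    if vram_gb >= 24:
        return "FP16"                 # full precision, no compromises
    if vram_gb >= 12:
        return "FP8 or GGUF Q5"       # near-full quality, the sweet spot
    if vram_gb >= 8:
        return "GGUF Q4 / NF4"        # workable; try Nunchaku INT4 for speed
    if vram_gb >= 6:
        return "Nunchaku INT4"        # needs CPU offload, expect slowdowns
    return "skip Flux (use SD 1.5 or SD 3.5 Medium)"

print(flux_precision(24))  # FP16
print(flux_precision(12))  # FP8 or GGUF Q5
print(flux_precision(8))   # GGUF Q4 / NF4
```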

24GB VRAM (RTX 3090, RTX 4090)

| Model | How | Experience |
|---|---|---|
| SD 1.5 | Native FP16 | Near instant. Under 3 seconds/image. |
| SDXL | Native FP16 + Refiner | Full pipeline. 4-6 seconds/image. |
| Flux Dev | FP16 full precision | No compromises. Best possible output. ~18-40 sec/image. |
| Flux Schnell | FP16 | 3-10 seconds/image. Fastest Flux experience. |

No compromises at 24GB. Run Flux Dev at full FP16 for maximum quality. Keep SDXL around for workflows where you need specific LoRAs or ControlNet types that Flux doesn’t support yet. Speed is the only tradeoff: Flux Dev at 18-40 seconds per image vs SDXL at 4-6 seconds.


Choose Your Model

| Use case | Best model | Why |
|---|---|---|
| Photorealism | Flux Dev | Best anatomy, prompt adherence, natural lighting |
| Anime/illustration | SD 1.5 (with checkpoint) | Biggest LoRA library for anime styles. Nothing else comes close. |
| Text in images | Flux Dev | 95%+ accuracy on single words. Only reliable option. |
| ControlNet workflows | SD 1.5 | 14 official types, most complete ecosystem |
| Quick drafts/iteration | Flux Schnell or SDXL | Schnell at 4 steps, SDXL at 20 steps. Both fast enough. |
| Commercial use | Flux Schnell (Apache 2.0) or SDXL | Flux Dev requires a commercial license from BFL. |
| Low VRAM (4-6 GB) | SD 1.5 | Only model that runs natively. SD 3.5 Medium as alternative. |
| Training your own LoRAs | SD 1.5 (8GB) or SDXL (12GB) | Flux LoRA training needs 24GB+. |
| Maximum quality, any hardware | Flux Dev (FP16, 24GB) | Best output of any open model, period. |

The Bottom Line

SD 1.5 isn’t dead, but it’s the fallback option now. Use it for anime workflows with specific LoRAs, for ControlNet-heavy pipelines, or when you’re stuck on a 4-6GB card.

SDXL is the safe pick. Runs on 8GB, generates at 1024x1024, has a mature ecosystem, and produces good images without fiddling. If you’re not sure what to pick, start here.

Flux is where image generation is going. If you have 12GB+, start with Flux Dev in GGUF Q5 through ComfyUI. The quality difference over SDXL is obvious from the first image. Use Schnell for drafts, Dev for finals.

```shell
# SDXL (8GB+ VRAM) — install ComfyUI, download from CivitAI:
# Juggernaut XL or RealVisXL checkpoint

# Flux (12GB+ VRAM) — install ComfyUI, then:
# Download flux1-dev-Q5_K_S.gguf from city96/FLUX.1-dev-gguf on HuggingFace
# Download clip_l.safetensors and t5xxl_fp8_e4m3fn.safetensors
```


Sources: Tom’s Hardware SD Benchmarks, ComfyUI GPU Benchmarks, Stable Diffusion Art SDXL vs Flux, Nunchaku/SVDQuant, BFL Flux.2 Blog, CivitAI LoRA Training Guide