
Image generation on Mac works. It’s slower than an NVIDIA GPU, and some tools aren’t as polished as their Linux/Windows versions, but you can generate real images locally on any Apple Silicon Mac right now. The question is which tool to use, and that depends on whether you want ease, speed, or flexibility.

There are three approaches worth considering: Draw Things (easiest, and honestly the best for most people), MLX stable diffusion (fastest native performance), and ComfyUI (most flexible but slowest on Mac). This guide covers all three with actual speed numbers so you can pick the right one.


Draw Things: easiest, and best for most people

What it is

Draw Things is a free Mac App Store app built specifically for Apple Silicon. It’s not a port of a Linux tool – it’s a native Mac app with Metal FlashAttention, Core ML acceleration, and on-demand weight loading that keeps memory usage low. The difference in speed compared to generic PyTorch tools on Mac is immediately obvious.

You install it from the App Store. You pick a model from the built-in downloader. You type a prompt and click generate. No Python, no terminal, no dependency hell.

What it supports

  • SD 1.5, SD 2.1, SDXL, SDXL Turbo
  • Flux.1 (Schnell and Dev), Flux.2
  • LoRA loading and on-device LoRA training
  • ControlNet (all major types)
  • Inpainting and outpainting
  • img2img, upscaling, pose editing
  • .safetensors model import (bring your own checkpoints)

I’ve been using Draw Things as my only image gen tool on Mac for months. Haven’t needed anything else.

Speed on Mac

Draw Things uses Metal FlashAttention, which is a custom Metal implementation of the attention mechanism. On M3/M4 chips, version 2.0 of this engine delivers about 20% faster inference than earlier versions. It also runs up to 25% faster than mflux and 94% faster than ggml-based implementations for Flux models (tested on M2 Ultra).

Approximate generation times (20 steps, default settings):

Model        | Resolution | M1 base 8GB | M2 Pro 16GB | M3 Pro 18GB | M4 Max 36GB+
SD 1.5       | 512x512    | 20-30s      | 8-15s       | 6-12s       | 3-6s
SDXL         | 1024x1024  | Too slow    | 25-40s      | 18-30s      | 8-15s
Flux Schnell | 1024x1024  | Won't fit   | 30-50s      | 20-35s      | 10-18s
Flux Dev     | 1024x1024  | Won't fit   | Very slow   | 40-60s      | 15-25s

These times are for the total generation, not per-step. Draw Things is roughly 3x faster than ComfyUI running the same model on the same Mac.

Memory requirements

Draw Things uses on-demand weight loading, which reduces memory overhead by up to 50% compared to tools that load the entire model at once. This is why it can run SD 1.5 on 8GB when ComfyUI can’t.
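The idea behind on-demand weight loading can be sketched in a few lines. This is a conceptual illustration, not Draw Things's actual implementation: each layer holds only a reference to its weights, loads them when it runs, and releases them immediately afterward, so peak memory is one layer's weights rather than the whole model's.

```python
# Conceptual sketch of on-demand weight loading (not Draw Things's code).
DISK = {  # stand-in for weight files on disk
    "layer1": 2.0,
    "layer2": 3.0,
}

class LazyLayer:
    def __init__(self, name):
        self.name = name          # only a reference is held in memory

    def __call__(self, x):
        w = DISK[self.name]       # load weights on demand
        out = x * w               # run the layer
        del w                     # release immediately after use
        return out

def run_model(x):
    # Layers execute sequentially; at any moment only one layer's
    # weights are resident, which is why peak memory drops sharply.
    for layer in (LazyLayer("layer1"), LazyLayer("layer2")):
        x = layer(x)
    return x

print(run_model(1.0))  # 1.0 * 2.0 * 3.0 = 6.0
```

A tool that loads every checkpoint tensor up front pays for the full model at once; the lazy approach trades a little disk I/O per step for a much smaller resident set, which is exactly the trade that lets SD 1.5 fit on 8GB.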

Memory | What works                                       | What doesn't
8GB    | SD 1.5 (with 8-bit models), small Flux distilled | SDXL, Flux Schnell/Dev
16GB   | SDXL comfortably, Flux Schnell                   | Flux Dev (loads but swaps)
24GB   | Everything including Flux Dev                    | Flux Dev + large LoRA stacks
36GB+  | Everything, comfortable batching                 | Nothing off-limits

Setup

  1. Install from the Mac App Store (free)
  2. Open the app, go to the model browser
  3. Download SD 1.5 or SDXL (the app suggests compatible models for your hardware)
  4. Type a prompt, adjust settings if you want, click Generate

Time from zero to first image: about 5 minutes, most of which is downloading the model.


MLX stable diffusion: fastest native performance

What it is

Apple’s MLX framework includes a stable diffusion implementation that runs natively on Apple Silicon’s unified memory. It’s a Python library, not a GUI app. You write code or run command-line scripts to generate images.

MLX is faster than PyTorch + MPS (Metal Performance Shaders) for the same model because it targets Apple Silicon’s unified memory directly instead of going through a generic GPU abstraction. The tradeoff: fewer models supported, and you need to be comfortable with Python.

What it supports

  • SD 2.1 and SDXL Turbo (officially)
  • img2img
  • Quantization (4-bit text encoders, 8-bit UNet) for reduced memory
  • Batch generation

The model support is narrower than Draw Things or ComfyUI. SD 1.5 isn’t in the official examples (though community forks exist). Flux isn’t supported yet. If you need the latest models, MLX isn’t the right choice.

When to use it

MLX stable diffusion makes sense if you’re:

  • Building a Python pipeline that generates images as part of a larger workflow
  • Batch-generating hundreds of images and want maximum speed
  • Writing scripts that need programmatic control over every parameter
  • Comfortable with Python and command-line tools

It doesn’t make sense if you want to browse models, experiment with prompts visually, or need Flux/ControlNet/LoRA support.

Usage

# Get the official examples (the scripts live in stable_diffusion/)
git clone https://github.com/ml-explore/mlx-examples.git
cd mlx-examples/stable_diffusion
pip install -r requirements.txt

# Text to image (SDXL Turbo, 4 images in a 2-row grid)
python txt2image.py "A photo of an astronaut riding a horse on Mars" --n_images 4 --n_rows 2

# Image to image
python image2image.py --strength 0.5 original.png "A lit fireplace"

# Quantized (for 8GB Macs)
python txt2image.py --n_images 4 -q "prompt here"

The -q flag quantizes text encoders to 4-bit and UNet to 8-bit. This lets SDXL Turbo run on an 8GB Mac Mini without swapping. Without quantization, you need at least 16GB.
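Since MLX is script-driven, batch work usually means wrapping these commands in Python. A minimal sketch of that idea, assuming you run it from the `stable_diffusion` directory (the flag names mirror the examples above; the helper itself is hypothetical):

```python
from typing import List

def txt2image_cmd(prompt: str, n_images: int = 4, quantize: bool = False) -> List[str]:
    """Build an argv list for txt2image.py; flags match the examples above."""
    cmd = ["python", "txt2image.py", prompt, "--n_images", str(n_images)]
    if quantize:
        cmd.append("-q")  # 4-bit text encoders, 8-bit UNet
    return cmd

# One command per prompt; run each with subprocess.run(cmd, check=True)
prompts = ["a lighthouse at dusk", "a foggy forest road"]
cmds = [txt2image_cmd(p, quantize=True) for p in prompts]
print(cmds[0])
```

This is the kind of glue that makes MLX the right pick for batch pipelines: everything is a parameter you control from code, with no GUI in the loop.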


ComfyUI on Mac: most flexible, slowest

What it is

ComfyUI is a node-based workflow editor for image generation. You can build things with it that the other tools can’t touch: multi-model pipelines, custom samplers, chained refiners, community workflow imports. It’s also the slowest option on Mac and the most annoying to install.

The Mac situation

ComfyUI uses PyTorch with Metal Performance Shaders (MPS) for GPU acceleration on Mac. MPS works, but it’s 2-4x slower than NVIDIA CUDA for the same operation. Every ComfyUI benchmark you see online with impressive 2-second SDXL generation times? That’s an RTX 4090. On Mac, expect 3-5x those numbers.

The ComfyUI Desktop app (beta) supports Apple Silicon, but some features are known not to work. The manual installation route is more reliable.

Installation

# Install Python 3.11+ and git via Homebrew
brew install python@3.11 git

# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Create venv and install
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run with fp16 (important for Mac speed)
python main.py --force-fp16

The --force-fp16 flag matters. Without it, ComfyUI defaults to fp32 on Mac, which is roughly half the speed and uses twice the memory. I’ve seen people complain about ComfyUI being unusable on Mac, and half the time it’s because they missed this flag.
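Before launching ComfyUI, it's worth confirming that PyTorch can actually see the Metal backend. A small sanity-check sketch (the `mps_ready` helper is mine, not part of ComfyUI; the `torch.backends.mps` API is PyTorch's):

```python
import importlib.util

def mps_ready() -> bool:
    """Return True if PyTorch is installed and its MPS backend is usable.

    If this returns False, ComfyUI will silently fall back to CPU,
    which is far slower than even the MPS numbers quoted here.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch not installed in this environment
    import torch
    mps = getattr(torch.backends, "mps", None)
    return bool(mps and torch.backends.mps.is_available())

print("MPS ready:", mps_ready())
```

If this prints `False` on an Apple Silicon Mac, the usual culprit is an x86 Python running under Rosetta; reinstall an arm64 Python via Homebrew and recreate the venv.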

Speed comparison

On the same M2 Pro 16GB, generating the same image:

Model                            | Draw Things | ComfyUI (--force-fp16) | MLX
SD 1.5 512x512 (20 steps)        | ~10s        | ~30s                   | ~8s
SDXL 1024x1024 (20 steps)        | ~30s        | ~90-110s               | N/A (limited support)
Flux Schnell 1024x1024 (4 steps) | ~35s        | ~60-80s                | N/A

ComfyUI is 3x slower than Draw Things for the same model. That’s the Metal FlashAttention vs MPS difference. ComfyUI’s PyTorch MPS backend is generic GPU acceleration. Draw Things has hand-tuned Metal shaders for each operation.

When ComfyUI is still worth it

  • You need node-based workflows with custom pipelines
  • You’re importing workflows from the community (ComfyUI has the largest workflow ecosystem)
  • You want to chain models: base + refiner + upscaler in one pipeline
  • You’re already using ComfyUI on another machine and want the same workflow on Mac
  • You need custom nodes that only exist in the ComfyUI ecosystem

If none of those apply, use Draw Things. It’s faster for every common task.

Known Mac issues

  • MPS doesn’t support all PyTorch operations. Some custom nodes will fail with cryptic errors.
  • Memory reporting in ComfyUI assumes discrete GPU VRAM, not unified memory. The numbers shown in the UI aren’t accurate on Mac.
  • Some samplers are slower on MPS than others. Euler and DPM++ 2M work well. DDIM can be buggy.
  • Flux models need careful memory management on 16GB. Close everything else, use --force-fp16, and consider GGUF quantized Flux models to fit.
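The arithmetic behind that last point is worth seeing. A rough footprint estimate is parameters times bytes per parameter; the ~12B parameter count for Flux Dev is an approximation, and this ignores activations and OS overhead, so treat it as a lower bound:

```python
def weight_gb(params_billion: float, bits: int) -> float:
    """Rough weight footprint in GB: params * (bits / 8) bytes.
    Ignores activations, text encoders kept resident, and OS overhead."""
    return params_billion * bits / 8

# Flux Dev is roughly 12B parameters (approximation)
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_gb(12, bits):.0f} GB of weights")
# 16-bit: ~24 GB, 8-bit: ~12 GB, 4-bit: ~6 GB
```

At fp16 the weights alone exceed a 16GB machine's unified memory, which is why an 8-bit or 4-bit GGUF quantization is what makes Flux workable at that tier.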

What to run at each memory tier

Memory | Best tool              | Best model           | What to expect
8GB    | Draw Things            | SD 1.5 (8-bit)       | Usable for casual generation, 20-30s per image
16GB   | Draw Things            | SDXL or Flux Schnell | Good quality, 15-40s per image depending on model
24GB   | Draw Things            | Flux Dev             | Everything works, 15-25s for Flux
32GB+  | Draw Things or ComfyUI | Flux Dev + LoRAs     | Fast enough for iteration, ComfyUI viable for complex workflows

On 8GB, skip SDXL and Flux entirely. They technically load in some tools but swap to disk, and generation times balloon to minutes per image. SD 1.5 with Draw Things' 8-bit models is the 8GB sweet spot.

On 16GB, you have real choices. SDXL runs comfortably in Draw Things and produces much better images than SD 1.5. Flux Schnell works too – it’s a 4-step model designed for speed, so even at 30-50 seconds per image on Mac, you get fast iterations.

At 32GB+, you’re no longer memory-constrained. The speed gap between Mac and NVIDIA still exists, but you can run any model and use ComfyUI for complex workflows without worrying about crashes.


The honest PC comparison

I’d be leaving something out if I didn’t say this: if image generation is your primary use case and you don’t already own a Mac, a PC with an RTX 3060 12GB ($170 used) will generate images 3-5x faster than an M2 Pro.

Setup            | Cost             | SD 1.5 512x512 | SDXL 1024x1024
M2 Pro 16GB Mac  | Already own      | ~10s           | ~30s
M4 Max 36GB Mac  | $3,000+          | ~4s            | ~10s
RTX 3060 12GB PC | ~$170 (used GPU) | ~3s            | ~8s
RTX 4090 PC      | ~$1,600          | ~1s            | ~2s

A $170 used GPU matches a $3,000 Mac. CUDA is that much faster for diffusion models.

But if you already have a Mac and don’t want a second machine sitting under your desk, Draw Things makes the speed difference tolerable. 10 seconds per SD 1.5 image is fine for iterating on prompts. And newer models like Flux Schnell only need 4 steps, so the per-step speed penalty matters less.


The bottom line

Start with Draw Things. It’s free, fast, and takes 5 minutes from install to first image. If you find yourself wanting node-based workflows or custom pipelines, add ComfyUI. If you’re writing Python scripts that need image generation, look at MLX.

Most Mac users will never need anything beyond Draw Things. It handles SD 1.5, SDXL, Flux, LoRAs, ControlNet, inpainting, and LoRA training – all through a clean native interface with hand-tuned Metal shaders that actually use your hardware well.