ControlNet Guide: Precise AI Image Control on Your GPU
Every AI image you generate is a dice roll. Same prompt, same settings, completely different composition. You can’t tell Stable Diffusion “put the person HERE in THIS pose.” The text prompt controls what’s in the image, not where or how.
ControlNet fixes that. It takes a structural guide (an edge map, a body pose, a depth map) and forces the diffusion model to follow that structure. Same prompt, but now the output matches the layout you specified. It’s the difference between “a person standing in a room” and “a person standing in exactly this pose in exactly this room layout.”
This guide covers what ControlNet does, which preprocessors to use, VRAM requirements, setup in ComfyUI and Automatic1111, Flux ControlNet support, and practical workflows that go beyond generating random images.
What ControlNet Does
ControlNet is a neural network that bolts onto a diffusion model (SD 1.5, SDXL, or Flux) and adds spatial conditioning. You provide two inputs:
- A text prompt: what you want in the image
- A control image: the structure you want it to follow
The control image gets processed through a preprocessor (Canny edge detection, OpenPose skeleton, depth estimation, etc.) to create a control map. ControlNet then guides the diffusion process to respect that map while still following your text prompt.
It doesn’t replace your model. It adds a layer of structural guidance on top. Your checkpoint, your LoRAs, and your sampler settings all still matter; ControlNet just constrains the spatial layout.
Technically, it works by creating a trainable copy of the model’s encoder connected through “zero convolutions”: 1x1 convolution layers initialized to zeros, so the copy starts with zero influence and gradually learns the conditioning. The original model weights stay locked. This is why ControlNet models are roughly half the size of the base model they were trained on.
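A minimal PyTorch sketch of the zero-convolution idea (the channel count and feature shapes here are illustrative, not the real encoder dimensions):

```python
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution with weights and bias initialized to zero:
    # at the start of training it contributes nothing, so the locked
    # base model behaves exactly as before ControlNet was attached.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

features = torch.randn(1, 64, 32, 32)  # hypothetical encoder feature map
residual = zero_conv(64)(features)
print(residual.abs().sum().item())  # 0.0 at initialization
```

As training progresses, the weights move away from zero and the copy’s conditioning signal is blended into the locked model’s features.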
Control Types: What Each Preprocessor Does
Each preprocessor extracts different structural information from a reference image. Here’s what actually matters:
| Preprocessor | What It Extracts | Best For | Use When |
|---|---|---|---|
| Canny | Sharp edge outlines from contrast boundaries | Architecture, products, clean line art | You want to preserve exact edges and outlines |
| OpenPose / DWPose | Human body skeleton (joints, limbs, optionally face and hands) | Character poses, figure drawing | You want a person in a specific pose |
| Depth (Depth Anything V2) | Grayscale depth map (light = close, dark = far) | Room layouts, scene composition, 3D feel | You want to preserve spatial depth and perspective |
| Lineart | Clean line drawings from photos | Illustrations, coloring book style, manga | You want a drawn/illustrated version of a photo |
| Scribble | Rough sketch interpretation | Quick concept art from rough drawings | You drew something by hand and want it polished |
| MLSD | Straight architectural lines only | Buildings, interiors, man-made structures | You need precise geometric structure |
| HED / SoftEdge | Soft, smooth edge outlines | Style transfer, recoloring | You want softer structural guidance than Canny |
| Normal Map | Surface orientation (RGB-encoded surface normals) | 3D-to-2D rendering, surface detail | You’re working with 3D models or need surface info |
| Segmentation | Color-coded region labels (sky, person, building) | Scene composition control | You want to control which regions contain what |
| Tile | Processes image in chunks for detail enhancement | Upscaling, adding detail at same resolution | You want to upscale while adding new detail |
| Inpaint | Masked regions for selective regeneration | Fixing specific areas of an image | You want to change only part of an image |
The most commonly used: Canny, OpenPose/DWPose, and Depth cover about 80% of ControlNet use cases. Start with these three.
Related Tools (Not Technically ControlNet)
IP-Adapter extracts style and content features from a reference image. Where ControlNet controls spatial structure (where things are), IP-Adapter controls appearance (what things look like). They’re commonly used together: IP-Adapter for face/style consistency, ControlNet for pose/composition.
T2I-Adapter is a lightweight alternative to ControlNet at only 158MB per model (vs ControlNet’s ~1.4GB). It runs the conditioning once for the entire generation instead of every step. Less precise control, but uses 93% less storage and minimal extra VRAM.
VRAM Requirements
ControlNet loads alongside your base model, so you need enough VRAM for both.
| Setup | Base Model VRAM | ControlNet Adds | Total Needed | Fits On |
|---|---|---|---|---|
| SD 1.5 + 1 ControlNet (512x512) | 4-6 GB | ~1-2 GB | 6-8 GB | 8 GB GPUs (RTX 3060, 4060) |
| SD 1.5 + 3 ControlNets (512x512) | 4-6 GB | ~4-6 GB | 10-12 GB | 12 GB GPUs (RTX 3060 12GB) |
| SDXL + 1 ControlNet (1024x1024) | 8-10 GB | ~2-4 GB | 10-14 GB | 12-16 GB GPUs |
| Flux + 1 ControlNet (1024x1024) | ~18 GB | ~2-3 GB | 20-22 GB | 24 GB GPUs (RTX 3090, 4090) |
Key points:
- Each additional ControlNet stacks. Three ControlNets can add 4-6GB on SD 1.5.
- T2I-Adapter adds only ~0.15 GB, so use it instead of ControlNet on tight VRAM budgets.
- Flux on lower VRAM: Use a GGUF-quantized Flux base model (Q5_K_M fits 8GB GPUs at 1024x1024 with text encoder offloading), which frees room for a ControlNet model. Quality trades off against VRAM savings.
- Both A1111 and ComfyUI have low-VRAM modes that offload ControlNet processing at the cost of speed.
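The arithmetic above can be sketched as a back-of-envelope budget calculator. The numbers are rough midpoints taken from the table, not measurements; real usage varies with resolution, precision, and attention optimizations:

```python
# Rough midpoints from the VRAM table above (illustrative assumptions).
BASE_GB = {"sd15": 5.0, "sdxl": 9.0, "flux": 18.0}
CONTROLNET_GB = {"sd15": 1.5, "sdxl": 3.0, "flux": 2.5}
T2I_ADAPTER_GB = 0.15

def vram_estimate(model: str, controlnets: int = 0, adapters: int = 0) -> float:
    # Total = base model + one share per stacked ControlNet/adapter.
    return BASE_GB[model] + controlnets * CONTROLNET_GB[model] + adapters * T2I_ADAPTER_GB

print(vram_estimate("sd15", controlnets=3))  # 9.5 -> plan for a 12 GB card
print(vram_estimate("sd15", adapters=3))     # 5.45 -> still fits an 8 GB card
```

The comparison makes the T2I-Adapter trade-off concrete: three stacked adapters cost less VRAM than a third of one ControlNet.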
For a complete breakdown of what fits on each GPU tier, see our VRAM requirements guide. For GPU buying advice, see the GPU buying guide.
Setup: ComfyUI (Recommended)
ComfyUI handles ControlNet better than A1111 for most users: its node-based workflow makes multi-ControlNet setups cleaner, and it manages VRAM more efficiently. For a full comparison, see our ComfyUI vs A1111 vs Fooocus guide.
Step 1: Install Custom Nodes
If you have ComfyUI Manager installed (you should), open ComfyUI and click Manager > Install Missing Custom Nodes. Search for and install:
| Package | What It Does |
|---|---|
| comfyui_controlnet_aux (Fannovel16) | All preprocessors: Canny, OpenPose, Depth Anything, Lineart, MLSD, Scribble, etc. |
| ComfyUI-Advanced-ControlNet | Per-layer weights, soft weights, advanced multi-ControlNet conditioning |
| x-flux-comfyui (XLabs-AI) | Required for XLabs Flux ControlNet models specifically |
Restart ComfyUI after installing.
Step 2: Download ControlNet Models
Place ControlNet model files (.safetensors or .pth) in ComfyUI/models/controlnet/.
For SD 1.5: download from lllyasviel/ControlNet-v1-1 on HuggingFace. Start with:
- control_v11p_sd15_canny.pth (~1.4 GB)
- control_v11p_sd15_openpose.pth (~1.4 GB)
- control_v11f1p_sd15_depth.pth (~1.4 GB)
For SDXL: get xinsir/controlnet-union-sdxl-1.0. One model handles 10+ control types.
For Flux: get Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0 (~4 GB; an FP8 version is available for lower VRAM).
Step 3: Build a Basic Workflow
The core ControlNet workflow in ComfyUI:
Load Image → Preprocessor → Apply ControlNet → KSampler → Save Image
- Load Image node: your reference/control image
- AIO Preprocessor or a specific preprocessor (Canny, DWPose, etc.): generates the control map
- Load ControlNet Model node: loads your .safetensors ControlNet file
- Apply ControlNet node: connects the control map and model to the conditioning
- KSampler: generates with your normal text prompt plus the ControlNet conditioning
The Apply ControlNet node connects to your positive conditioning (the same conditioning your text prompt feeds into). The ControlNet and text prompt work together: structure from ControlNet, content from your prompt.
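The same wiring can be expressed in ComfyUI’s API (“prompt”) JSON format. This is a partial sketch from memory, not a complete exported graph: the filenames are placeholders, and the negative prompt, KSampler, VAEDecode, and SaveImage nodes are omitted. If you load a raw photo instead of a pre-computed control map, a preprocessor node (from comfyui_controlnet_aux) goes between nodes 3 and 5.

```json
{
  "1": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "your_sd15_checkpoint.safetensors"}},
  "2": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "oil painting of a knight in armor", "clip": ["1", 1]}},
  "3": {"class_type": "LoadImage",
        "inputs": {"image": "control_map.png"}},
  "4": {"class_type": "ControlNetLoader",
        "inputs": {"control_net_name": "control_v11p_sd15_canny.pth"}},
  "5": {"class_type": "ControlNetApply",
        "inputs": {"conditioning": ["2", 0], "control_net": ["4", 0],
                   "image": ["3", 0], "strength": 1.0}}
}
```

Each link is a `["node_id", output_index]` pair; node 5’s output replaces the plain text conditioning on the KSampler’s positive input.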
Setup: Automatic1111
Step 1: Install the Extension
- Open A1111 > Extensions tab > Install from URL
- Paste: https://github.com/Mikubill/sd-webui-controlnet.git
- Click Install, then go to Installed > Apply and restart UI
A collapsible “ControlNet” section appears in the txt2img and img2img tabs.
Step 2: Download Models
Place ControlNet model files in:
stable-diffusion-webui/extensions/sd-webui-controlnet/models/
Same models as listed in the ComfyUI section above. Make sure ControlNet model versions match your base checkpoint โ SD 1.5 ControlNets only work with SD 1.5 models, SDXL ControlNets with SDXL.
Step 3: Use the ControlNet Panel
- Upload your reference image in the ControlNet section
- Enable the checkbox
- Select a Preprocessor (Canny, OpenPose, etc.)
- Select the matching Model from the dropdown
- Check Pixel Perfect: this auto-sets the preprocessor resolution to match your output
- Click Generate
Key settings:
- Control Weight (0-2, default 1.0): How strongly ControlNet influences the output. 0.7-1.0 is the sweet spot for most tasks.
- Starting/Ending Control Step (0-1): When ControlNet applies during generation. Ending at 0.8 lets the model “clean up” in the final steps.
- Control Mode: “Balanced” for most use cases. “ControlNet is More Important” when you need strict adherence to the structure.
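The start/end values are fractions of the sampling schedule. A small sketch of how they map onto discrete sampler steps (the exact rounding in A1111 may differ slightly; this is illustrative):

```python
def controlnet_steps(total_steps: int, start: float = 0.0, end: float = 1.0) -> list[int]:
    # ControlNet guides a step when its position in the schedule
    # (step / total_steps) falls inside the [start, end) window.
    return [i for i in range(total_steps) if start <= i / total_steps < end]

# With 20 steps and an ending control step of 0.8, ControlNet guides
# the first 16 steps and the model refines freely for the last 4.
print(len(controlnet_steps(20, end=0.8)))  # 16
```

This is why ending at 0.8 softens artifacts: the structure is locked in early, while the final steps are free to clean up textures and details.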
Multi-ControlNet
A1111 supports up to 10 simultaneous ControlNets. Configure the max in Settings > ControlNet > Multi ControlNet: Max models amount. Each unit gets its own preprocessor, model, weight, and control mode.
ControlNet for Flux
Flux ControlNet is still maturing but usable. SD 1.5 has the most complete and battle-tested ControlNet ecosystem (14 official models). Flux is catching up.
Current Best Options
| Model | Developer | Control Types | Size | Notes |
|---|---|---|---|---|
| Union Pro 2.0 | Shakker Labs | Canny, soft edge, depth, pose, gray | 3.98 GB | Best all-in-one for Flux. FP8 version available. |
| XLabs v3 | XLabs-AI | Canny, depth, HED (separate models) | 1.49 GB each | Best single-mode quality. |
| Union | InstantX | Canny, tile, depth, blur, pose, gray, lq | 6.6 GB | Beta. More modes but larger. |
| Flux Tools | Black Forest Labs | Canny, depth | Varies | Official, not technically ControlNet. |
Recommended settings for Shakker Labs Union Pro 2.0:
- Conditioning scale: 0.7-0.9 (lower than SD 1.5 ControlNet’s typical 1.0)
- Guidance end: 0.65-0.8 depending on control type
In ComfyUI: Use the x-flux-comfyui nodes for XLabs models, or native ComfyUI Flux ControlNet nodes for InstantX/Shakker Labs models.
The reality: If ControlNet is your primary workflow, SD 1.5 still offers the most control types, the smallest model files, the most community resources, and the lowest VRAM requirements. Flux produces better base images but has fewer ControlNet options. SDXL sits in the middle: xinsir’s Union model covers most use cases in one file.
Practical Workflows
Pose Matching
Turn a reference photo into a completely different style while keeping the exact pose.
- Load your reference photo (person in desired pose)
- Run through the OpenPose or DWPose preprocessor → skeleton map
- Apply ControlNet with the pose map
- Write a prompt for the style you want: “oil painting of a knight in armor, dramatic lighting”
- Generate → the output follows the skeleton but renders in your chosen style
Pro tip: Combine IP-Adapter (for face/style consistency) with ControlNet OpenPose (for pose). This gives you the same character in different poses, which is useful for character sheets and comic panels.
Architecture and Interior Design
Preserve room layout while completely changing the style.
- Photograph your room or building
- Run through the Depth preprocessor → depth map (preserves spatial layout)
- Optionally add a second ControlNet with Canny for edge detail
- Prompt: “modern minimalist living room, warm lighting, hardwood floors”
- Generate → same room dimensions and furniture placement, different style
MLSD works better than Canny for buildings specifically: it detects straight architectural lines and ignores organic details.
Sketch to Polished Image
Turn a rough hand drawing into a polished result.
- Draw your concept on paper or in a drawing app (doesn’t need to be good)
- Use Scribble preprocessor for rough sketches, Lineart for cleaner drawings
- Apply ControlNet
- Describe the final result in your prompt
- Generate → the AI follows your composition and structure
This is the lowest-barrier ControlNet workflow. A stick figure with basic shapes becomes a fully rendered scene.
Upscaling with Tile ControlNet
Add real detail when upscaling (not just blur removal).
- Start with your low-resolution image
- Upscale 2-4x with a standard upscaler (ESRGAN, Real-ESRGAN)
- Run through Tile ControlNet: it processes the image in 512x512 chunks, adding new detail to each tile
- Result: higher resolution with genuinely new detail, not just interpolation
In A1111, combine with the Ultimate SD Upscale script for automatic tile processing. In ComfyUI, use a tiled processing workflow.
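The tiling itself is simple geometry: overlapping windows so seams can be blended. A sketch of the kind of tile layout these workflows compute (the 512/64 tile and overlap sizes are typical defaults, not fixed requirements):

```python
def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64):
    # Overlapping tiles: each window shares `overlap` pixels with its
    # neighbor so the seams can be feather-blended after diffusion.
    stride = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), stride):
        for left in range(0, max(width - overlap, 1), stride):
            boxes.append((left, top,
                          min(left + tile, width), min(top + tile, height)))
    return boxes

# A 1024x1024 image upscaled 2x becomes 2048x2048 -> a 5x5 grid of tiles.
print(len(tile_boxes(2048, 2048)))  # 25
```

Each box is diffused separately with the Tile ControlNet keeping it anchored to the original content, which is what stops individual tiles from hallucinating unrelated imagery.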
QR Code Art
Generate artistic QR codes that are functional and visually appealing.
- Generate a standard QR code
- Use it as the control image with a dedicated QR ControlNet model (DionTimmer/controlnet_qrcode)
- Set the weight to 1.0-1.5 (high enough to keep the code scannable)
- Prompt with your desired art style
- Test scannability: lower the weight if the code breaks, raise it if the art overwhelms the pattern
Troubleshooting
“ControlNet Has No Effect”
The most common beginner issue. Check these in order:
- Is ControlNet enabled? The checkbox must be checked.
- Is the preprocessor set correctly? If set to “none” without a pre-processed control map, nothing happens.
- Does the ControlNet model match the preprocessor? A Canny model needs a Canny preprocessor (or pre-generated Canny map).
- Does the ControlNet model match your checkpoint? SD 1.5 ControlNet + SDXL checkpoint = error or no effect.
- Is the control weight above zero? Default 1.0 is fine. Below 0.3 you’ll barely see an effect.
- Is the preprocessor resolution correct? In A1111, check “Pixel Perfect” to auto-match. In ComfyUI, ensure the preprocessor output resolution matches your generation resolution.
Model Version Mismatch Errors
RuntimeError: mat1 and mat2 shapes cannot be multiplied
This means you’re loading a ControlNet model built for a different base model architecture. SD 1.5 ControlNets are ~1.4GB, SDXL ControlNets are larger. They’re not interchangeable.
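You can reproduce the same class of error in plain PyTorch. The widths here are illustrative (SD 1.5 text embeddings are 768-dimensional; SDXL’s are wider), not the exact layer where the mismatch occurs:

```python
import torch

# A layer built for one architecture cannot multiply the other's tensors.
sd15_embeddings = torch.randn(77, 768)       # SD 1.5-style text embeddings
sdxl_layer_weight = torch.randn(2048, 320)   # expects 2048-dim input (illustrative)

try:
    sd15_embeddings @ sdxl_layer_weight
except RuntimeError as err:
    print(err)  # mat1 and mat2 shapes cannot be multiplied (77x768 and 2048x320)
```

When you see this error, check the ControlNet file size and name first; it almost always means the wrong architecture, not a corrupt download.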
VRAM Out of Memory
- Reduce active ControlNets: each one adds 1-4 GB
- Enable Low VRAM mode in the ControlNet settings (A1111) or use --medvram-sdxl
- Switch to T2I-Adapters where possible: 93% less memory
- Lower the generation resolution โ VRAM scales with resolution
- ComfyUI handles multi-ControlNet more efficiently than A1111 for memory management
Weak Control / Output Ignoring Structure
- Raise the control weight from 1.0 to 1.2-1.5 (above 1.5 may cause artifacts)
- Use “ControlNet is More Important” mode in A1111
- Increase the ending control step: at 0.5, the model has half the steps to deviate
- Check your preprocessor output: preview the control map before generating. If the map is messy, the output will be messy.
The Bottom Line
ControlNet turns text-to-image generation from “hope for the best” into “follow this structure.” Three preprocessors cover most needs: Canny for edges, OpenPose/DWPose for poses, and Depth for spatial layout.
Start here:
- Install ComfyUI with comfyui_controlnet_aux
- Download 3 ControlNet models matching your base model (Canny, OpenPose, Depth)
- Load a reference photo, pick a preprocessor, write a prompt, generate
For SD 1.5 (6-8GB VRAM): The complete ecosystem. 14 official models, smallest files, most tutorials online.
For SDXL (12-16GB VRAM): Use xinsir’s ControlNet Union โ one model for 10+ control types.
For Flux (24GB VRAM): Shakker Labs Union Pro 2.0 is the best all-in-one. XLabs v3 individual models for highest single-mode quality.
If you’re already running Stable Diffusion locally or Flux, ControlNet is the single biggest upgrade to your workflow. It takes 10 minutes to install and immediately makes every generation more useful.