Black Forest Labs - Image Generation

Open Source AI Image Generator: Developer's Guide 2026

Open source AI image generator guide for developers in 2026. Install Flux, SDXL, and ComfyUI, write your first API workflow, and ship text-to-image locally.

License: GPL-3.0
TL;DR
  • Full install walkthrough for ComfyUI plus FLUX.2-dev on Linux with a consumer GPU.
  • Runs locally and exposes an HTTP API on port 8188, scriptable from Python or curl.
  • Works with an RTX 3060 12GB at minimum; use an RTX 4090 24GB for FLUX.2-dev at native resolution.
System Requirements
  • RAM: 16GB
  • GPU: RTX 3060 12GB or better
  • VRAM: 12GB+
  • Apple Silicon: supported

You want to run a real open source AI image generator on your own hardware in 2026 without pasting API keys into someone else's cloud. This guide walks you from a clean Linux box to a running ComfyUI server with FLUX.2-dev loaded, a rendered first prompt, and a working HTTP API call. Every command here is copy-pasteable and tied to an upstream source.

We picked ComfyUI plus FLUX.2-dev because that pairing is the strongest consumer-GPU AI image generator stack most indie developers will actually ship in 2026. The 4-bit BNB build of FLUX.2-dev runs on a single RTX 4090 with 24GB VRAM, and the smaller Apache 2.0 FLUX.2-klein-4B variant drops it onto an RTX 3090 or RTX 4070 when you need commercial licensing.

Prerequisites for a Self-Hosted AI Image Generator

Before you start, have the following ready:

  • A Linux box (Ubuntu 22.04 or 24.04 recommended). WSL2 on Windows works, but expect lower throughput.
  • An NVIDIA GPU with at least 13GB VRAM for FLUX.2-klein-4B, or 24GB VRAM for FLUX.2-dev BNB-4bit. RTX 3090, RTX 4070, RTX 4090, and RTX 5090 are the sweet spot.
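  • NVIDIA driver 560 or newer and CUDA 12.4 or newer. Check with nvidia-smi. The cu130 PyTorch wheels installed in Step 1 need a driver that reports CUDA 13.0 or higher in nvidia-smi; on an older driver, install a PyTorch wheel built for your CUDA 12 version instead.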
  • Python 3.12, 3.13, or 3.14 (current ComfyUI ships against PyTorch cu130).
  • A Hugging Face account with an access token, since FLUX.2-dev is gated.
  • Roughly 60GB of free disk space for ComfyUI, the FLUX.2 checkpoint, and the text encoders.

If nvidia-smi reports a working GPU and python3 --version returns 3.12 or higher, you are ready.
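The readiness check above can be scripted. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the thresholds (Python 3.12, 60GB free disk) are taken from the prerequisites list. It only checks that an nvidia-smi executable is on PATH; it does not run it or inspect the GPU.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)     # minimum version from the prerequisites list
MIN_FREE_DISK_GB = 60    # disk budget from the prerequisites list


def python_ok(version=None, minimum=MIN_PYTHON):
    """True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def free_disk_gb(path="."):
    """Free space at `path` in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def nvidia_smi_found():
    """True if an nvidia-smi executable is on PATH (it is not executed)."""
    return shutil.which("nvidia-smi") is not None


def preflight(path="."):
    """Collect the three checks into a dict of name -> bool."""
    return {
        "python": python_ok(),
        "disk": free_disk_gb(path) >= MIN_FREE_DISK_GB,
        "nvidia-smi": nvidia_smi_found(),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it from the directory where you plan to clone ComfyUI so the disk check measures the right filesystem.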

Step 1: Install ComfyUI From Source

ComfyUI is the node-based AI image generator runtime maintained at comfyanonymous/ComfyUI. We install it from source in a venv so nothing leaks into your system Python.

cd ~
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu130
pip install -r requirements.txt

The PyTorch install pulls about 3GB. The rest of requirements.txt is small. On a 500Mbps connection the whole install finishes in under five minutes. Current ComfyUI ships native support for FLUX.2 out of the box, no custom nodes required.

Step 2: Download the FLUX.2-dev Checkpoint

FLUX.2-dev is hosted at black-forest-labs/FLUX.2-dev. The repository is gated, so you need to accept the license on the web UI first, then log in to the Hugging Face CLI.

License warning. FLUX.2-dev ships under the FLUX Non-Commercial License. You can use it for personal projects, research, and generating images that you use commercially, but you cannot host FLUX.2-dev as a paid service without a commercial license from Black Forest Labs. If you need a fully commercial open source AI image generator, use FLUX.2-klein-4B (Apache 2.0) instead. It is a 4B step-distilled rectified flow transformer that generates an image in under a second on an RTX 4070 and fits in 13GB VRAM.

Authenticate and download the 4-bit BNB build (recommended for RTX 3090/4090 class GPUs):

pip install -U "huggingface_hub[cli]" bitsandbytes accelerate

huggingface-cli login  # paste your HF token

cd ~/ComfyUI
mkdir -p models/diffusion_models models/vae models/text_encoders

# FLUX.2-dev BNB-4bit transformer (~16 GB)
huggingface-cli download diffusers/FLUX.2-dev-bnb-4bit \
  --local-dir models/diffusion_models/FLUX.2-dev-bnb-4bit

# VAE from the FLUX.2-dev repo
huggingface-cli download black-forest-labs/FLUX.2-dev ae.safetensors \
  --local-dir models/vae

# Mistral text encoder used by FLUX.2 (downloaded automatically on first run
# when you accept the FLUX.2-dev license, or pre-fetch it here)
huggingface-cli download black-forest-labs/FLUX.2-dev \
  text_encoder/model.safetensors --local-dir models/text_encoders/flux2

Total download is roughly 25GB for the 4-bit build. On a gigabit connection this takes about 4 minutes. If you are on a 12GB or 16GB VRAM card, switch to FLUX.2-klein-4B (Apache 2.0, 13GB VRAM) or the Stability AI flagship Stable Diffusion 3.5 Large with GGUF Q8 quantizations instead.
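To sanity-check the download estimate for your own connection, here is a small sketch of the arithmetic. It assumes decimal units (1GB = 10^9 bytes, 1Mbps = 10^6 bits per second) and an ideal link with no protocol overhead, so real transfers will be somewhat slower. The function name is hypothetical.

```python
def download_minutes(size_gb: float, link_mbps: float) -> float:
    """Ideal transfer time in minutes for size_gb gigabytes at link_mbps."""
    bits = size_gb * 1e9 * 8              # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)    # bits / (bits per second)
    return seconds / 60


if __name__ == "__main__":
    # 25GB over a gigabit link: 200 seconds at line rate
    print(round(download_minutes(25, 1000), 2))
    # the same download over a 500Mbps link takes twice as long
    print(round(download_minutes(25, 500), 2))
```

At line rate the 25GB download takes a little over three minutes on gigabit, which is consistent with the "about 4 minutes" figure once overhead is included.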

Step 3: Launch the ComfyUI Server

Start ComfyUI listening on all interfaces so you can hit it from another machine on your LAN:

cd ~/ComfyUI
source venv/bin/activate
python main.py --listen 0.0.0.0 --port 8188

You should see "To see the GUI go to: http://0.0.0.0:8188" in the terminal. Open a browser on your laptop and hit http://your-box-ip:8188.

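If the server dies immediately with a CUDA out of memory error, add --lowvram to the command. On a 16GB or 20GB card you will want --lowvram plus the BNB-4bit or fp8 FLUX.2 weights. You may also want to keep the Mistral text encoder on the CPU; this guide names the flag as --cpu-text-enc, so confirm it with python main.py --help before relying on it, since the available flags vary between ComfyUI versions.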
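If you launch ComfyUI from a script, the VRAM rule above can be encoded when building the command line. This is a sketch under stated assumptions: the function name is hypothetical, and the 24GB cutoff for adding --lowvram is an illustrative reading of the "16GB or 20GB card" advice, not an official threshold. It only uses the --listen, --port, and --lowvram flags.

```python
import sys


def build_launch_args(vram_gb, host="0.0.0.0", port=8188, lowvram_below_gb=24):
    """Return the argv list for starting ComfyUI's main.py.

    lowvram_below_gb is an assumed cutoff: cards with less VRAM than this
    get --lowvram appended.
    """
    args = [sys.executable, "main.py", "--listen", host, "--port", str(port)]
    if vram_gb < lowvram_below_gb:
        args.append("--lowvram")
    return args


if __name__ == "__main__":
    print(" ".join(build_launch_args(16)))
```

To start the server, pass the list to subprocess.run with cwd set to your ComfyUI checkout and the venv's interpreter active.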

Step 4: Run Your First FLUX.2 Prompt

Download the example FLUX.2 workflow from the ComfyUI FLUX.2 tutorial. Drag the example image onto the ComfyUI canvas. ComfyUI reads the workflow embedded in the PNG and reconstructs the node graph.
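To see what drag-and-drop is reading, you can pull the embedded workflow out of the PNG yourself. The sketch below walks the PNG chunk list with the standard library and collects tEXt chunks. It assumes ComfyUI stores the graph as JSON in a tEXt chunk keyed "workflow"; that key name and the function names are assumptions for illustration, and compressed or international text chunks (zTXt, iTXt) are not handled.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"IEND":
            break
        pos += 8 + length + 4
    return chunks


def embedded_workflow(data: bytes, key: str = "workflow"):
    """Parse the JSON stored under `key`, or return None if it is absent."""
    raw = read_png_text_chunks(data).get(key)
    return json.loads(raw) if raw is not None else None
```

Call it as embedded_workflow(open("example.png", "rb").read()), where example.png stands in for the image you downloaded. A None result means the image carries no workflow under that key and cannot be dropped onto the canvas as a graph.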

Double-check the model nodes:

  • Load Diffusion Model: FLUX.2-dev-bnb-4bit
  • Flux2TextEncoder (or CLIPLoader): the Mistral text encoder bundled with FLUX.2-dev
  • Load VAE: ae.safetensors from the FLUX.2-dev repo

Type a prompt in the text node ("a cat holding a sign that says hello world"), click Queue Prompt, and wait. On an RTX 4090 with the BNB-4bit build at 28 steps the first image lands in roughly 30 seconds. The second and subsequent images skip the load phase and finish in 20 to 25 seconds.

Step 5: Call the AI Image Generator From the HTTP API

The ComfyUI server exposes a REST API on the same port as the web UI. The key routes, per the official ComfyUI server routes docs, are:

  • POST /prompt: queue a workflow, returns a prompt_id.
  • GET /history/{prompt_id}: check status and fetch output metadata.
  • GET /view: stream the generated image file.
  • WS /ws: real-time progress events.

First, export your workflow as API JSON. In ComfyUI, open Settings, toggle "Enable Dev mode Options", then click "Save (API Format)" to export workflow_api.json.

Then queue it with curl:

curl -X POST http://localhost:8188/prompt \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": '"$(cat workflow_api.json)"',
    "client_id": "sb-demo-001"
  }'

The response returns a prompt_id. Poll for the result:

PROMPT_ID="paste-the-id-here"
curl http://localhost:8188/history/$PROMPT_ID
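Polling by hand gets tedious, so here is a hedged sketch of the same loop in Python. It assumes GET /history/{prompt_id} returns an empty JSON object until the prompt has finished and an object keyed by the prompt id afterwards; verify that against your ComfyUI version. The function names, the timeout, and the polling interval are illustrative choices, and the fetch and sleep parameters exist only so the loop can be exercised without a live server.

```python
import json
import time
import urllib.request


def fetch_history(server, prompt_id):
    """GET /history/{prompt_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{server}/history/{prompt_id}") as resp:
        return json.loads(resp.read())


def wait_for_prompt(prompt_id, server="http://localhost:8188",
                    fetch=fetch_history, interval=1.0, timeout=300.0,
                    sleep=time.sleep):
    """Poll until the history contains prompt_id, then return that entry.

    Raises TimeoutError if the entry does not appear within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        entry = fetch(server, prompt_id).get(prompt_id)
        if entry is not None:
            return entry
        sleep(interval)
    raise TimeoutError(f"prompt {prompt_id} not in history after {timeout}s")
```

The returned entry is the same object you would inspect by hand in the curl output.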

When the status object shows "completed", dig into the outputs field for the filename and subfolder. Fetch the image:

curl "http://localhost:8188/view?filename=ComfyUI_00001_.png&subfolder=&type=output" \
  --output first-image.png
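Rather than copying the filename by hand, you can build the /view URL from the history entry. This sketch assumes each entry has an outputs object keyed by node id, where image-producing nodes carry an images list of objects with filename, subfolder, and type fields. That shape and the function names are assumptions to check against your own /history response.

```python
from urllib.parse import urlencode


def output_images(history_entry):
    """Flatten the image descriptors from every node in a history entry."""
    images = []
    for node_output in history_entry.get("outputs", {}).values():
        images.extend(node_output.get("images", []))
    return images


def view_url(image, server="http://localhost:8188"):
    """Build the /view URL for one image descriptor."""
    query = urlencode({
        "filename": image["filename"],
        "subfolder": image.get("subfolder", ""),
        "type": image.get("type", "output"),
    })
    return f"{server}/view?{query}"
```

Pass each URL to urllib.request.urlretrieve (or curl) to save the file locally.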

That is a complete roundtrip: text prompt in, PNG out, zero third-party dependencies beyond ComfyUI itself.

Python Client for the ComfyUI API

For real integrations, skip curl and drive the API from Python. Here is a minimal client that queues a prompt over HTTP using only the standard library:

import json
import urllib.request
import uuid

SERVER = "http://localhost:8188"
CLIENT_ID = str(uuid.uuid4())

with open("workflow_api.json") as f:
    workflow = json.load(f)

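# Edit the prompt text in place. Node ids vary per exported workflow; "6" is
# only an example, so look up your text-encode node's id in workflow_api.json.
workflow["6"]["inputs"]["text"] = "a cat holding a sign that says hello world"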

payload = {"prompt": workflow, "client_id": CLIENT_ID}
req = urllib.request.Request(
    f"{SERVER}/prompt",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
resp = json.loads(urllib.request.urlopen(req).read())
print("Queued prompt:", resp["prompt_id"])
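Hard-coding a node id is brittle, because the ids change whenever the graph is re-exported. A sketch of a lookup by node type follows. It assumes the API-format JSON maps node ids to objects with class_type and inputs fields, and that the prompt lives in a node whose class_type is "CLIPTextEncode"; that class name and the helper names are assumptions, so check the class_type values in your own workflow_api.json.

```python
def find_nodes(workflow, class_type):
    """Return the ids of all nodes with the given class_type."""
    return [node_id for node_id, node in workflow.items()
            if node.get("class_type") == class_type]


def set_prompt_text(workflow, text, class_type="CLIPTextEncode"):
    """Set the text input on the single node of class_type; return its id.

    Raises ValueError when zero or several nodes match, since guessing
    between a positive and a negative prompt node would be unsafe.
    """
    ids = find_nodes(workflow, class_type)
    if len(ids) != 1:
        raise ValueError(f"expected one {class_type} node, found {len(ids)}")
    workflow[ids[0]]["inputs"]["text"] = text
    return ids[0]
```

If your graph has separate positive and negative prompt nodes, call find_nodes directly and choose between the returned ids yourself.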

Add a websocket listener on /ws to get progress events. Working examples live in the ComfyUI repo under script_examples: basic_api_example.py queues a prompt over HTTP, and websockets_api_example.py adds the websocket flow.
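Whichever websocket library you use, the listener reduces to interpreting each frame. Here is a sketch of that step as a pure function. It assumes text frames are JSON objects with type and data fields, that "progress" messages carry value and max, and that an "executing" message whose node is null marks the end of a prompt. Treat those shapes as assumptions and compare them with the messages your server actually sends; the function name and return tuples are hypothetical.

```python
import json


def interpret_ws_message(raw, prompt_id):
    """Classify one websocket frame for the given prompt.

    Returns ("progress", fraction), ("done", None), ("binary", None),
    or ("other", message_type).
    """
    if not isinstance(raw, str):
        # Non-text frames (for example image previews) are passed over here
        return ("binary", None)
    message = json.loads(raw)
    kind = message.get("type")
    data = message.get("data", {})
    if kind == "progress":
        return ("progress", data["value"] / data["max"])
    if (kind == "executing" and data.get("node") is None
            and data.get("prompt_id") == prompt_id):
        return ("done", None)
    return ("other", kind)
```

Call it on each frame received from /ws and stop reading once it returns ("done", None).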

Troubleshooting: VRAM OOM and Other Common Failures

The three failures you will hit first when you self-host an AI image generator:

CUDA out of memory on model load. Add --lowvram to main.py. If that still fails, switch from FLUX.2-dev to the smaller FLUX.2-klein-4B build (13GB VRAM) or to SD 3.5 Large at GGUF Q8.
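The fallback order can be written down as a small lookup. This sketch uses the VRAM figures quoted in this guide (24GB for the FLUX.2-dev BNB-4bit build, 13GB for FLUX.2-klein-4B) and treats SD 3.5 Large at GGUF Q8 as the last resort; the function name is hypothetical, and the cutoffs are this guide's numbers rather than measured limits.

```python
def pick_model(vram_gb):
    """Suggest a model build for a card with vram_gb of VRAM."""
    if vram_gb >= 24:
        return "FLUX.2-dev BNB-4bit"
    if vram_gb >= 13:
        return "FLUX.2-klein-4B"
    return "SD 3.5 Large GGUF Q8"


if __name__ == "__main__":
    for vram in (12, 16, 24):
        print(vram, "GB ->", pick_model(vram))
```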

Text encoder not found. FLUX.2 ships with a Mistral-based text encoder, different from the T5-XXL pipeline used in FLUX.1. If ComfyUI complains at queue time, re-download the text_encoder folder from the FLUX.2-dev repo and point the loader node at it.

Black images on output. Usually a VAE mismatch. Confirm you loaded ae.safetensors from the FLUX.2-dev repo, not a FLUX.1 or SDXL VAE. The FLUX.2 VAE is specific and not interchangeable with older Flux weights.
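Both the text-encoder and VAE failures often come down to a file that is not where the loader expects it. The sketch below lists which of the Step 2 download targets are missing. The paths mirror the --local-dir arguments used in Step 2 and assume huggingface-cli preserved the repo-relative path text_encoder/model.safetensors; adjust the tuple if your layout differs. The names are hypothetical.

```python
from pathlib import Path

# Paths relative to the ComfyUI checkout, as produced by the Step 2 downloads
EXPECTED_PATHS = (
    "models/diffusion_models/FLUX.2-dev-bnb-4bit",
    "models/vae/ae.safetensors",
    "models/text_encoders/flux2/text_encoder/model.safetensors",
)


def missing_model_paths(comfy_root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under comfy_root."""
    root = Path(comfy_root).expanduser()
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_model_paths("~/ComfyUI"):
        print("missing:", rel)
```

An empty result means all three downloads are in place, so the problem is more likely a wrong selection in a loader node.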

Next Steps: From Demo to Production

You now have a working self-hosted AI image generator. Three places to go next:

  • Wrap the API in a FastAPI service so your app talks to one endpoint instead of raw ComfyUI routes.
  • Add a queue (Redis, RabbitMQ) in front of /prompt so multiple users don't stomp on each other.
  • Swap in FLUX.2-klein-4B (Apache 2.0) or SD 3.5 Large (Stability Community, free under 1M USD revenue) for commercial deployments where the FLUX.2-dev non-commercial license is a problem.
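To illustrate the queueing idea before you reach for Redis or RabbitMQ, here is an in-process sketch that serializes submissions through a single worker thread. It is a stand-in, not a production design: the class name is hypothetical, jobs are lost on restart, and submit is whatever function you use to POST a workflow to /prompt.

```python
import queue
import threading


class PromptQueue:
    """Run submitted workflows one at a time, in arrival order."""

    def __init__(self, submit):
        self._submit = submit          # callable that sends one workflow
        self._jobs = queue.Queue()
        self.results = []              # return values, or exceptions raised
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def enqueue(self, workflow):
        """Add a workflow to the back of the queue."""
        self._jobs.put(workflow)

    def wait(self):
        """Block until every enqueued workflow has been processed."""
        self._jobs.join()

    def _run(self):
        while True:
            workflow = self._jobs.get()
            try:
                self.results.append(self._submit(workflow))
            except Exception as exc:   # keep the worker alive on failures
                self.results.append(exc)
            finally:
                self._jobs.task_done()
```

A single worker keeps the ordering first-in, first-out. Replacing queue.Queue with an external broker lets several API processes share one GPU box.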

Clone ComfyUI, pull FLUX.2-dev, and render your first prompt before your coffee goes cold.
