
Run a Full Local AI Stack on Your Own Hardware

Run Ollama, n8n, Supabase, Qdrant, Neo4j, Open WebUI, and Langfuse on your own box with one Docker Compose command. No OpenAI key required.

License Apache 2.0
TL;DR
  • Apache 2.0 fork of n8n's self-hosted AI starter kit. Ten services in one Docker Compose: Ollama, n8n, Open WebUI, Supabase, Qdrant, Neo4j, Flowise, SearXNG, Langfuse, Caddy.
  • Runs on NVIDIA, AMD ROCm, Apple Silicon (host Ollama), or plain CPU. Tested fine on a cheap CPU-only VPS with no GPU at all.
  • Ships prebuilt n8n RAG and agent workflows. Bring your own model and your own data; nothing leaves your box.
System Requirements
  • RAM: 8 GB minimum
  • GPU: NVIDIA, AMD ROCm, or CPU only
  • VRAM: 6 GB+ for 7B models
  • Supported: Ollama, Apple Silicon

You do not need an OpenAI key to run a real AI agent. local-ai-packaged is an Apache 2.0 fork of the official n8n self-hosted AI starter kit, expanded by Cole Medin into a ten-service Docker Compose stack that runs local models, vector search, knowledge graphs, automations, and full LLM observability on your own machine. One clone, one env file, one Python command, and you have the same building blocks the cloud vendors charge by the token for.

What local-ai-packaged Actually Is

The original n8n self-hosted AI starter kit shipped n8n, Ollama, Postgres, and Qdrant in a Compose file. Cole Medin's fork keeps that core and adds everything you actually need to run a private agent in production: Supabase for data and auth, Open WebUI for chat, Flowise for no-code flows, Neo4j for knowledge graphs, Langfuse for tracing, SearXNG for private search, and Caddy as the reverse proxy with automatic HTTPS. It also ships prebuilt n8n workflows in n8n/backup/workflows/ and an n8n_pipe.py function that lets Open WebUI hand a chat off to an n8n agent.

Everything is open source. Nothing phones home. The whole project is licensed Apache 2.0 and lives at github.com/coleam00/local-ai-packaged.

Here is Cole walking through the stack himself in the project announcement video:

Video: local-ai-packaged announcement, by Cole Medin, the project creator (YouTube)

What Is in the Box

Ten services, each one solving a real piece of the agent puzzle:

| Service | What it does | Default port |
| --- | --- | --- |
| Ollama | Local LLM runtime. Runs Llama 3.1, Qwen 2.5, DeepSeek, Mistral, anything on the Ollama hub. | 11434 |
| n8n | Workflow engine with 400+ nodes. The glue that turns models into agents and automations. | 5678 |
| Open WebUI | ChatGPT-style chat interface. Talks to Ollama directly or hands prompts off to n8n. | 3000 |
| Flowise | No-code visual builder for LangChain agents. The friendly path for non-developers. | 3001 |
| Supabase | Postgres plus auth, storage, and a REST API. Where your agent state and user data live. | 8000 |
| Qdrant | Vector database for RAG. Stores embeddings of your documents for semantic search. | 6333 |
| Neo4j | Graph database. Powers GraphRAG, where relationships matter as much as content. | 7474, 7687 |
| SearXNG | Privacy-first metasearch. Aggregates 200+ search engines without tracking you. | 8080 |
| Langfuse | LLM observability. Traces every prompt, response, and tool call so you can debug agents. | 3210 |
| Caddy | Reverse proxy with automatic HTTPS via Let's Encrypt. The public face of the stack. | 80, 443 |

The combination matters. A model alone is a parrot. A model plus a vector store gives you RAG (retrieval-augmented generation, where the model looks up real text before answering). Add a graph database and you have GraphRAG. Wrap it all in n8n and you have an agent that can call tools and chain steps. Trace it in Langfuse and you can ship to production with confidence. Most stacks give you one or two of these. This one gives you all five out of the box.
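The retrieval step is easier to picture in a few lines of code. This toy Python sketch stands in for the real pipeline: it uses simple word overlap in place of embeddings (in the actual stack, Ollama produces the vectors and Qdrant does the nearest-neighbor search), but the shape of the loop is the same:

```python
def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(question: str, passage: str) -> float:
    """Toy relevance score: fraction of question words found in the passage.
    In the real stack this is cosine similarity between Ollama embeddings."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def retrieve(question: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k most relevant chunks across all documents."""
    chunks = [c for d in docs for c in chunk(d)]
    return sorted(chunks, key=lambda c: score(question, c), reverse=True)[:top_k]

docs = ["Qdrant stores vector embeddings for semantic search.",
        "Neo4j is a graph database used for GraphRAG."]
context = retrieve("what stores embeddings", docs)
# The chunks in `context` get prepended to the prompt sent to the model,
# which is what grounds the answer in your own documents.
```

Swap `score` for an embedding call and `chunks` for a Qdrant collection and you have the real thing.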

What You Will Need

local-ai-packaged is happy on modest hardware. Here is the honest minimum:

  • A host machine running Linux, macOS, or Windows with WSL2. Linux is the smoothest path. Any cheap VPS with KVM or LXC virtualization works.
  • Docker and Docker Compose v2. The official get.docker.com script installs both on Debian and Ubuntu in one line.
  • Python 3.11 or newer. The start_services.py bootstrap script handles the Compose orchestration for you.
  • RAM: 8 GB is the floor for the supporting services alone. 16 GB lets you run a 7B or 8B model alongside everything else. 32 GB is comfortable for 13B and 14B models.
  • Disk: 25 GB free for the images and base data, plus whatever your model files weigh (a quantized Llama 3.1 8B is about 5 GB).
  • Optional GPU: NVIDIA on Linux or Windows is the easy path. AMD on Linux works through ROCm. Apple Silicon cannot expose the Metal GPU to Docker, so on a Mac you run Ollama natively and point the stack at it.

Install With One Command

The whole bootstrap is five steps. Allow about ten minutes the first time, mostly waiting for image pulls.

1. Install Docker and Python

On a fresh Debian or Ubuntu box:

curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
sudo apt install -y python3 python3-pip git
newgrp docker

2. Clone the repo

git clone -b stable https://github.com/coleam00/local-ai-packaged.git
cd local-ai-packaged

The stable branch is the tested release. The main branch is the development tip and breaks more often.

3. Generate your secrets

Copy the example env file and fill in the long list of secrets it expects:

cp .env.example .env
nano .env

You need to populate at least these keys before the first start:

  • N8N_ENCRYPTION_KEY, N8N_USER_MANAGEMENT_JWT_SECRET
  • POSTGRES_PASSWORD, JWT_SECRET, ANON_KEY, SERVICE_ROLE_KEY
  • DASHBOARD_USERNAME, DASHBOARD_PASSWORD
  • NEO4J_AUTH (format neo4j/your-password)
  • CLICKHOUSE_PASSWORD, MINIO_ROOT_PASSWORD
  • LANGFUSE_SALT, NEXTAUTH_SECRET, ENCRYPTION_KEY

Generate each random value with:

openssl rand -hex 32

One critical gotcha: your Postgres password cannot contain the @ character. The Supabase pooler treats it as a URL separator and the connection silently breaks.
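If you would rather not run openssl a dozen times, a short Python snippet can print every random key at once. Hex output also sidesteps the @ problem entirely, since hex strings only ever contain 0-9 and a-f:

```python
import secrets

# Random-value keys from the .env list above. DASHBOARD_USERNAME/PASSWORD are
# human-chosen, and NEO4J_AUTH needs the neo4j/<password> format, so they are
# excluded here.
KEYS = ["N8N_ENCRYPTION_KEY", "N8N_USER_MANAGEMENT_JWT_SECRET",
        "POSTGRES_PASSWORD", "JWT_SECRET", "CLICKHOUSE_PASSWORD",
        "MINIO_ROOT_PASSWORD", "LANGFUSE_SALT", "NEXTAUTH_SECRET",
        "ENCRYPTION_KEY"]

for key in KEYS:
    value = secrets.token_hex(32)   # 64 hex chars, never contains '@'
    print(f"{key}={value}")
```

Paste the output into .env. Note that ANON_KEY and SERVICE_ROLE_KEY are structured Supabase JWTs signed with JWT_SECRET, not random strings, so generate those two with Supabase's documented key generator instead.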

4. Pick your profile and start

The start_services.py bootstrap takes a --profile flag that decides how Ollama runs:

# NVIDIA GPU on Linux or Windows
python3 start_services.py --profile gpu-nvidia

# AMD GPU on Linux
python3 start_services.py --profile gpu-amd

# CPU only (works anywhere, just slower)
python3 start_services.py --profile cpu

# Bring your own Ollama (e.g. native on macOS)
python3 start_services.py --profile none

The first run pulls all the images and creates the volumes. Subsequent starts take seconds.

5. Open the services

Once the script reports everything healthy, browse to:

  • n8n: http://localhost:5678
  • Open WebUI: http://localhost:3000
  • Supabase Studio: http://localhost:8000
  • Flowise: http://localhost:3001
  • Neo4j Browser: http://localhost:7474
  • Langfuse: http://localhost:3210
  • SearXNG: http://localhost:8080

Create an n8n owner account on your first visit to port 5678. The same goes for Open WebUI: the first user to sign up becomes the admin.
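A quick way to confirm everything is actually listening is a small port probe. This is a convenience sketch, not part of the repo; it just attempts a TCP connection to each default port:

```python
import socket

def is_up(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

SERVICES = {"n8n": 5678, "Open WebUI": 3000, "Supabase": 8000,
            "Flowise": 3001, "Neo4j": 7474, "Langfuse": 3210,
            "SearXNG": 8080}

for name, port in SERVICES.items():
    print(f"{name:12} {'up' if is_up('localhost', port) else 'DOWN'}")
```

Anything reported DOWN is worth a `docker compose -p localai logs <service>` before you go further.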

Wire It Together: Your First Local RAG Agent

The repo ships prebuilt workflows in n8n/backup/workflows/. Import the RAG workflow into n8n and you get a functioning agent that does this on every chat:

  1. Open WebUI sends the user message to n8n through the n8n_pipe.py function.
  2. n8n embeds the question with a local Ollama embedding model.
  3. It queries Qdrant for the top matching document chunks. This is the "retrieval" step in retrieval-augmented generation.
  4. It feeds the chunks plus the question to Llama 3.1 running in Ollama, and gets a grounded answer back.
  5. The full trace, prompt, response, latency, and token counts get logged to Langfuse for replay.
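Conceptually, the hand-off in step 1 is just an HTTP POST to an n8n webhook. The sketch below shows the shape of it; the webhook path and payload field names here are illustrative assumptions, not the exact ones n8n_pipe.py uses:

```python
import json
import urllib.request

def build_payload(message: str, session_id: str) -> dict:
    """Shape of the chat hand-off. Field names are illustrative,
    not copied from n8n_pipe.py."""
    return {"chatInput": message, "sessionId": session_id}

def ask_agent(webhook_url: str, message: str, session_id: str) -> str:
    """POST the user message to an n8n webhook and return the reply text."""
    data = json.dumps(build_payload(message, session_id)).encode()
    req = urllib.request.Request(webhook_url, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("output", "")

# Usage, assuming a workflow with an active webhook node (URL hypothetical):
# print(ask_agent("http://localhost:5678/webhook/invoke",
#                 "What is in my notes?", "session-1"))
```

The session ID is what lets n8n keep per-conversation memory across turns.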

To swap the model, pull a new one in the Ollama container:

docker exec -it ollama ollama pull qwen2.5:14b

Then change the model name in the n8n workflow node. To switch from vector RAG to GraphRAG, repoint the retrieval step at Neo4j and replace the chunk query with a Cypher query that walks the entity graph. Same workflow, different store, often dramatically better answers on document sets where relationships matter (legal contracts, org charts, codebases).
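What "a Cypher query that walks the entity graph" looks like depends entirely on your ingestion schema. As a hypothetical example, assuming Entity nodes connected to Chunk nodes, this helper builds such a query:

```python
def graphrag_query(hops: int = 2) -> str:
    """Build a Cypher query that pulls document chunks within `hops`
    relationship hops of a named entity. The schema (Entity, Chunk)
    is hypothetical -- adapt it to whatever your workflow creates."""
    return (f"MATCH (e:Entity {{name: $name}})-[*1..{hops}]-(c:Chunk) "
            f"RETURN DISTINCT c.text AS text LIMIT 10")

# The n8n workflow (or the neo4j Python driver) would run this with a
# parameter like {"name": "Acme Corp"} and feed the returned texts to
# the model in place of the Qdrant chunks.
query = graphrag_query()
```

The point of the hop count: a vector store can only return chunks that look like the question, while the graph walk also returns chunks connected to it through shared entities.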

Hardware Acceleration: Pick Your Profile

The profile decides where Ollama does its work. Pick the one that matches your box:

| Profile | Best for | What gets accelerated | Caveat |
| --- | --- | --- | --- |
| gpu-nvidia | Any box with an NVIDIA card | Ollama uses CUDA for inference | Needs the nvidia-container-toolkit installed on the host |
| gpu-amd | Linux box with a recent Radeon | Ollama uses ROCm | Linux only; ROCm support is uneven across cards |
| cpu | Any machine, including a cheap VPS | Nothing, plain CPU inference | Slower, but works everywhere. 7B and 8B models are usable on a modern CPU. |
| none | macOS or any host where you already run Ollama | Whatever your local Ollama uses (Metal on Apple Silicon) | You point the stack at host.docker.internal:11434 instead of the bundled container |

The CPU profile is the surprise hero. We tested this on a Hetzner dedicated root server inside a CPU-only LXC container, no GPU at all, and Llama 3.1 8B answered RAG queries in five to fifteen seconds depending on context length. That is slow compared to a GPU, but it is fast enough for a personal knowledge base, a small team, or a side project. The bigger your CPU, the faster it goes. Nothing else in the stack changes.

Tips and Gotchas

Apple Silicon needs native Ollama. Docker on macOS cannot see the Metal GPU. Install Ollama from the official installer, run it on the host, then start the stack with --profile none and point n8n at http://host.docker.internal:11434.

n8n v2 disables risky nodes by default. The Local File Trigger and Execute Command nodes are in the deny list. Uncomment the NODES_EXCLUDE=[] line in the compose file if you actually want them, and understand the security trade-off before you do.

Supabase storage needs stub S3 settings. As of February 2026, the storage container fails to start without GLOBAL_S3_BUCKET=stub, REGION=stub, STORAGE_TENANT_ID=stub, plus a fake S3 access key and secret in your .env. The "stub" values are intentional. The storage service just refuses to boot without them set.

SearXNG needs a chmod. Before the first start, run chmod 755 searxng from the repo root or the container will refuse to write its config.

Public deployment goes through Caddy only. When you expose this stack to the internet, run python3 start_services.py --environment public, point your DNS A records at the box, and let Caddy handle ports 80 and 443. Do not publish any other port. UFW does not block Docker-published ports, so any exposed container port is reachable from outside even with the firewall on.

Container upgrades are a three-step dance.

docker compose -p localai down
docker compose -p localai pull
python3 start_services.py --profile gpu-nvidia

Postgres analytics can corrupt itself. If Langfuse or Supabase analytics refuses to start after a crash, delete supabase/docker/volumes/db/data and let Postgres reinitialize. You lose Supabase data, so back it up first.

What You Can Build With This

The stack is a building block. Four projects you can wire up in a weekend:

A private RAG over your own documents. Drop PDFs, Markdown notes, or Notion exports into a watched folder. n8n picks them up, splits them into chunks, embeds them with a local Ollama embedding model, and stores the vectors in Qdrant. Open WebUI then lets you chat with the entire archive. Nothing leaves your box. This replaces a paid ChatGPT Enterprise subscription for a single developer or a small team.

A meeting-notes agent. Run Whisper locally to transcribe a meeting recording, hand the transcript to Llama 3.1 for a summary, extract action items into a Neo4j graph that links people to tasks, and email everyone their personal action list. The whole pipeline is one n8n workflow. Langfuse traces every step so you can debug the prompts when the summary misses something.

A privacy-first research assistant. SearXNG fetches results from 200+ search engines without tracking. n8n hands the top hits to a local model that reads, condenses, and cites them. You get a research bot that never leaks your queries to Google or Bing, and Langfuse keeps a replayable history of every search session.

A no-code agent in Flowise. Non-developers on the team can drag and drop their own LangChain agents in Flowise, and they all share the same Ollama backend. One model, one box, no per-token bill, every member of the team running their own private GPT.

Wrapping Up

You now own a complete local AI stack. Clone the repo, fill the env file, pick a profile, run one command, and you have models, vector search, graphs, agents, and observability running on your own hardware. Start with one prebuilt workflow and a small Llama 3.1 model. Once one agent works end to end, the rest is copy and paste. The OpenAI bill can wait.

Tested On

We ran this stack on a small CPU-only LXC container on a Hetzner dedicated root server, with no GPU at all. Everything came up clean on the first try once the env file was filled, and Llama 3.1 8B answered RAG queries in about ten seconds end to end. Bigger boxes will be faster. Nothing else changes.

Tested on: Hetzner dedicated root server, CPU-only LXC container, no GPU
Profile:   --profile cpu
Model:     Llama 3.1 8B via Ollama
Date:      2026-04-11
