TL;DR
- Apache 2.0 fork of n8n's self-hosted AI starter kit. Ten services in one Docker Compose: Ollama, n8n, Open WebUI, Supabase, Qdrant, Neo4j, Flowise, SearXNG, Langfuse, Caddy.
- Runs on NVIDIA, AMD ROCm, Apple Silicon (host Ollama), or plain CPU. Tested fine on a cheap CPU-only VPS with no GPU at all.
- Ships prebuilt n8n RAG and agent workflows. Bring your own model and your own data; nothing leaves your box.
System Requirements
| Requirement | Minimum |
| --- | --- |
| RAM | 8 GB |
| GPU | NVIDIA, AMD ROCm, or none (CPU only) |
| VRAM | 6 GB+ for a 7B model |
You do not need an OpenAI key to run a real AI agent. local-ai-packaged is an Apache 2.0 fork of the official n8n self-hosted AI starter kit, expanded by Cole Medin into a ten-service Docker Compose stack that runs local models, vector search, knowledge graphs, automations, and full LLM observability on your own machine. One clone, one env file, one Python command, and you have the same building blocks the cloud vendors charge by the token for.
What local-ai-packaged Actually Is
The original n8n self-hosted AI starter kit shipped n8n, Ollama, Postgres, and Qdrant in a Compose file. Cole Medin's fork keeps that core and adds everything you actually need to run a private agent in production: Supabase for data and auth, Open WebUI for chat, Flowise for no-code flows, Neo4j for knowledge graphs, Langfuse for tracing, SearXNG for private search, and Caddy as the reverse proxy with automatic HTTPS. It also ships prebuilt n8n workflows in n8n/backup/workflows/ and an n8n_pipe.py function that lets Open WebUI hand a chat off to an n8n agent.
Everything is open source. Nothing phones home. The whole project is licensed Apache 2.0 and lives at github.com/coleam00/local-ai-packaged.
Here is Cole walking through the stack himself in the project announcement video:
Watch: local-ai-packaged announcement, by Cole Medin, the project creator (YouTube)
What Is in the Box
Ten services, each one solving a real piece of the agent puzzle:
| Service | What it does | Default port |
| --- | --- | --- |
| Ollama | Local LLM runtime. Runs Llama 3.1, Qwen 2.5, DeepSeek, Mistral, anything on the Ollama hub. | 11434 |
| n8n | Workflow engine with 400+ nodes. The glue that turns models into agents and automations. | 5678 |
| Open WebUI | ChatGPT-style chat interface. Talks to Ollama directly or hands prompts off to n8n. | 3000 |
| Flowise | No-code visual builder for LangChain agents. The friendly path for non-developers. | 3001 |
| Supabase | Postgres plus auth, storage, and a REST API. Where your agent state and user data live. | 8000 |
| Qdrant | Vector database for RAG. Stores embeddings of your documents for semantic search. | 6333 |
| Neo4j | Graph database. Powers GraphRAG, where relationships matter as much as content. | 7474, 7687 |
| SearXNG | Privacy-first metasearch. Aggregates 200+ search engines without tracking you. | 8080 |
| Langfuse | LLM observability. Traces every prompt, response, and tool call so you can debug agents. | 3210 |
| Caddy | Reverse proxy with automatic HTTPS via Let's Encrypt. The public face of the stack. | 80, 443 |
The combination matters. A model alone is a parrot. A model plus a vector store gives you RAG (retrieval-augmented generation, where the model looks up real text before answering). Add a graph database and you have GraphRAG. Wrap it all in n8n and you have an agent that can call tools and chain steps. Trace it in Langfuse and you can ship to production with confidence. Most stacks give you one or two of these. This one gives you all five out of the box.
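The retrieval step is easiest to see in miniature. Here is a hedged sketch in plain Python: toy three-dimensional vectors stand in for real Ollama embeddings, and a list stands in for a Qdrant collection, but the ranking logic is the same cosine-similarity search the stack performs.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: in the real stack these vectors come from an Ollama
# embedding model and live in a Qdrant collection.
docs = [
    ("Caddy terminates TLS on ports 80 and 443", [0.9, 0.1, 0.0]),
    ("Qdrant stores document embeddings for RAG", [0.1, 0.9, 0.2]),
    ("Neo4j powers GraphRAG over entity graphs", [0.0, 0.2, 0.9]),
]

def retrieve(query_vec, k=1):
    # Rank every document by similarity to the query, keep the top k.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query vector "near" the second document should surface the Qdrant line.
print(retrieve([0.2, 0.8, 0.1]))
```

The retrieved chunks then get pasted into the prompt before the model answers, which is the entire trick behind RAG.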
What You Will Need
local-ai-packaged is happy on modest hardware. Here is the honest minimum:
- A host machine running Linux, macOS, or Windows with WSL2. Linux is the smoothest path. Any cheap VPS with KVM or LXC virtualization works.
- Docker and Docker Compose v2. The official get.docker.com script installs both on Debian and Ubuntu in one line.
- Python 3.11 or newer. The start_services.py bootstrap script handles the Compose orchestration for you.
- RAM: 8 GB is the floor for the supporting services alone. 16 GB lets you run a 7B or 8B model alongside everything else. 32 GB is comfortable for 13B and 14B models.
- Disk: 25 GB free for the images and base data, plus whatever your model files weigh (a quantized Llama 3.1 8B is about 5 GB).
- Optional GPU: NVIDIA on Linux or Windows is the easy path. AMD on Linux works through ROCm. Apple Silicon cannot expose the Metal GPU to Docker, so on a Mac you run Ollama natively and point the stack at it.
Install With One Command
The whole bootstrap is five steps. Allow about ten minutes the first time, mostly waiting for image pulls.
1. Install Docker and Python
On a fresh Debian or Ubuntu box:
```bash
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
sudo apt install -y python3 python3-pip git
newgrp docker
```
2. Clone the repo
```bash
git clone -b stable https://github.com/coleam00/local-ai-packaged.git
cd local-ai-packaged
```
The stable branch is the tested release. The main branch is the development tip and breaks more often.
3. Generate your secrets
Copy the example env file and fill in the long list of secrets it expects:
```bash
cp .env.example .env
nano .env
```
You need to populate at least these keys before the first start:
- N8N_ENCRYPTION_KEY, N8N_USER_MANAGEMENT_JWT_SECRET
- POSTGRES_PASSWORD, JWT_SECRET, ANON_KEY, SERVICE_ROLE_KEY
- DASHBOARD_USERNAME, DASHBOARD_PASSWORD
- NEO4J_AUTH (format: neo4j/your-password)
- CLICKHOUSE_PASSWORD, MINIO_ROOT_PASSWORD
- LANGFUSE_SALT, NEXTAUTH_SECRET, ENCRYPTION_KEY
Generate each random value with:
```bash
openssl rand -hex 32
```
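If you would rather script the whole list at once, Python's stdlib can do the same job as openssl. A hedged sketch: the key names below match the list above, minus ANON_KEY and SERVICE_ROLE_KEY, which are Supabase JWTs signed with JWT_SECRET rather than plain random hex, and minus the username fields you pick yourself.

```python
import secrets

# Keys that take a plain random hex value. NEO4J_AUTH takes the
# neo4j/your-password form instead, so it is set by hand.
KEYS = [
    "N8N_ENCRYPTION_KEY", "N8N_USER_MANAGEMENT_JWT_SECRET",
    "POSTGRES_PASSWORD", "JWT_SECRET",
    "CLICKHOUSE_PASSWORD", "MINIO_ROOT_PASSWORD",
    "LANGFUSE_SALT", "NEXTAUTH_SECRET", "ENCRYPTION_KEY",
]

def make_env_lines():
    # token_hex(32) emits 64 hex characters, the same shape as
    # `openssl rand -hex 32`, and can never contain the forbidden @.
    return [f"{key}={secrets.token_hex(32)}" for key in KEYS]

print("\n".join(make_env_lines()))
```

Paste the output into .env, then fill in the remaining non-random keys by hand.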
One critical gotcha: your Postgres password cannot contain the @ character. The Supabase pooler treats it as a URL separator and the connection silently breaks.
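The failure mode is easy to see once you remember the connection string format: postgres://user:password@host:port/db. An unescaped @ in the password gives the URL two @ signs, and a parser that splits userinfo at the first one reads the wrong host. Percent-encoding (@ becomes %40) is the standard escape, as this small stdlib demo shows, but since the pooler still mishandles it in practice, the safe move is simply a password with no @ at all.

```python
from urllib.parse import quote

password = "s3cr3t@pass"  # contains the forbidden character

# Naive interpolation produces a URL with two @ signs. A parser that
# splits userinfo at the first @ now sees the host as "pass@db".
url = f"postgres://postgres:{password}@db:5432/postgres"
assert url.count("@") == 2

# Percent-encoding is the general-purpose escape: @ becomes %40.
safe = quote(password, safe="")
print(safe)  # s3cr3t%40pass
```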
4. Pick your profile and start
The start_services.py bootstrap takes a --profile flag that decides how Ollama runs:
```bash
# NVIDIA GPU on Linux or Windows
python3 start_services.py --profile gpu-nvidia

# AMD GPU on Linux
python3 start_services.py --profile gpu-amd

# CPU only (works anywhere, just slower)
python3 start_services.py --profile cpu

# Bring your own Ollama (e.g. native on macOS)
python3 start_services.py --profile none
```
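The decision logic behind the four flags is simple enough to write down. This is a hypothetical helper, not part of the repo, that maps detected hardware to the --profile value start_services.py expects:

```python
# Hypothetical helper: picks the --profile flag for start_services.py
# from the hardware you are on. Not shipped with the repo.
def pick_profile(has_nvidia: bool, has_rocm: bool, is_macos: bool) -> str:
    if is_macos:
        return "none"        # run Ollama natively to get Metal acceleration
    if has_nvidia:
        return "gpu-nvidia"  # requires nvidia-container-toolkit on the host
    if has_rocm:
        return "gpu-amd"     # Linux only
    return "cpu"             # works everywhere, just slower

print(pick_profile(has_nvidia=False, has_rocm=False, is_macos=True))  # none
```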
The first run pulls all the images and creates the volumes. Subsequent starts take seconds.
5. Open the services
Once the script reports everything healthy, browse to:
- n8n: http://localhost:5678
- Open WebUI: http://localhost:3000
- Supabase Studio: http://localhost:8000
- Flowise: http://localhost:3001
- Neo4j Browser: http://localhost:7474
- Langfuse: http://localhost:3210
- SearXNG: http://localhost:8080
Create the n8n owner account on your first visit to port 5678. The same applies to Open WebUI: the first user to sign up becomes the admin.
Wire It Together: Your First Local RAG Agent
The repo ships prebuilt workflows in n8n/backup/workflows/. Import the RAG workflow into n8n and you get a functioning agent that does this on every chat:
- Open WebUI sends the user message to n8n through the n8n_pipe.py function.
- n8n embeds the question with a local Ollama embedding model.
- It queries Qdrant for the top matching document chunks. This is the "retrieval" step in retrieval-augmented generation.
- It feeds the chunks plus the question to Llama 3.1 running in Ollama, and gets a grounded answer back.
- The full trace, prompt, response, latency, and token counts get logged to Langfuse for replay.
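The handoff in the first step is, underneath, just an HTTP POST to an n8n webhook. Here is a hedged sketch of what a pipe-style function sends; the webhook path and field names are illustrative, not the actual n8n_pipe.py contract, so check the imported workflow's Webhook node for the real payload shape.

```python
import json
import urllib.request

def build_payload(message: str, session_id: str) -> dict:
    # Field names are illustrative; the workflow's Webhook node
    # defines what it actually expects.
    return {"chatInput": message, "sessionId": session_id}

def call_n8n_agent(message: str, session_id: str,
                   webhook_url: str = "http://localhost:5678/webhook/invoke") -> dict:
    # Hypothetical webhook URL; requires the stack to be running.
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(build_payload(message, session_id)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

print(build_payload("What does clause 7 say?", "session-1"))
```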
To swap the model, pull a new one in the Ollama container:
```bash
docker exec -it ollama ollama pull qwen2.5:14b
```
Then change the model name in the n8n workflow node. To switch from vector RAG to GraphRAG, repoint the retrieval step at Neo4j and replace the chunk query with a Cypher query that walks the entity graph. Same workflow, different store, often dramatically better answers on document sets where relationships matter (legal contracts, org charts, codebases).
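To make the GraphRAG variant concrete, here is an illustrative Cypher query a graph retrieval step might run: start from entities mentioned in the question and walk relationships outward to collect connected facts. The labels and properties are examples for the sketch, not a schema the repo ships.

```python
# Illustrative GraphRAG retrieval: the node labels (Entity, Chunk) and
# properties are assumptions, not the repo's actual schema.
CYPHER = """
MATCH (e:Entity)-[*1..2]-(fact:Chunk)
WHERE e.name IN $mentioned_entities
RETURN DISTINCT fact.text AS text
LIMIT $k
"""

def build_query(entities: list[str], k: int = 5) -> tuple[str, dict]:
    # Parameterized query plus its parameter map, ready for a Neo4j
    # driver's session.run(query, **params) call.
    return CYPHER, {"mentioned_entities": entities, "k": k}

query, params = build_query(["Acme Corp", "indemnity"])
print(params)
```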
Hardware Acceleration: Pick Your Profile
The profile decides where Ollama does its work. Pick the one that matches your box:
| Profile | Best for | What gets accelerated | Caveat |
| --- | --- | --- | --- |
| gpu-nvidia | Any box with an NVIDIA card | Ollama uses CUDA for inference | Needs the nvidia-container-toolkit installed on the host |
| gpu-amd | Linux box with a recent Radeon | Ollama uses ROCm | Linux only; ROCm support is uneven across cards |
| cpu | Any machine, including a cheap VPS | Nothing, plain CPU inference | Slower, but works everywhere. 7B and 8B models are usable on a modern CPU. |
| none | macOS or any host where you already run Ollama | Whatever your local Ollama uses (Metal on Apple Silicon) | You point the stack at host.docker.internal:11434 instead of the bundled container |
The CPU profile is the surprise hero. We tested this on a Hetzner dedicated root server inside a CPU-only LXC container, no GPU at all, and Llama 3.1 8B answered RAG queries in five to fifteen seconds depending on context length. That is slow compared to a GPU, but it is fast enough for a personal knowledge base, a small team, or a side project. The bigger your CPU, the faster it goes. Nothing else in the stack changes.
Tips and Gotchas
Apple Silicon needs native Ollama. Docker on macOS cannot see the Metal GPU. Install Ollama from the official installer, run it on the host, then start the stack with --profile none and point n8n at http://host.docker.internal:11434.
n8n v2 disables risky nodes by default. The Local File Trigger and Execute Command nodes are in the deny list. Uncomment the NODES_EXCLUDE=[] line in the compose file if you actually want them, and understand the security trade-off before you do.
Supabase storage needs stub S3 settings. As of February 2026, the storage container fails to start without GLOBAL_S3_BUCKET=stub, REGION=stub, STORAGE_TENANT_ID=stub, plus a fake S3 access key and secret in your .env. The "stub" values are intentional. The storage service just refuses to boot without them set.
SearXNG needs a chmod. Before the first start, run chmod 755 searxng from the repo root or the container will refuse to write its config.
Public deployment goes through Caddy only. When you expose this stack to the internet, run python3 start_services.py --environment public, point your DNS A records at the box, and let Caddy handle ports 80 and 443. Do not publish any other port. UFW does not block Docker-published ports, so any exposed container port is reachable from outside even with the firewall on.
Container upgrades are a three-step dance (swap in your own profile on the last line):
```bash
docker compose -p localai down
docker compose -p localai pull
python3 start_services.py --profile gpu-nvidia
```
Postgres analytics can corrupt itself. If Langfuse or Supabase analytics refuses to start after a crash, delete supabase/docker/volumes/db/data and let Postgres reinitialize. You lose Supabase data, so back it up first.
What You Can Build With This
The stack is a building block. Four projects you can wire up in a weekend:
A private RAG over your own documents. Drop PDFs, Markdown notes, or Notion exports into a watched folder. n8n picks them up, splits them into chunks, embeds them with a local Ollama embedding model, and stores the vectors in Qdrant. Open WebUI then lets you chat with the entire archive. Nothing leaves your box. This replaces a paid ChatGPT Enterprise subscription for a single developer or a small team.
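The "splits them into chunks" step is the part you will most likely end up tuning. A minimal sketch of fixed-size chunking with overlap, which is one common baseline; real pipelines often split on headings or sentence boundaries instead:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Overlapping windows, so a fact that straddles a boundary still
    # lands intact inside at least one chunk.
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

doc = "x" * 1200
print([len(c) for c in chunk_text(doc)])  # [500, 500, 300]
```

Each chunk then gets its own embedding and its own Qdrant point, so retrieval can return just the relevant slice of a long document.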
A meeting-notes agent. Run Whisper locally to transcribe a meeting recording, hand the transcript to Llama 3.1 for a summary, extract action items into a Neo4j graph that links people to tasks, and email everyone their personal action list. The whole pipeline is one n8n workflow. Langfuse traces every step so you can debug the prompts when the summary misses something.
A privacy-first research assistant. SearXNG fetches results from 200+ search engines without tracking. n8n hands the top hits to a local model that reads, condenses, and cites them. You get a research bot that never leaks your queries to Google or Bing, and Langfuse keeps a replayable history of every search session.
A no-code agent in Flowise. Non-developers on the team can drag and drop their own LangChain agents in Flowise, and they all share the same Ollama backend. One model, one box, no per-token bill, every member of the team running their own private GPT.
You now own a complete local AI stack. Clone the repo, fill the env file, pick a profile, run one command, and you have models, vector search, graphs, agents, and observability running on your own hardware. Start with one prebuilt workflow and a small Llama 3.1 model. Once one agent works end to end, the rest is copy and paste. The OpenAI bill can wait.
Tested On
We ran this stack on a small CPU-only LXC container on a Hetzner dedicated root server, with no GPU at all. Everything came up clean on the first try once the env file was filled, and Llama 3.1 8B answered RAG queries in about ten seconds end to end. Bigger boxes will be faster. Nothing else changes.
Tested on: Hetzner dedicated root server, CPU-only LXC container, no GPU
Profile: --profile cpu
Model: Llama 3.1 8B via Ollama
Date: 2026-04-11