TL;DR
- Microcontrollers (ESP32-class) run kB-MB TinyML classifiers, not transformer language models
- The smallest real SLMs (270M-3B) need a phone, Raspberry Pi, or Jetson, not a sensor node
- NPUs like the Hailo-8L accelerate vision, not autoregressive LLM decode
The pitch is everywhere: small language models are now small enough to run on a sensor node at the edge. FunctionGemma 270M, Gemma 3 1B, and Ministral-3 3B all ship with "edge" and "on-device" front and center. So we went looking for the honest version of that claim on real constrained hardware, an ESP32-C5 microcontroller and a Raspberry Pi 5 with a Hailo-8L accelerator. The short answer: most of the "AI on a microcontroller" marketing quietly means a phone or a Raspberry Pi, and a true sensor node runs a different kind of AI entirely.
Two devices, one word: "edge"
The confusion starts with one overloaded word. A microcontroller and a single-board computer both get called "edge," but they are more than four orders of magnitude apart in working memory: 384 KB of SRAM against 8 GB of RAM. That gap decides everything.
| Class | Example | RAM | Accelerator | What it actually runs |
| Microcontroller | ESP32-C5 | 384 KB SRAM (+ up to 32 MB slow PSRAM) | None (no FPU, no NPU) | kB-MB TinyML classifiers |
| Microcontroller + NPU | Grove Vision AI V2 (WiseEye2 HX6538) | On-chip SRAM, ~2 MB models | Arm Ethos-U55 NPU | On-device vision: detection, faces, gestures |
| Single-board computer | Raspberry Pi 5 (8 GB) | 8 GB | Optional Hailo-8L (vision) | 270M-3B LLMs on CPU; vision on NPU |
| Edge AI box | NVIDIA Jetson Orin | 8-64 GB | GPU + DLA | 3B-13B LLMs, real-time vision |
When a vendor says "edge," check which row they mean. As you will see, all three "tiny" models live in row three or four. None of them fit in row one or two.
The tiny models, and where they really fit
Here are the three models the marketing points at sensor nodes, with the number that matters: how much memory you need to load them at 4-bit, and the smallest device anyone has actually demonstrated them on.
| Model | Params | Q4 size | Min RAM to run | Smallest real device | License |
| FunctionGemma 270M | 270M | ~253 MB | ~550 MB | Phone / Raspberry Pi 5 | Gemma terms |
| Gemma 3 1B | 1B | ~529 MB | ~1.5 GB | Raspberry Pi 4/5 | Gemma terms |
| Ministral-3 3B | 3.4B (+0.4B vision) | ~2.15 GB | ~3-4 GB | Jetson / RTX / Pi 5 (slow) | Apache 2.0 |
All sizes are community GGUF figures; the RAM floors are vendor or community-reported. Three things jump out.
First, read the licenses. Ministral-3 is genuine Apache 2.0, so you can ship it commercially without asking anyone. Both Gemma models use Google's Gemma terms, which permit commercial use but carry a prohibited-use policy and flow-down obligations. "Open weights" is not the same as "open source" here.
Second, watch the RAM sleight of hand. Mistral notes the 3B needs "at least 2 GB of RAM," which is the weights alone at 4-bit. Add the OS, the application, and a KV cache for a long context and that 2 GB board has nothing left. The practical floor is 3-4 GB, which is a Raspberry Pi, not a sensor.
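To make that floor concrete, here is a rough budget sketch. The KV-cache formula is the standard one (two tensors per layer, keys and values), but the layer count, head count, head size, context length, and the 1 GB allowance for OS plus application are hypothetical values picked only to illustrate the arithmetic; they are not Ministral-3's published architecture.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Size of the KV cache: keys and values for every layer and every cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def ram_budget_gb(weights_gb, kv_bytes, os_and_app_gb):
    """Total RAM needed: quantized weights + KV cache + everything else on the board."""
    return weights_gb + kv_bytes / 1e9 + os_and_app_gb


# Hypothetical architecture numbers (assumptions, not vendor figures).
kv = kv_cache_bytes(n_layers=26, n_kv_heads=8, head_dim=128, context_len=4096)

# 2.15 GB is the Q4 size from the table above; 1.0 GB for OS + app is an assumption.
total = ram_budget_gb(weights_gb=2.15, kv_bytes=kv, os_and_app_gb=1.0)

print(f"KV cache: {kv / 1e9:.2f} GB, total budget: {total:.2f} GB")
```

With these assumed numbers the total lands between 3 and 4 GB, and a longer context pushes it higher, since the cache grows linearly with context length.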
Third, small does not mean smart out of the box. FunctionGemma 270M is a function-calling specialist, but its base accuracy on the live, multi-turn slice of the Berkeley Function-Calling Leaderboard sits around 36 percent (vendor-reported). It only becomes useful after task-specific fine-tuning, where community runs push narrow tasks above 90 percent. The 270M is a foundation to fine-tune, not a drop-in agent.
What an ESP32 actually runs
A microcontroller is not a small computer. The ESP32-C5 has 384 KB of on-chip SRAM, no floating-point unit, and no neural accelerator (per Espressif's datasheet). You can bolt on up to 32 MB of external PSRAM, but it runs over an 80 MHz SPI bus, far too slow for the hot path of transformer inference. What thrives here is TinyML: tiny quantized convolutional nets and classic models via TensorFlow Lite Micro or ESP-DL.
| Task | Model | Size | Peak RAM | Latency | Accuracy |
| Acoustic anomaly detection | 686-param autoencoder | 2.7 KB | ~80 KB | 4 ms | 99.3% |
| Keyword spotting | DS-CNN INT8 | 52-100 KB | 15-32 KB | 9-20 ms (inference) | ~88% |
| RF spectrum classification | CNN on spectrogram | ~250 KB | ~12 KB | 159 ms | >90% (high SNR) |
All community-reported. These are real, useful, and they run in kilobytes. Now compare that to a "small" language model. The smallest credible one, FunctionGemma 270M at Q4, is a 253 MB file. That is roughly eight times the maximum 32 MB of PSRAM you can attach to an ESP32-C5, before you account for the runtime or the KV cache. The math simply does not close.
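The same comparison as a quick sanity check, using only the sizes quoted in the tables above. This is a deliberate simplification: it compares raw model size against each memory tier and ignores flash storage, the runtime, activations, and any KV cache, all of which only make things tighter. The `placement` helper is a name invented for this sketch.

```python
KB, MB = 1024, 1024 ** 2

ESP32_C5_SRAM = 384 * KB        # on-chip SRAM
ESP32_C5_MAX_PSRAM = 32 * MB    # maximum external PSRAM

# Model sizes as quoted in the tables above.
MODELS = {
    "686-param autoencoder": 2.7 * KB,
    "DS-CNN INT8 (upper bound)": 100 * KB,
    "RF spectrogram CNN": 250 * KB,
    "FunctionGemma 270M Q4": 253 * MB,
}


def placement(size_bytes):
    """Return the smallest memory tier that could hold a model of this size."""
    if size_bytes <= ESP32_C5_SRAM:
        return "fits in SRAM"
    if size_bytes <= ESP32_C5_MAX_PSRAM:
        return "PSRAM only"
    return "does not fit"


for name, size in MODELS.items():
    print(f"{name:28s} {size / KB:>12.1f} KB  {placement(size)}")

# How many times over the PSRAM ceiling the smallest language model is.
overshoot = MODELS["FunctionGemma 270M Q4"] / ESP32_C5_MAX_PSRAM
```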
The closest anyone has come is a 260K-parameter llama2.c toy transformer running on an ESP32-S3 with 8 MB of PSRAM at about 19 tokens per second (community-reported). The author's own verdict: "probably not very useful." It generates story fragments. It cannot follow an instruction or classify a signal. So the honest line is: microcontrollers do TinyML classification, not language models.
Raspberry Pi 5, with and without Hailo
Step up to a Raspberry Pi 5 and the small models finally run, on the CPU, via llama.cpp.
| Model size | Quant | Generation speed | Notes |
| Sub-360M | Q4 | >20 tok/s | Usable for narrow tasks |
| 1-1.5B | Q4 | 5-15 tok/s | Conversational, sluggish |
| ~3B | Q4 | 2-5 tok/s | Patience required |
Community-reported, roughly 10 W under load. Workable for a local assistant or a structured-output task, not for anything latency-sensitive.
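To see what those rates mean for a user, convert them into wait time for a reply. The sketch below only divides tokens by tokens per second, using the ranges from the table; it ignores prompt processing, so real latency is somewhat worse. The 100-token reply length is an arbitrary illustration.

```python
def generation_seconds(n_tokens, slow_tok_s, fast_tok_s=None):
    """Return (best_case, worst_case) seconds to generate n_tokens.

    best_case is None when the table gives only a lower bound on speed.
    """
    worst = n_tokens / slow_tok_s
    best = n_tokens / fast_tok_s if fast_tok_s else None
    return best, worst


# Generation-speed ranges from the table above (tokens per second).
PI5_RATES = {
    "sub-360M": (20, None),   # ">20 tok/s": only the slow end is known
    "1-1.5B": (5, 15),
    "~3B": (2, 5),
}

for model, (slow, fast) in PI5_RATES.items():
    best, worst = generation_seconds(100, slow, fast)
    best_text = f"{best:.0f} s" if best is not None else "under worst case"
    print(f"{model:10s} 100-token reply: best {best_text}, worst {worst:.0f} s")
```

A 100-token answer from a ~3B model takes 20 to 50 seconds at these rates, which is why the table says "patience required."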
Now the part people get wrong. Bolt a Hailo-8L (the 13 TOPS Raspberry Pi AI Kit) onto that Pi and it screams on vision: YOLOv8s lands around 120 FPS and YOLOv6n past 350 FPS (community-reported, batch 8). But it will not speed up your LLM. Hailo's own staff state plainly that the 8L cannot run autoregressive LLM decode: it has no on-board DRAM, so an LLM would force 100-plus context switches through the host. LLM acceleration is what the newer Hailo-10H exists for. The one nuance: a Whisper speech model can run its encoder (a fixed-size, CNN-shaped network) on the 8L for an ~8x speedup, while the autoregressive decoder still runs on the CPU. NPUs accelerate vision and fixed-shape encoders, not token-by-token generation.
If memory is the wall you keep hitting, that is also where techniques like KV-cache quantization come in; we covered one such method in our Google TurboQuant writeup.
The actual jobs: Remote ID and signal triage
Take the two use cases people raise for "edge AI on a node": drone Remote ID and RF signal triage. Neither needs a language model, and one of them barely needs ML at all.
Drone Remote ID (the ASTM F3411 standard) is broadcast structured data. Drones announce their ID, position, and operator location in WiFi beacon frames and BLE advertisements. Receiving it is promiscuous sniffing plus protocol parsing, and matching a drone ID is a registry lookup, not inference. An ESP32 handles the whole job: the opendroneid-core-c library runs on ESP32-C3 and S3 class chips. ML only enters if you want to fingerprint drones that are not broadcasting Remote ID at all, by their raw RF signature, which is a genuinely harder, separate problem.
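To show how little "intelligence" the receive path needs, here is a sketch of parsing a single Basic ID message and looking it up. The byte layout (25-byte message, message type in the high nibble of byte 0, ID type and aircraft type sharing byte 1, a 20-byte ID field) is our reading of the opendroneid-core-c headers and should be treated as an assumption to verify against the standard. The registry and its contents are hypothetical.

```python
ODID_MESSAGE_SIZE = 25   # assumed fixed message length in bytes
ODID_ID_SIZE = 20        # assumed width of the UAS ID field
BASIC_ID_TYPE = 0x0      # assumed message-type code for Basic ID

# Hypothetical local registry: UAS ID -> whatever you know about that aircraft.
REGISTRY = {"SERIAL123": "survey drone, known operator"}


def parse_basic_id(msg: bytes) -> dict:
    """Decode one Basic ID message under the layout assumptions stated above."""
    if len(msg) != ODID_MESSAGE_SIZE:
        raise ValueError(f"expected {ODID_MESSAGE_SIZE} bytes, got {len(msg)}")
    msg_type = msg[0] >> 4
    if msg_type != BASIC_ID_TYPE:
        raise ValueError(f"not a Basic ID message (type {msg_type})")
    raw_id = msg[2:2 + ODID_ID_SIZE]
    return {
        "proto_version": msg[0] & 0x0F,
        "id_type": msg[1] >> 4,
        "ua_type": msg[1] & 0x0F,
        "uas_id": raw_id.rstrip(b"\x00").decode("ascii", errors="replace"),
    }


def lookup(uas_id: str) -> str:
    """Matching a drone is a dictionary lookup, not inference."""
    return REGISTRY.get(uas_id, "unknown aircraft")


# Synthetic message built for this example, not a captured frame.
sample = bytes([0x02, 0x12]) + b"SERIAL123".ljust(ODID_ID_SIZE, b"\x00") + bytes(3)
parsed = parse_basic_id(sample)
print(parsed, "->", lookup(parsed["uas_id"]))
```

Everything here is bit masking and a table lookup, which is why a microcontroller handles it without any model at all.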
RF signal triage is where TinyML earns its place on the node. A small CNN over a simplified spectrogram can classify signal types on a 64 MHz microcontroller in about 159 ms at over 90 percent accuracy at high SNR (community-reported). That is real on-device intelligence. But it is narrow: wideband sweeps, open-set classification, and anything needing context get handed to a backend. The node triages; the hard analysis travels.
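One simple way to express "the node triages; the hard analysis travels" is a confidence gate on the classifier output. The sketch below is an assumption about how such a gate could look, not code from any project mentioned here: the label names, the 0.9 threshold, and the `triage` function are all hypothetical, and a softmax threshold is only a crude stand-in for real open-set detection.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits, labels, threshold=0.9):
    """Report confident classifications locally; escalate everything else."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return {"action": "report", "label": labels[best], "confidence": probs[best]}
    return {"action": "escalate", "label": None, "confidence": probs[best]}


LABELS = ["wifi", "ble", "lora"]          # hypothetical signal classes

confident = triage([8.0, 0.5, 0.2], LABELS)   # one class clearly dominates
ambiguous = triage([1.0, 0.9, 0.8], LABELS)   # near-uniform scores

print(confident)
print(ambiguous)
```

The confident case stays on the node as a short verdict; the ambiguous one is the kind of sample you would forward to a backend for wider analysis.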
The pattern that works: classify at the edge, relay the verdict
That last line is the whole design principle. You do not run the brain on the sensor. You run a reflex, and you send the conclusion, which is a few bytes, not the computation.
A clean example is the open-source meshtastic-aicamera project, and it starts with a board worth knowing: the Seeed Studio Grove Vision AI Module V2. For about $25 it pairs a Himax WiseEye2 HX6538 (dual Arm Cortex-M55 plus an Arm Ethos-U55 NPU) with a camera, and it runs vision models fully on-device, with no host computer and no cloud. This is the nuance behind the second row of the device table: the board is microcontroller-class, but that small NPU is the difference maker. It runs roughly 2 MB quantized models (MobileNet, EfficientNet-lite, YOLOv5 and v8) and detects people, faces, whether someone is wearing a mask, hand gestures, bags, and other objects out of the box. You can also train it on your own images with no code through Seeed's SenseCraft AI and flash the result straight to the board. At $25, it is one of the cheapest honest on-ramps to real edge AI.
meshtastic-aicamera wires that capability into a mesh. The Grove V2 does the detection; a XIAO ESP32-C3 reads the result over I2C (it runs no model itself); and a XIAO ESP32-S3 paired with a Wio-SX1262 radio broadcasts a tiny Meshtastic text alert over LoRa, in the shape @CAM TRIGGERED:<class>@<score>. The vision model runs on the sensor; the LoRa mesh carries a one-line verdict, which is all its bandwidth can afford. No language model anywhere, and none needed.
```sh
# meshtastic-aicamera: flash the vision model to the Grove Vision AI V2, then the C3 bridge
make flash-grove   # MobileNet/YOLO model onto the WiseEye2 NPU
make flash-c3      # I2C bridge firmware that relays detections to the LoRa node
```
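For a sense of how small that verdict is, here is a sketch that builds and parses an alert in the shape quoted above. It is not the project's code: the function names are invented for illustration, and treating the score as an integer is an assumption, since the source only shows the `<class>@<score>` placeholders.

```python
import re

# Matches the alert shape quoted above; an integer score is an assumption.
ALERT_RE = re.compile(r"^@CAM TRIGGERED:(?P<cls>[^@]+)@(?P<score>\d+)$")


def format_alert(cls: str, score: int) -> str:
    """Build the one-line text alert the mesh carries."""
    return f"@CAM TRIGGERED:{cls}@{score}"


def parse_alert(text: str):
    """Return (class, score) for a well-formed alert, or None otherwise."""
    match = ALERT_RE.match(text)
    if match is None:
        return None
    return match.group("cls"), int(match.group("score"))


alert = format_alert("person", 87)
payload = alert.encode("utf-8")

print(alert, f"({len(payload)} bytes)")
print(parse_alert(alert))
```

The whole detection event fits in a couple of dozen bytes, which is the point: the frame stays on the camera and only the conclusion crosses the radio link.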
Scale the same pattern up to a single-board computer and you get our Frigate local NVR. A Raspberry Pi 5 with a Hailo accelerator runs heavier vision pipelines than any sensor node can: object detection plus license-plate detection and on-device OCR that actually reads the plate text, all locally. Only the events, a detection or a decoded plate string, flow downstream, never the raw video. Different hardware tier, identical idea: the heavy model sits where the memory is, and the edge sends the answer. (OCR is still a fixed-shape vision model, which is exactly what an NPU like the Hailo accelerates well, not autoregressive text generation.)
So, does edge AI fit on a sensor node?
Yes, if you mean the right kind of AI. Classification, detection, anomaly spotting, and signal triage fit comfortably on a microcontroller in kilobytes. Language models do not: even at 4-bit, the smallest one is roughly three orders of magnitude larger than the chip's SRAM, and no amount of quantization closes that gap. Here is the cheat sheet.
| You want to... | Right hardware | What runs there |
| Spot a keyword, sound, or anomaly | ESP32-class MCU | TinyML CNN (kB) |
| Receive and parse drone Remote ID | ESP32-class MCU | Protocol parsing, no ML |
| Classify RF signals (narrow) | ESP32-class MCU | TinyML spectrogram CNN |
| Detect objects, faces, or gestures on a cheap node | Grove Vision AI V2 ($25, WiseEye2 + Ethos-U55) | ~2 MB vision models on-device |
| Real-time object detection (high FPS) | Pi 5 + Hailo-8L | YOLO on NPU |
| Run a 270M-3B language model | Pi 5 / phone / Jetson | llama.cpp on CPU/GPU |
| Run a reasoning or 7B+ model | Jetson / server | GPU or Hailo-10H |
Build for the row you are actually on. Put the reflex on the sensor, keep the brain where the memory is, and send the verdict over the wire. That is edge AI that ships.
Sources and further reading
Date checked: 2026-06-09