SLMs on Microcontrollers: Does Edge AI Actually Fit on a Sensor Node? (ESP32-C5, Raspberry Pi 5 + Hailo-8L, TinyML)

TL;DR

Microcontrollers (ESP32-class) run kB-MB TinyML classifiers, not transformer language models
The smallest real SLMs (270M-3B) need a phone, Raspberry Pi, or Jetson, not a sensor node
NPUs like the Hailo-8L accelerate vision, not autoregressive LLM decode

Part one of our microcontroller AI series. In part two, someone fits a 28.9M-parameter language model on an $8 ESP32-S3.

The pitch is everywhere: small language models are now small enough to run on a sensor node at the edge. FunctionGemma 270M, Gemma 3 1B, and Ministral-3 3B all ship with "edge" and "on-device" front and center. So we went looking for the honest version of that claim on real constrained hardware, an ESP32-C5 microcontroller and a Raspberry Pi 5 with a Hailo-8L accelerator. The short answer: most of the "AI on a microcontroller" marketing quietly means a phone or a Raspberry Pi, and a true sensor node runs a different kind of AI entirely.

Two devices, one word: "edge"

The confusion starts with one overloaded word. A microcontroller and a single-board computer both get called "edge," but they are more than four orders of magnitude apart in memory: 384 KB of on-chip SRAM against 8 GB of RAM. That gap decides everything.

Class	Example	RAM	Accelerator	What it actually runs
Microcontroller	ESP32-C5	384 KB SRAM (+ up to 32 MB slow PSRAM)	None (no FPU, no NPU)	kB-MB TinyML classifiers
Microcontroller + NPU	Grove Vision AI V2 (WiseEye2 HX6538)	On-chip SRAM, ~2 MB models	Arm Ethos-U55 NPU	On-device vision: detection, faces, gestures
Single-board computer	Raspberry Pi 5 (8 GB)	8 GB	Optional Hailo-8L (vision)	270M-3B LLMs on CPU; vision on NPU
Edge AI box	NVIDIA Jetson Orin	8-64 GB	GPU + DLA	3B-13B LLMs, real-time vision

When a vendor says "edge," check which row they mean. As you will see, all three "tiny" models live in row three or four. None of them fits in row one or two.

The tiny models, and where they really fit

Here are the three models the marketing points at sensor nodes, with the number that matters: how much memory you need to load them at 4-bit, and the smallest device anyone has actually demonstrated them on.

Model	Params	Q4 size	Min RAM to run	Smallest real device	License
FunctionGemma 270M	270M	~253 MB	~550 MB	Phone / Raspberry Pi 5	Gemma terms
Gemma 3 1B	1B	~529 MB	~1.5 GB	Raspberry Pi 4/5	Gemma terms
Ministral-3 3B	3.4B (+0.4B vision)	~2.15 GB	~3-4 GB	Jetson / RTX / Pi 5 (slow)	Apache 2.0

All sizes are community GGUF figures; the RAM floors are vendor or community-reported. Three things jump out.

First, read the licenses. Ministral-3 is genuine Apache 2.0, so you can ship it commercially without asking anyone. Both Gemma models use Google's Gemma terms, which permit commercial use but carry a prohibited-use policy and flow-down obligations. "Open weights" is not the same as "open source" here.

Second, watch the RAM sleight of hand. Mistral notes the 3B needs "at least 2 GB of RAM," which is the weights alone at 4-bit. Add the OS, the application, and a KV cache for a long context and that 2 GB board has nothing left. The practical floor is 3-4 GB, which is a Raspberry Pi, not a sensor.

Third, small does not mean smart out of the box. FunctionGemma 270M is a function-calling specialist, but its base accuracy on the live, multi-turn slice of the Berkeley Function-Calling Leaderboard sits around 36 percent (vendor-reported). It only becomes useful after task-specific fine-tuning, where community runs push narrow tasks above 90 percent. The 270M is a foundation to fine-tune, not a drop-in agent.
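The RAM accounting in the second point can be sketched in a few lines. This is a rough budget, not a measurement: the layer count, KV-head count, head dimension, context length, and OS overhead below are hypothetical values chosen for illustration, not the published Ministral-3 configuration. Only the 2.15 GB weight size comes from the table above.

```python
# Rough RAM budget for running a quantized LLM on a small board.
# All model dimensions and overheads are illustrative assumptions.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Size of the key/value cache: two tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

def total_ram_gb(weights_gb, kv_bytes, os_and_app_gb):
    """Weights + KV cache + everything else resident on the board."""
    return weights_gb + kv_bytes / 1024**3 + os_and_app_gb

# Hypothetical 3B-class shape (NOT the real model config), 8K context, FP16 cache.
kv = kv_cache_bytes(n_layers=26, n_kv_heads=8, head_dim=128, context_len=8192)
budget = total_ram_gb(weights_gb=2.15, kv_bytes=kv, os_and_app_gb=0.5)

print(f"KV cache: {kv / 1024**3:.2f} GB, total: {budget:.2f} GB")
```

Under these assumptions the cache alone adds about 0.8 GB and the total lands between 3 and 4 GB, which is why a 2 GB board that technically holds the weights has no room left to run them.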

What an ESP32 actually runs

A microcontroller is not a small computer. The ESP32-C5 has 384 KB of on-chip SRAM, no floating-point unit, and no neural accelerator (per Espressif's datasheet). You can bolt on up to 32 MB of external PSRAM, but it runs over an 80 MHz SPI bus, far too slow for the hot path of transformer inference. What thrives here is TinyML: tiny quantized convolutional nets and classic models via TensorFlow Lite Micro or ESP-DL.

Task	Model	Size	Peak RAM	Latency	Accuracy
Acoustic anomaly detection	686-param autoencoder	2.7 KB	~80 KB	4 ms	99.3%
Keyword spotting	DS-CNN INT8	52-100 KB	15-32 KB	9-20 ms (inference)	~88%
RF spectrum classification	CNN on spectrogram	~250 KB	~12 KB	159 ms	>90% (high SNR)

All community-reported. These are real, useful, and they run in kilobytes. Now compare that to a "small" language model. The smallest credible one, FunctionGemma 270M at Q4, is a 253 MB file. That is roughly eight times the maximum 32 MB of PSRAM you can attach to an ESP32-C5, and more than 600 times its on-chip SRAM, before you account for the runtime or the KV cache. The math simply does not close.
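The arithmetic behind that comparison is short enough to write down. This sketch uses only the figures quoted in this article (253 MB, 384 KB, 32 MB); the helper name `shortfall` is ours.

```python
KB = 1024
MB = 1024 * KB

MODEL_Q4 = 253 * MB    # smallest Q4 language model in the table above
SRAM = 384 * KB        # ESP32-C5 on-chip SRAM
PSRAM_MAX = 32 * MB    # maximum external PSRAM

def shortfall(model_bytes, memory_bytes):
    """How many times larger the model is than the available memory."""
    return model_bytes / memory_bytes

print(f"vs PSRAM: {shortfall(MODEL_Q4, PSRAM_MAX):.1f}x")
print(f"vs SRAM:  {shortfall(MODEL_Q4, SRAM):.0f}x")
```

The weights alone overshoot the slow external memory by about 8x and the fast on-chip memory by several hundred times, and that is before any runtime or cache is loaded.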

The closest anyone has come is a 260K-parameter llama2.c toy transformer running on an ESP32-S3 with 8 MB of PSRAM at about 19 tokens per second (community-reported). The author's own verdict: "probably not very useful." It generates story fragments. It cannot follow an instruction or classify a signal. So the honest line is: microcontrollers do TinyML classification, not language models. Since we published this, someone pushed that ceiling to a 28.9M-parameter model on the same class of chip, and we break down exactly how in part two of this series.

Raspberry Pi 5, with and without Hailo

Step up to a Raspberry Pi 5 and the small models finally run, on the CPU, via llama.cpp.

Model size	Quant	Generation speed	Notes
Sub-360M	Q4	>20 tok/s	Usable for narrow tasks
1-1.5B	Q4	5-15 tok/s	Conversational, sluggish
~3B	Q4	2-5 tok/s	Patience required

Community-reported, roughly 10 W under load. Workable for a local assistant or a structured-output task, not for anything latency-sensitive.
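One way to see why speed falls as the model grows: token-by-token decode is usually limited by memory bandwidth, because each new token has to stream roughly all of the weights from RAM once. The sketch below turns that into an upper bound. The bandwidth constant is a placeholder we chose for illustration, not a measured Raspberry Pi 5 figure, and the bound ignores compute time and KV-cache reads.

```python
# Upper bound on decode speed when generation is memory-bandwidth bound.
# ASSUMED_BANDWIDTH_GB_S is a hypothetical effective figure, not a measurement.

ASSUMED_BANDWIDTH_GB_S = 10.0

def max_tokens_per_second(model_gb, bandwidth_gb_s=ASSUMED_BANDWIDTH_GB_S):
    """Each token reads the full weights once, so speed <= bandwidth / size."""
    return bandwidth_gb_s / model_gb

# Q4 file sizes taken from the model table earlier in the article.
for name, size_gb in [("270M Q4", 0.253), ("1B Q4", 0.529), ("3B Q4", 2.15)]:
    print(f"{name}: <= {max_tokens_per_second(size_gb):.1f} tok/s")
```

The exact numbers depend entirely on the assumed bandwidth, but the shape matches the table: each step up in model size cuts the ceiling proportionally.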

Now the part people get wrong. Bolt a Hailo-8L (the 13 TOPS Raspberry Pi AI Kit) onto that Pi and it screams on vision: YOLOv8s lands around 120 FPS and YOLOv6n past 350 FPS (community-reported, batch 8). But it will not speed up your LLM. Hailo's own staff state plainly that the 8L cannot run autoregressive LLM decode: it has no on-board DRAM, so an LLM would force 100-plus context switches through the host. LLM acceleration is what the newer Hailo-10H exists for. The one nuance: a Whisper speech model can run its encoder (a fixed-size, CNN-shaped network) on the 8L for an ~8x speedup, while the autoregressive decoder still runs on the CPU. NPUs accelerate vision and fixed-shape encoders, not token-by-token generation.

If memory is the wall you keep hitting, that is also where techniques like KV-cache quantization come in; we covered one such method in our Google TurboQuant writeup.

The actual jobs: Remote ID and signal triage

Take the two use cases people raise for "edge AI on a node": drone Remote ID and RF signal triage. Neither needs a language model, and one of them barely needs ML at all.

Drone Remote ID (the ASTM F3411 standard) is broadcast structured data. Drones announce their ID, position, and operator location in WiFi beacon frames and BLE advertisements. Receiving it is promiscuous sniffing plus protocol parsing, and matching a drone ID is a registry lookup, not inference. An ESP32 handles the whole job: the opendroneid-core-c library runs on ESP32-C3 and S3 class chips. ML only enters if you want to fingerprint drones that are not broadcasting Remote ID at all, by their raw RF signature, which is a genuinely harder, separate problem.
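To make "parsing plus lookup" concrete, here is a minimal sketch of decoding one Basic ID message and checking it against a registry. It assumes the 25-byte message has already been extracted from the WiFi or BLE frame, and it assumes the byte layout used by opendroneid-core-c (type and version nibbles, then a 20-byte ID field); verify both against the standard before relying on them. The registry contents and the serial number are hypothetical.

```python
# Sketch: decode a Remote ID Basic ID message and look it up in a registry.
# Assumed layout (check against the ASTM F3411 text / opendroneid-core-c):
#   byte 0:     message type (high nibble) | protocol version (low nibble)
#   byte 1:     ID type (high nibble) | UA type (low nibble)
#   bytes 2-21: UAS ID, NUL-padded ASCII
#   bytes 22-24: reserved

MESSAGE_SIZE = 25
BASIC_ID_TYPE = 0x0

def parse_basic_id(msg: bytes) -> dict:
    if len(msg) != MESSAGE_SIZE:
        raise ValueError("expected a 25-byte message")
    msg_type = msg[0] >> 4
    if msg_type != BASIC_ID_TYPE:
        raise ValueError(f"not a Basic ID message (type {msg_type})")
    return {
        "protocol_version": msg[0] & 0x0F,
        "id_type": msg[1] >> 4,
        "ua_type": msg[1] & 0x0F,
        "uas_id": msg[2:22].split(b"\x00", 1)[0].decode("ascii"),
    }

# Hypothetical registry and serial number, for illustration only.
REGISTRY = {"EXAMPLE-SERIAL-0001": "Survey drone, operator A"}

frame = bytes([0x02, 0x12]) + b"EXAMPLE-SERIAL-0001".ljust(20, b"\x00") + bytes(3)
drone = parse_basic_id(frame)
print(drone["uas_id"], "->", REGISTRY.get(drone["uas_id"], "unknown"))
```

Nothing here is inference: it is bit masking, a string decode, and a dictionary lookup, which is why the job fits on an ESP32.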

RF signal triage is where TinyML earns its place on the node. A small CNN over a simplified spectrogram can classify signal types on a 64 MHz microcontroller in about 159 ms at over 90 percent accuracy at high SNR (community-reported). That is real on-device intelligence. But it is narrow: wideband sweeps, open-set classification, and anything needing context get handed to a backend. The node triages; the hard analysis travels.
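The "node triages, backend analyzes" split can be expressed as a simple confidence gate on the classifier output. This is a sketch under our own assumptions: the class names and the 0.9 threshold are hypothetical, and a real deployment would tune the threshold against its own data.

```python
# Sketch of on-node triage: relay confident verdicts, escalate the rest.
# Class names and the 0.9 threshold are hypothetical.

def triage(scores, threshold=0.9):
    """scores maps class name -> probability from the on-device classifier."""
    label, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence >= threshold:
        return {"action": "relay", "label": label, "confidence": confidence}
    # Low confidence may mean an unknown signal type: let the backend decide.
    return {"action": "escalate", "label": None, "confidence": confidence}

print(triage({"wifi": 0.96, "ble": 0.03, "lora": 0.01}))
print(triage({"wifi": 0.40, "ble": 0.35, "lora": 0.25}))
```

A confident result becomes a few bytes on the wire; an ambiguous one is flagged for the heavier analysis that the node cannot do.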

The pattern that works: classify at the edge, relay the verdict

That last line is the whole design principle. You do not run the brain on the sensor. You run a reflex, and you send the conclusion, which is a few bytes, not the computation.

A clean example is the open-source meshtastic-aicamera project, and it starts with a board worth knowing: the Seeed Studio Grove Vision AI Module V2. For about $25 it pairs a Himax WiseEye2 HX6538 (dual Arm Cortex-M55 plus an Arm Ethos-U55 NPU) with a camera, and it runs vision models fully on-device, no host computer and no cloud. This is the nuance behind row two of the device table: it is microcontroller-class, but that small NPU is the difference maker. It runs roughly 2 MB quantized models (MobileNet, EfficientNet-lite, YOLOv5 and v8) and detects people, faces, whether someone is wearing a mask, hand gestures, bags, and other objects out of the box. You can also train it on your own images with no code through Seeed's SenseCraft AI and flash the result straight to the board. At $25, it is one of the cheapest honest on-ramps to real edge AI.

meshtastic-aicamera wires that capability into a mesh. The Grove V2 does the detection; a XIAO ESP32-C3 reads the result over I2C (it runs no model itself); and a XIAO ESP32-S3 paired with a Wio-SX1262 radio broadcasts a tiny Meshtastic text alert over LoRa, in the shape @CAM TRIGGERED:<class>@<score>. The vision model runs on the sensor; the LoRa mesh carries a one-line verdict, which is all its bandwidth can afford. No language model anywhere, and none needed.

# meshtastic-aicamera: flash the vision model to the Grove Vision AI V2, then the C3 bridge
make flash-grove   # MobileNet/YOLO model onto the WiseEye2 NPU
make flash-c3      # I2C bridge firmware that relays detections to the LoRa node
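To show how little data the relayed verdict carries, here is a sketch that builds and parses an alert in the @CAM TRIGGERED:<class>@<score> shape described above. It is not the project's firmware: the function names are ours, and treating the score as an integer percentage is an assumption.

```python
# Sketch of the one-line alert format; not the project's actual code.
# Assumption: the score is sent as an integer percentage.

PREFIX = "@CAM TRIGGERED:"

def format_alert(label: str, score: int) -> str:
    return f"{PREFIX}{label}@{score}"

def parse_alert(text: str):
    """Return (label, score), or None if the text is not a camera alert."""
    if not text.startswith(PREFIX):
        return None
    label, sep, score = text[len(PREFIX):].rpartition("@")
    if not sep or not score.isdigit():
        return None
    return label, int(score)

alert = format_alert("person", 87)
print(alert, "->", len(alert.encode("utf-8")), "bytes")
print(parse_alert(alert))
```

A detection with its class and score fits in 24 bytes here, which is the kind of payload a LoRa mesh can carry comfortably.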

Scale the same pattern up to a single-board computer and you get our Frigate local NVR. A Raspberry Pi 5 with a Hailo accelerator runs heavier vision pipelines than any sensor node can: object detection plus license-plate detection and on-device OCR that actually reads the plate text, all locally. Only the events, a detection or a decoded plate string, flow downstream, never the raw video. Different hardware tier, identical idea: the heavy model sits where the memory is, and the edge sends the answer. (OCR is still a fixed-shape vision model, which is exactly what an NPU like the Hailo accelerates well, not autoregressive text generation.)

So, does edge AI fit on a sensor node?

Yes, if you mean the right kind of AI. Classification, detection, anomaly spotting, and signal triage fit comfortably on a microcontroller in kilobytes. Language models do not, and no amount of quantization closes the roughly three-orders-of-magnitude gap between a 253 MB model and 384 KB of SRAM. Here is the cheat sheet.

You want to...	Right hardware	What runs there
Spot a keyword, sound, or anomaly	ESP32-class MCU	TinyML CNN (kB)
Receive and parse drone Remote ID	ESP32-class MCU	Protocol parsing, no ML
Classify RF signals (narrow)	ESP32-class MCU	TinyML spectrogram CNN
Detect objects, faces, or gestures on a cheap node	Grove Vision AI V2 ($25, WiseEye2 + Ethos-U55)	~2 MB vision models on-device
Real-time object detection (high FPS)	Pi 5 + Hailo-8L	YOLO on NPU
Run a 270M-3B language model	Pi 5 / phone / Jetson	llama.cpp on CPU/GPU
Run a reasoning or 7B+ model	Jetson / server	GPU or Hailo-10H

Build for the row you are actually on. Put the reflex on the sensor, keep the brain where the memory is, and send the verdict over the wire. That is edge AI that ships.

Sources and further reading

Date checked: 2026-06-09
