Rio 3.5 Open 397B: the open MoE a city government shipped as its own

License MIT

TL;DR

403B-parameter Mixture-of-Experts (about 17B active per token), MIT-licensed, shipped by IplanRIO, Rio de Janeiro's municipal IT company.
Independent forensics show it is roughly a 60/40 weight merge of Nex-N2-Pro and Qwen3.5-397B-A17B, disclosed on the model card only after community scrutiny.
Headline coding and reasoning scores are self-reported and unreproduced. GGUF quants run on Ollama and llama.cpp, but you need 100GB+ of memory.

System Requirements

RAM	128GB+ (4-bit)
GPU	Multi-GPU / data-center
VRAM	807GB+ (BF16)

✓ Ollama

On June 13, 2026, a city government shipped a 403-billion-parameter language model. Not a startup, not a national lab: the municipal IT company of Rio de Janeiro. Rio 3.5 Open 397B landed on Hugging Face under an MIT license, pulled more than 190,000 downloads in its first two weeks, and was presented as proof that a Brazilian city could build frontier-class AI on a public budget. Then people downloaded the weights and did the math. What they found is the real story here, and it is a useful one for anyone who ships open models. Here is what Rio 3.5 actually is, who built the weights inside it, and whether you should run it.

What IplanRIO actually shipped

The publisher is IplanRIO, the Empresa Municipal de Informática e Planejamento, Rio de Janeiro's in-house technology company, working under the city's Rio.IA program. The model is a Mixture-of-Experts (MoE) transformer with about 403 billion total parameters and roughly 17 billion active per token.

One definition, because it carries the cost story. A Mixture-of-Experts model splits its feed-forward layers into many small "expert" subnetworks and routes each token to only a few of them. Rio 3.5 holds 403B parameters on disk but fires only about 17B per token, so you get the knowledge capacity of a large model at the inference cost of a much smaller one. That is the same bargain behind other big open MoEs like DeepSeek-V4. It takes both image and text input, carries a long context window, and ships in BF16 at roughly 807GB.
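To make the routing idea concrete, here is a toy sketch of top-k expert routing in NumPy. It is an illustration under loud assumptions, not Rio 3.5's or Qwen3.5's actual implementation: the experts are plain linear maps instead of feed-forward blocks, and the sizes, the `moe_layer` name, and `k=2` are all made up for the example.

```python
import numpy as np

def moe_layer(x, router_w, expert_ws, k=2):
    """Route one token vector x through the top-k experts (toy version)."""
    logits = router_w @ x                      # one routing score per expert
    top = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    gate = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()                         # softmax over the selected experts only
    # Only the chosen experts do any work; the others stay idle for this token.
    out = sum(g * (expert_ws[i] @ x) for g, i in zip(gate, top))
    return out, top

rng = np.random.default_rng(0)
d, n_experts = 8, 16                           # hypothetical toy sizes
x = rng.normal(size=d)
router_w = rng.normal(size=(n_experts, d))
expert_ws = rng.normal(size=(n_experts, d, d))

out, used = moe_layer(x, router_w, expert_ws, k=2)
print(f"experts used: {sorted(used.tolist())} of {n_experts}")
print(f"fraction of expert weights touched: {2 / n_experts:.3f}")
```

The same ratio explains the headline numbers: 17B active out of 403B total is about 4 percent of the weights doing work on any given token.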

The license is the genuinely good news. MIT means you can use it commercially, modify it, and redistribute it with attribution, no permission required. For a model published with public money, that is the right call.

The merge nobody mentioned at launch

Here is where it gets interesting. The original model card presented Rio 3.5 as IplanRIO's own model, fine-tuned from Alibaba's Qwen3.5-397B-A17B. Within a day, the lab Nex-AGI opened a GitHub issue with receipts.

Two independent analyses, one conclusion. First, a weight comparison: every weight tensor in Rio 3.5 is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex-N2-Pro and Qwen3.5, across all 60 layers and every component of the network. Second, an identity test: strip Rio's hard-coded "You are Rio" system prompt, and the model introduces itself as "Nex, from Nex-AGI" 79 percent of the time, and as "Rio" zero percent of the time. Nex-AGI's summary was blunt: roughly 60 percent Nex-N2-Pro, 40 percent Qwen, with "no evidence of any training of their own."
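For readers who want to see what a weight comparison like this can look like, here is a minimal sketch. It is not Nex-AGI's actual code, which is not reproduced in this article. It assumes you have the same tensor from the suspect model and from two candidate parents, fits the single blend coefficient by least squares, and reports how much is left unexplained. The function name `fit_blend` and the random stand-in tensors are hypothetical.

```python
import numpy as np

def fit_blend(merged, parent_a, parent_b):
    """Least-squares alpha for merged ~= alpha * parent_a + (1 - alpha) * parent_b."""
    diff = (parent_a - parent_b).ravel()
    target = (merged - parent_b).ravel()
    alpha = float(diff @ target / (diff @ diff))
    recon = alpha * parent_a + (1 - alpha) * parent_b
    # Relative residual: near zero means the blend explains the tensor almost exactly.
    rel_err = float(np.linalg.norm(merged - recon) / np.linalg.norm(merged))
    return alpha, rel_err

rng = np.random.default_rng(0)
a = rng.normal(size=(64, 64))                 # stand-in for one parent's tensor
b = a + 0.1 * rng.normal(size=(64, 64))       # stand-in for a related second parent
merged = 0.6 * a + 0.4 * b                    # a genuine 60/40 merge
unrelated = rng.normal(size=(64, 64))         # a tensor with no relation to either parent

alpha, err = fit_blend(merged, a, b)
alpha_u, err_u = fit_blend(unrelated, a, b)
print(f"true merge: alpha={alpha:.3f}, relative residual={err:.2e}")
print(f"unrelated:  alpha={alpha_u:.3f}, relative residual={err_u:.2e}")
```

A real merge recovers the same coefficient with a near-zero residual on every tensor, while independently trained weights leave a large residual. Seeing the same coefficient across every layer is what makes the finding hard to explain any other way.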

IplanRIO updated the model card. It now states the model was "built via a merge of Nex-N2-Pro and Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model," and adds an apology: "We are sorry for the confusion and apologize profusely." The team blamed an "incorrect upload" of a base merged version instead of the final distilled one. The frontier-comparison benchmark table that headlined the launch quietly vanished from the description. The point worth keeping: the disclosure arrived after the community forced it, not before.

On-policy distillation, and what is actually defensible

Strip out the spin and there is a real technique in here. Merging two models by averaging their weights is a known, legitimate method, and so is on-policy distillation, where a student model is trained to match a stronger teacher's outputs on the student's own generations. Done openly, this is honest engineering, and the parent licenses allow it: Qwen3.5 and Nex-N2-Pro are both openly licensed, and MIT lets you build on top of them.
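As a rough illustration of the distillation half, here is a sketch of one loss term commonly used for on-policy distillation: the per-token reverse KL divergence between student and teacher, evaluated on a sequence the student itself sampled. Whether IplanRIO used this exact objective is not stated in the model card, so treat the choice of loss, the function names, and the random logits as assumptions.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the vocabulary axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def reverse_kl_per_token(student_logits, teacher_logits):
    """KL(student || teacher) at each position; returns shape (seq_len,)."""
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return (np.exp(ls) * (ls - lt)).sum(axis=-1)

rng = np.random.default_rng(0)
seq_len, vocab = 5, 11                        # hypothetical toy sizes
# In real training, both sets of logits come from scoring a sequence
# that the STUDENT sampled; random values stand in for them here.
student = rng.normal(size=(seq_len, vocab))
teacher = rng.normal(size=(seq_len, vocab))

loss = reverse_kl_per_token(student, teacher).mean()
print(f"mean reverse KL: {loss:.4f}")
```

The loss is zero only when the student's distribution matches the teacher's at every position, which is the "match the teacher on your own generations" idea in numeric form.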

So the problem was never the method. It was attribution. Presenting a weight merge as a from-scratch municipal model, with a benchmark table implying original work, is the part that broke trust. Had IplanRIO led with "we merged two strong open models and distilled the result, here is what it scores," this would have been a nice story about a city government doing competent applied AI. The weights are identical either way. Only the framing changed.

Benchmarks, with a large asterisk

Every number below is self-reported on Rio's model card and has not been independently reproduced. They were also run with SwiReasoning active, a training-free inference trick that alternates between explicit chain-of-thought and latent-space reasoning, so they reflect an inference method as much as the weights. Read them as a vendor claim, not a measurement.

Benchmark	Rio 3.5 Open 397B	Qwen3.5-397B-A17B (base)
Terminal-Bench 2.1	70.8	52.5
SWE-Bench Verified	80.2	76.2
SWE-Bench Multilingual	77.0	69.3
GPQA Diamond	90.9	88.4
Humanity's Last Exam	36.5	28.7
IMOAnswerBench	89.5	80.9

All figures self-reported by IplanRIO, run with SwiReasoning active, and not independently reproduced. The launch also claimed wins over Qwen 3.7 Plus (Terminal-Bench 70.8 vs 70.3) and DeepSeek V4 Pro (67.9); that frontier comparison was later removed from the model card.

The base column is the honest comparison, because Qwen3.5-397B-A17B supplies roughly 40 percent of Rio's own weights. On that view, the merge plus distillation does add real lift, especially on the agentic coding tests. A Hugging Face ablation on IMOAnswerBench is the tell: the Qwen base scores 80.9, extra training pushes it to 84.5, and switching on latent reasoning carries it to 89.5. Much of the headline gain is the inference method, not new knowledge baked into the weights.
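The split is easy to check by hand. A quick sketch of the arithmetic, using only the three self-reported IMOAnswerBench scores quoted above (the variable names are ours):

```python
# IMOAnswerBench scores from the ablation: base, after training, with latent reasoning.
base, trained, with_latent = 80.9, 84.5, 89.5

total_gain = with_latent - base
training_gain = trained - base
inference_gain = with_latent - trained

print(f"total gain:     {total_gain:.1f} points")
print(f"from training:  {training_gain:.1f} points ({training_gain / total_gain:.0%})")
print(f"from inference: {inference_gain:.1f} points ({inference_gain / total_gain:.0%})")
```

On these numbers, a little under three fifths of the improvement comes from the inference-time method and the rest from changes to the weights.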

Limitations and gotchas

Provenance. You are running a merge of someone else's weights. For research that is fine. For a product where you need to stand behind the model's lineage, the history is a liability.
Unreproduced scores. Nobody outside IplanRIO has confirmed the benchmarks, and they depend on SwiReasoning being switched on.
The upload confusion. IplanRIO admits an earlier version was the wrong (base merged) model, so confirm you have pulled the corrected weights.
It is huge. BF16 is about 807GB. Even aggressive quantization needs 100GB or more of memory, so this is not a laptop model.
Tooling lag. vLLM did not support the Qwen3.5 architecture in stable releases at launch, so you may need a nightly build.
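The size point can be sanity-checked with a back-of-the-envelope estimate. The sketch below counts weights only, assumes every parameter is stored at the same bit width, and ignores the KV cache, activations, and the per-block overhead that real quantization formats add, so read its output as a floor and not a measured file size.

```python
TOTAL_PARAMS = 403e9  # total parameter count quoted in this article

def weight_gb(bits_per_weight, params=TOTAL_PARAMS):
    """Weights-only size in GB (1 GB = 1e9 bytes); excludes KV cache and runtime overhead."""
    return params * bits_per_weight / 8 / 1e9

for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label:>5}: ~{weight_gb(bits):.0f} GB")
```

The BF16 line comes out in line with the roughly 807GB figure above, and only the most aggressive bit widths get close to the 100GB floor.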

Who should use it, and who should not

Use it if you want a strong, MIT-licensed open MoE for coding or reasoning, you have data-center class hardware or the patience for quantization, and the provenance drama does not bother you. The weights perform, whatever the backstory.

Skip it if you need clean lineage you can defend to a customer, you cannot spare 100GB of memory, or you want benchmarks that someone other than the publisher has checked. For agent-heavy work on hardware you actually own, the Qwen3.5 base it is built on is the saner starting point.

Run it in about 10 minutes

There is no small version, so "try it" means "have a lot of memory." The lowest-friction path is a community GGUF through Ollama or llama.cpp.

# Quantized GGUF builds are the only realistic local path.
# You still need 100GB+ of RAM or VRAM even at 4-bit.
# Browse the quants at huggingface.co/foxipanda/Rio-3.5-Open-397B-GGUF
ollama run hf.co/foxipanda/Rio-3.5-Open-397B-GGUF:Q4_K_XL

# Start at Q4 or higher. Sub-Q3 quantization visibly
# hurts the coding and math output that is the whole point.

If you have a multi-GPU box and want the reference weights, pull them straight from the Hugging Face card with transformers.

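# Needs a recent transformers with Qwen3.5 (qwen3_5_moe) support
# and enough VRAM to shard ~807GB across your GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prefeitura-rio/Rio-3.5-Open-397B"
tok = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" (requires the accelerate package) spreads layers across available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write a Python function that returns the nth prime."
# Tokenize the prompt and move the input tensors onto the model's device.
ids = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=512)
# Decode the first (only) sequence, dropping special tokens such as end-of-text.
print(tok.decode(out[0], skip_special_tokens=True))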

If you do not have the hardware, spend the ten minutes reading Nex-AGI's issue and the updated model card side by side. The forensics, a weight-blend analysis plus the identity-prompt test, are a clean template for checking provenance the next time a "new" frontier model arrives with numbers that look too good and a backstory that is too short.

Sources and further reading

Tested on: not independently tested. Rio 3.5 Open 397B is a 403B MoE at roughly 807GB in BF16 and needs 100GB or more of memory even when quantized, which is beyond our bench. Every benchmark here is IplanRIO-reported and run with SwiReasoning active; the provenance findings are Nex-AGI's analysis, corroborated by IplanRIO's own updated model card. Sources linked above.
Date checked: 2026-06-26
