MiniMax M3: 1M-Token Sparse-Attention Coder, Open Weights Pending

License Other

TL;DR

Mixture-of-Experts with MiniMax Sparse Attention (MSA) and a 1M-token context window
Native text, image, and video input, with agentic coding and computer-use focus
API and Ollama cloud live now; downloadable weights and license still pending as of June 2026

On June 1, 2026, MiniMax shipped M3: a sparse-attention model with a 1-million-token context window, native image and video input, and an agentic-coding focus. The pitch is frontier-class coding at open-weight prices. The catch is that, as of this writing, you cannot download the weights yet, the license has not been published, and every headline benchmark was run by MiniMax on MiniMax hardware. So here is what actually changed, what holds up, and what you can run today.

What MiniMax M3 actually is

M3 is the successor to the M2 line (M2 landed in October 2025, with M2.5 and M2.7 following in early 2026). Three things are genuinely new versus M2.7:

Context jumps from 200K to 1M tokens (MiniMax guarantees a 512K minimum). That is the headline number, and it is the one independent reviewers can already poke at through the API.
Native multimodality. M2.7 was text-only. M3 takes text, image, and video as input. Output stays text; MiniMax keeps image and video generation in its separate Hailuo and Image models.
Computer use and agentic coding. M3 adds desktop operation (clicking around a real GUI) and leans hard into long-horizon coding and browsing.

One thing MiniMax did not ship: parameter counts. The widely repeated "229B total, 9.8B active, 256 experts" figure is community inference carried over from M2.7, not an official spec. Treat it as a guess until the technical report lands. It is almost certainly a Mixture-of-Experts model (a design where only a fraction of the network activates per token), but the exact size is unconfirmed.

MSA: how a 1M context got cheap

The real engineering story is MiniMax Sparse Attention (MSA). Standard attention compares every token to every other token, so cost grows quadratically and a 1M context becomes brutal. MSA splits the work into two stages. A lightweight index branch uses reduced-dimension key vectors to pick which blocks of the context matter, then a sparse branch runs full attention only on the blocks that survived the cut.
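The two-stage idea can be sketched in a few lines of plain Python. This is a toy illustration for a single query vector, not MiniMax's implementation: the function names, the block size, the top-k value, the "best key in the block" scoring rule, and the use of simple truncation as the reduced-dimension projection are all assumptions made for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(query, keys, block_size, top_k, index_dim):
    """Index branch: rank blocks using only the first index_dim dimensions."""
    q_small = query[:index_dim]  # assumption: truncation stands in for a learned projection
    block_scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        # Assumption: a block is as relevant as its best-matching reduced key.
        score = max(dot(q_small, k[:index_dim]) for k in block)
        block_scores.append((score, start))
    block_scores.sort(reverse=True)
    return sorted(start for _, start in block_scores[:top_k])


def sparse_attention(query, keys, values, block_size=4, top_k=2, index_dim=2):
    """Sparse branch: full-dimension attention over the selected blocks only."""
    starts = select_blocks(query, keys, block_size, top_k, index_dim)
    idx = [i for s in starts for i in range(s, min(s + block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, idx
```

The point of the split is that the expensive full-dimension dot products run over top_k * block_size tokens instead of the whole context, while the cheap index pass is the only step that touches every block. Setting top_k to the total number of blocks recovers ordinary dense attention, which makes the sketch easy to sanity-check.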

Crucially, MiniMax built MSA on a Grouped Query Attention backbone and kept real, uncompressed key/value vectors. That is a deliberate split from DeepSeek's compressed-KV approach, and it keeps the model compatible with standard FlashAttention kernels. MiniMax claims the payoff at 1M tokens is 9.7x faster prefill, 15.6x faster decoding, and roughly one-twentieth the per-token compute versus M2. Those are vendor numbers, but the architecture is plausible and the speedups are the kind of thing the community can verify once weights drop.

Benchmarks: strong on paper, unproven in the wild

MiniMax's launch deck is a wall of green bars. Here are the headline self-reported scores. Every row is vendor-run on MiniMax's own infrastructure and agent scaffolding.

Benchmark	M3 score	What it measures
SWE-bench Verified	80.5%	Real GitHub bug fixes
SWE-bench Pro	59.0%	Harder, longer SWE tasks
Terminal-Bench 2.1	66.0%	Shell and terminal agents
BrowseComp	83.5%	Autonomous web browsing
OSWorld-Verified	70.06%	Desktop computer use
MCP Atlas	74.2%	Tool calling

On paper that puts M3 in a dead heat with the best open coding models. Community-aggregated SWE-bench Verified standings (via BenchLM) bunch the top four within half a point, with GLM-5 trailing by almost three:

Model	SWE-bench Verified
MiniMax M3	80.5%
DeepSeek V4 Pro	80.6%
Qwen3.7 Max	80.4%
Kimi K2.6	80.2%
GLM-5	77.8%

Now the reality check. Two independent signals exist so far, and they pull in different directions.

Artificial Analysis places M3 at 55 on its composite Intelligence Index, around 9th out of 164 models. Respectable, mid-pack frontier, not the leader the marketing implies.
DeepSWE, an independent long-horizon coding eval with a 90-minute wall-clock budget, scored M3 at 13.3% pass@1. For context, that run put GPT-5.5 at 70% and Claude Opus 4.7 at 54%. M3 also burned a median 80K output tokens and 325 agent steps per task, far more than the models that beat it.

SWE-bench Verified and DeepSWE measure different things, so the gap is not as damning as 80.5 versus 13.3 looks. But it is a loud reminder that a vendor benchmark and a hard independent agent run are not the same animal. Also worth noting: MiniMax benchmarked against Claude Opus 4.7, not the 4.8 that was already out at launch. On SWE-bench Pro, Opus 4.8 scores about 69% to M3's 59%.

How to use M3 today

The weights are not public yet (MiniMax promised them within about 10 days of the June 1 launch). Until they land, M3 is API-and-cloud only. Here is what works right now.

Via the API

M3 is live on the MiniMax platform with an OpenAI-compatible endpoint. Grab a key from the MiniMax console, then point any OpenAI client at the MiniMax base URL (check the docs for your region's host):

curl https://api.minimax.io/v1/chat/completions \
  -H "Authorization: Bearer $MINIMAX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMax-M3",
    "messages": [{"role": "user", "content": "Refactor this function and explain why."}]
  }'

Or from Python, reusing the OpenAI SDK:

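import os

from openai import OpenAI

# Read the key from the environment (same variable as the curl example)
# rather than hard-coding it in source.
client = OpenAI(
    api_key=os.environ["MINIMAX_API_KEY"],
    base_url="https://api.minimax.io/v1",  # confirm the host in the MiniMax docs
)

# A standard chat-completions call; only the model name is MiniMax-specific.
resp = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[{"role": "user", "content": "Summarize this 400k-token repo."}],
)
print(resp.choices[0].message.content)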

Pricing is the actual selling point. Standard rates are $0.60 per million input tokens and $2.40 per million output, doubling to $1.20 / $4.80 once you cross 512K context. A launch promo cut that roughly in half for the first week. Against Claude Opus or GPT-5.5 list prices, that is somewhere between one-tenth and one-twentieth the cost.
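To see what the two tiers mean per request, here is a small estimator built on the list prices quoted above. Two things are assumptions rather than documented behavior: that "512K" means 512,000 tokens, and that the prompt size alone decides the tier, with the higher rate then applying to the whole request. The names are made up for the sketch, and the launch promo is ignored.

```python
LONG_CONTEXT_THRESHOLD = 512_000  # assumption: "512K" read as 512,000 tokens

# USD per million tokens, from the list prices quoted in the article.
STANDARD = {"input": 0.60, "output": 2.40}
LONG = {"input": 1.20, "output": 4.80}


def estimate_cost(input_tokens, output_tokens):
    """Estimate the USD cost of one request at list prices (no promo)."""
    # Assumption: crossing the threshold reprices every token in the request.
    rates = LONG if input_tokens > LONG_CONTEXT_THRESHOLD else STANDARD
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000
```

Under those assumptions, a 400K-token prompt with a 2K-token reply comes to about $0.24, while an 800K-token prompt with the same reply comes to about $0.97. Doubling the prompt roughly quadruples the bill once it crosses the threshold, so it is worth checking how MiniMax actually meters the boundary before budgeting.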

Via Ollama (cloud, not local)

There is an Ollama tag, and it is tempting to read it as "runs on my laptop." It does not. The tag routes to MiniMax's hosted, zero-retention cloud inference:

ollama run minimax-m3:cloud
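If you would rather script it than type into the REPL, Ollama also exposes a local HTTP API. The sketch below uses only the Python standard library and assumes the cloud tag is reachable through the local server's usual /api/chat endpoint on the default port, the same way a local model would be. That routing is an assumption, and so are the helper names.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local port


def build_chat_request(prompt, model="minimax-m3:cloud"):
    """Build a non-streaming chat request for the local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

Calling chat("Refactor this function and explain why.") needs a running Ollama server. Even through localhost, the inference itself still happens on the hosted side.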

No local GGUF exists yet because the weights have not shipped. When they do, expect the usual community quantizations, but plan for serious hardware: if the unconfirmed ~230B figure holds, an MoE of that size is a multi-GPU server job, not an RTX 4090 hobby build.
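The hardware claim is easy to check with back-of-envelope arithmetic. The sketch below counts weight memory only and takes the community-inferred 229B total as its input, which is not an official spec. It uses the total rather than the active parameter count because an MoE still has to keep every expert in memory (or offload it). The function names are made up for the example.

```python
import math


def weight_memory_gb(total_params_billion, bits_per_weight):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * bits_per_weight / 8


def min_gpus(total_params_billion, bits_per_weight, gpu_memory_gb):
    """Lower bound on GPU count; ignores KV cache, activations, and overhead."""
    needed = weight_memory_gb(total_params_billion, bits_per_weight)
    return math.ceil(needed / gpu_memory_gb)
```

With 229B parameters, 4-bit weights alone come to about 114.5 GB, so min_gpus(229, 4, 24) gives a floor of five 24 GB cards before a single token of KV cache is stored. At 16 bits the weights are 458 GB. A real deployment with a long context needs noticeably more than either figure.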

Via your editor

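Because the endpoint is OpenAI-compatible, M3 drops into Cursor, Cline, and similar tools as a custom model in a couple of minutes. Set the base URL and model name, paste your key, and you have a cheap long-context coding backend. Claude Code is the exception: it speaks Anthropic's API format rather than OpenAI's, so it needs an Anthropic-compatible endpoint or a translating proxy; check the MiniMax docs for one. That is the under-10-minute move: wire it into the agent you already use and let it chew on a large codebase that would blow a 200K window.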
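If you want to feed a whole repo through the API yourself rather than through an agent, the first job is checking whether it fits the window. Here is a rough sketch that concatenates source files under a token budget. The four-characters-per-token ratio is a crude heuristic, not MiniMax's tokenizer. The 900K default budget, the file suffixes, and the "### FILE:" header format are all arbitrary choices for the example.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text):
    return len(text) // CHARS_PER_TOKEN + 1


def pack_repo(root, budget_tokens=900_000, suffixes=(".py", ".md")):
    """Concatenate matching files into one prompt without exceeding the budget.

    Returns (prompt_text, estimated_tokens_used, skipped_relative_paths).
    """
    parts, used, skipped = [], 0, []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        rel = path.relative_to(root)
        chunk = f"### FILE: {rel}\n{text}\n"
        cost = estimate_tokens(chunk)
        if used + cost > budget_tokens:
            skipped.append(str(rel))  # too big for what is left; note it and move on
            continue
        parts.append(chunk)
        used += cost
    return "".join(parts), used, skipped
```

The default budget leaves headroom under 1M for the instructions and the reply. If the skipped list comes back non-empty, the repo does not fit in one request, and you are back to chunking or retrieval regardless of the advertised window.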

The parts the launch deck skips

Three caveats matter before you bet a workflow on M3.

Open-weight is not open-source. M3's license has not been published, but the precedent is not encouraging. M2.7 shipped under what MiniMax called a "modified MIT" license that actually forbids commercial use without prior written authorization. The community called that what it is: not really open. Assume M3 lands somewhere similar until the LICENSE file proves otherwise.

Every benchmark is in-house. No independent SWE-bench Verified replication existed at launch, and the one tough independent run (DeepSWE) was unflattering. Wait for third-party numbers before treating the 80% claims as load-bearing.

Provenance questions. Anthropic has alleged that MiniMax conducted model distillation through thousands of fraudulent accounts, a claim MiniMax has not publicly addressed. If model provenance or jurisdiction matters for your use case, factor that in.

Who should use it, and who should wait

Try it now if you want a cheap, long-context API for non-commercial coding, research, or prototyping, and you can verify outputs yourself. A 1M window starting at $0.60 per million input tokens (double that past 512K) is a genuinely useful tool for chewing through big repos and document piles.

Wait if you need self-hosted weights, a clear commercial license, or independently verified performance. None of those exist yet. Check back once the weights and the technical report land, which should be days, not months.

Sources and further reading

Tested on: not independently tested. Weights were unreleased as of 2026-06-09, so all figures here are vendor- or community-reported via the sources above, with the independent DeepSWE and Artificial Analysis results flagged as such.
Date checked: 2026-06-09
