AI2 OLMo 3.1

The Allen Institute for AI (AI2) has released OLMo 3.1, a family of fully open 32 billion parameter reasoning models. Unlike most open-weight releases, OLMo 3.1 ships with the complete training data, training code, evaluation scripts, and intermediate checkpoints, making it the most transparent frontier model ever published. The Think 32B variant gains 5+ points on AIME, 4+ on ZebraLogic, 4+ on IFEval, and over 20 points on IFBench compared to OLMo 3. AI2 calls the Instruct 32B variant the most capable fully open chat model to date.

License: Apache 2.0
TL;DR
  • Strongest fully open 32B reasoning model: ships weights, Dolma training data, training code, eval scripts, and intermediate checkpoints under Apache 2.0.
  • Two variants: Think 32B (reasoning, +20 pts on IFBench over OLMo 3) and Instruct 32B (chat). Both 32B dense.
  • Only frontier model from a US nonprofit with fully reproducible training. Runs locally via Ollama or llama.cpp.
System Requirements
RAM: 16 GB
GPU: RTX 4090 24 GB
VRAM: 20 GB (Q4) / 64 GB+ (BF16)
✓ Ollama ✓ Apple Silicon
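The VRAM figures above follow from simple bytes-per-parameter arithmetic. A rough sketch (the 32B parameter count is from the release; the per-weight bit widths are standard for BF16 and 4-bit quantization, and real usage adds KV-cache and activation overhead on top of the raw weights):

```python
# Rough memory-footprint estimate for a 32B dense model at
# different precisions. Only the raw weights are counted;
# KV cache and activations add several GiB on top, which is
# why a Q4 build is quoted at ~20 GB rather than ~15 GB.

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """GiB needed to hold the raw weights."""
    return n_params * bits_per_weight / 8 / 1024**3

N = 32e9  # 32 billion parameters

for label, bits in [("BF16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{label}: ~{weights_gib(N, bits):.0f} GiB")
# BF16 lands near 60 GiB, Q4 near 15 GiB -- consistent with
# the 64 GB+ BF16 / 20 GB Q4 numbers in the spec box.
```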

The Allen Institute for AI (AI2) released OLMo 3.1 as an update to the OLMo 3 family, and it holds a title no other frontier model can claim: it is the strongest fully open reasoning model ever released. Not just open weights, but open weights plus the full training data, training code, evaluation scripts, and intermediate checkpoints. For researchers, that level of transparency is a different category of release.

Two 32B Checkpoints: Think and Instruct

OLMo 3.1 ships in two 32 billion parameter variants, each targeting a different use case:

  • OLMo 3.1 Think 32B: a reasoning-tuned checkpoint from an extended reinforcement learning run. This is the one for hard math, logic, and multi-step problem solving.
  • OLMo 3.1 Instruct 32B: a chat-tuned checkpoint that applies the OLMo 3 Instruct 7B recipe at 32B scale. AI2 describes it as the most capable fully open chat model to date.

Both variants are part of the broader OLMo 3 model flow, which ranges from 7 billion to 32 billion parameters across dense variants suitable for everything from laptops to high-end compute clusters.

Substantial Gains Over OLMo 3

OLMo 3.1 Think 32B brings concrete improvements over the original OLMo 3 release:

  • AIME (math): over 5 points of improvement
  • ZebraLogic (reasoning): over 4 points of improvement
  • IFEval (instruction following): over 4 points of improvement
  • IFBench: over 20 points of improvement
  • Stronger performance on coding and complex multi-step tasks

The IFBench jump is the most dramatic. A 20-point improvement in instruction following is the difference between a model that technically completes tasks and one that reliably does what you actually asked for.

What "Fully Open" Really Means

Most open-source AI releases stop at publishing weights. AI2 publishes the entire pipeline:

  • Model weights for every checkpoint
  • Full training data, including the curated Dolma corpus
  • Training code end to end
  • Evaluation scripts used for every benchmark claim
  • Intermediate checkpoints so you can inspect how the model evolved during training
  • Tools for probing the model internals

That level of reproducibility is rare in AI research. If you want to actually understand how a frontier model was built, or audit it for bias, or train a derivative from scratch on different data, OLMo is the only option at this capability level.

A US Lab Pushing Truly Open AI

While most of the frontier open-weight releases of 2026 have come from Chinese labs (Alibaba Qwen, Z.ai GLM, Moonshot Kimi, DeepSeek), OLMo is the strongest entry from a US nonprofit. The Allen Institute has been consistent about its mission: keep the frontier of AI research open, reproducible, and accessible to academics and smaller organizations that cannot compete with the compute budgets of closed labs.

Get Started

  1. Download weights from Hugging Face (allenai).
  2. Read the release post at allenai.org/blog/olmo3.
  3. Explore the training data at the Dolma corpus on Hugging Face.
  4. Run the evaluation scripts from the OLMo GitHub repository to reproduce benchmark numbers yourself.

OLMo 3.1 is the model to pick when transparency matters as much as capability. For researchers, educators, and anyone building AI tooling that needs to be auditable top to bottom, nothing else comes close.
