
Z.ai GLM-5.1

Z.ai (formerly Zhipu AI) has released GLM-5.1, an open-weight 744 billion parameter Mixture-of-Experts model with 40 billion active parameters. The model immediately took the top spot on SWE-Bench Pro with 58.4, beating GPT-5.4 (57.7) and Claude Opus 4.6 (57.3). Built on DeepSeek Sparse Attention with a 200K context window and 131K maximum output, GLM-5.1 is engineered for sustained autonomous coding agents capable of running plan, execute, test, fix, optimize loops for up to eight hours. The weights are available on Hugging Face under a permissive MIT license for full commercial use.
2026-04-10

License: MIT
TL;DR
  • 744B MoE model with 40B active parameters from Z.ai
  • Tops SWE-Bench Pro at 58.4, beating GPT-5.4 and Claude Opus 4.6
  • MIT license, quantized versions available on Ollama
System Requirements
  • RAM: 128GB+
  • GPU: 8x A100/H100
  • VRAM: 320GB+
  • Ollama: ✓

On April 7, 2026, Z.ai (formerly Zhipu AI) released GLM-5.1, an update to their flagship GLM-5 model that immediately took the top spot on the SWE-Bench Pro coding leaderboard. With 744 billion parameters in a Mixture-of-Experts architecture and a permissive MIT license, GLM-5.1 is now the strongest open-weight coding model on the planet.

The Architecture: Sparse, Efficient, Long Context

GLM-5.1 is built on a 744 billion parameter Mixture-of-Experts architecture with only 40 billion active parameters per token. This sparse activation pattern gives it the capacity of a trillion-parameter class model while keeping inference costs roughly proportional to a 40B dense model. The context window is 200K tokens, with a maximum output length of 131K tokens, enabled by DeepSeek Sparse Attention (DSA) to keep long-context inference fast and memory-efficient.
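The economics of that sparse activation can be sketched with quick back-of-the-envelope arithmetic. The parameter counts below come from the announcement; the precision figures are illustrative assumptions, not measured numbers:

```python
# Back-of-the-envelope sketch of MoE inference economics.
# Parameter counts are from the announcement; byte-per-parameter
# figures are illustrative assumptions.

TOTAL_PARAMS = 744e9   # total parameters in the MoE
ACTIVE_PARAMS = 40e9   # parameters activated per token

# Fraction of the network touched on each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"Active per token: {active_fraction:.1%}")  # ~5.4%

# All weights must still be resident in memory, even though only a
# slice is computed per token. At 16-bit precision that is ~1.5 TB;
# a 4-bit quantization (as in typical Ollama builds) is ~372 GB.
for bits in (16, 8, 4):
    gb = TOTAL_PARAMS * bits / 8 / 1e9
    print(f"{bits}-bit weights: ~{gb:,.0f} GB")
```

This is why compute scales like a 40B dense model while memory still scales with the full 744B: routing saves FLOPs per token, not resident weight storage.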

Number One on SWE-Bench Pro

The headline result comes from SWE-Bench Pro, a tough benchmark that measures how well a model can fix real-world bugs in open-source repositories:

  • GLM-5.1: 58.4 (rank 1)
  • GPT-5.4: 57.7
  • Claude Opus 4.6: 57.3

An open-weight model with a permissive MIT license just beat the two most expensive proprietary coding assistants on the market. On the broader coding composite that also includes Terminal-Bench 2.0 and NL2Repo, Claude Opus 4.6 still leads at 57.5 versus GLM-5.1 at 54.9, so the top spot is benchmark-specific, but the result is still historic.

Autonomous Coding Agent Loop

GLM-5.1 is engineered for sustained agentic work. The model manages a full plan, execute, test, fix, optimize loop autonomously, and Z.ai reports test runs where it kept iterating productively for up to eight hours without human intervention. That kind of long-horizon task completion has historically been the domain of closed frontier models only.
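The loop described above can be sketched as a simple controller. This is not Z.ai's actual agent harness; the hook names, stop conditions, and time budget are illustrative stand-ins for calls into the model:

```python
import time

# Minimal sketch of a plan/execute/test/fix controller with a time
# budget. The plan/execute/run_tests/fix hooks are placeholders for
# model calls; names and structure are illustrative only.

def run_agent(task, plan, execute, run_tests, fix,
              budget_seconds=8 * 3600):
    """Iterate until the tests pass or the time budget runs out."""
    deadline = time.monotonic() + budget_seconds
    steps = plan(task)
    code = execute(steps)
    while time.monotonic() < deadline:
        failures = run_tests(code)
        if not failures:
            return code             # success: all tests green
        code = fix(code, failures)  # feed failures back to the model
    raise TimeoutError("budget exhausted without passing tests")

# Example with trivial stubs standing in for model calls: the "code"
# is just an integer that passes once it reaches 3.
result = run_agent(
    task="demo",
    plan=lambda t: [t],
    execute=lambda s: 0,
    run_tests=lambda c: [] if c >= 3 else ["failing test"],
    fix=lambda c, f: c + 1,
)
# result == 3 after three fix iterations
```

The key design point is that the stop condition is the test suite, not a fixed step count, which is what lets a run keep iterating productively for hours.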

MIT License: Fully Open Commercial Use

The weights are on Hugging Face under the MIT license, which means unrestricted commercial use, modification, and redistribution. You can run GLM-5.1 in production, build a SaaS product on top of it, or fine-tune it for your own domain without licensing friction.

Get Started

  1. Download the weights from Hugging Face (zai-org/GLM-5).
  2. Deploy with vLLM or SGLang for production-grade inference.
  3. Try the hosted API at z.ai if you do not want to self-host.
  4. Read the technical report for architecture and training details.
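Once a vLLM or SGLang server is up, both expose an OpenAI-compatible chat endpoint. A minimal stdlib client might look like the following; the port and endpoint path follow vLLM's usual defaults, and the model name mirrors the Hugging Face repo named above, so adjust both for your deployment:

```python
import json
import urllib.request

# Minimal client for a self-hosted vLLM/SGLang server exposing the
# OpenAI-compatible API. Base URL and model name are assumptions
# based on vLLM defaults and the HF repo named in this article.

def build_chat_request(prompt, model="zai-org/GLM-5",
                       base_url="http://localhost:8000"):
    """Assemble the chat-completion request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt, **kwargs):
    """Send the request and return the first completion's text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, existing OpenAI SDK clients can also be pointed at the server by overriding the base URL.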

Why This Matters

GLM-5.1 is the first MIT-licensed model to top a major coding benchmark. For anyone building code assistants, autonomous developer agents, or refactoring tools, this is the model to benchmark against next. It can be deployed anywhere other permissively licensed models can, it delivers Claude Opus-class performance on real coding tasks, and it brings the long-horizon autonomy that makes multi-hour agent workflows feasible on self-hosted infrastructure.

 
