


Moonshot Kimi K2.5

Moonshot AI has released Kimi K2.5, a 1 trillion parameter Mixture-of-Experts language model with 32 billion active parameters per request. The model uses 61 layers with 384 experts and sparse 8-expert activation, natively trained on roughly 15 trillion mixed vision and text tokens for a 256K context window. Kimi K2.5 outperforms GPT-5.2 on MMMU Pro (78.5%), BrowseComp (74.9%), and AIME 2025 (96.1%), and the Agent Swarm configuration reaches 50.2% on Humanity's Last Exam at 76% lower cost than Claude Opus 4.5. Weights ship on Hugging Face under a Modified MIT license.
2026-04-11

License Modified MIT
TL;DR
  • Kimi K2.5 is a 1 trillion parameter MoE model from Moonshot AI with 32B active parameters and 384 experts
  • Beats GPT-5.2 on MMMU Pro, BrowseComp, and AIME 2025 at a fraction of the cost
  • Self-hostable with vLLM, SGLang, or KTransformers under a Modified MIT license

Moonshot AI released Kimi K2.5 on January 27, 2026, and it may be the most impressive open-weight model you have not heard of yet. A 1 trillion parameter Mixture-of-Experts model with 32 billion active parameters, native multimodal training, and a 256K context window, Kimi K2.5 beats GPT-5.2 on vision, agentic, and math benchmarks while shipping under a Modified MIT license.

A Sparse Trillion Parameter Giant

Under the hood, Kimi K2.5 is a deeply sparse MoE:

  • 1 trillion total parameters
  • 32 billion active per request (roughly 3.2% activation ratio)
  • 61 layers, 384 experts, 8 experts selected per token
  • 256K context length (about 500 pages of text)
  • Trained from scratch on roughly 15 trillion tokens of mixed vision and text data
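The 8-of-384 expert selection above can be sketched as a toy routing layer. Everything here is an illustrative stand-in, not Moonshot's implementation: the shapes are tiny, the router is a single linear map, and the "experts" are plain linear layers in place of full FFN blocks.

```python
import numpy as np

def topk_moe_layer(x, gate_w, experts, k=8):
    """Route one token through the top-k of many experts (sparse MoE).

    x: (d,) token hidden state
    gate_w: (d, n_experts) router weights
    experts: list of callables, one per expert
    """
    logits = x @ gate_w                      # router scores, one per expert
    topk = np.argsort(logits)[-k:]           # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                 # softmax over only the selected experts
    # Only k experts execute; the other (n_experts - k) are skipped entirely,
    # which is where the sparse activation savings come from.
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 16, 384
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
# Toy experts: small linear maps standing in for the per-expert FFNs.
expert_mats = [rng.standard_normal((d, d)) * 0.1 for _ in range(n_experts)]
experts = [lambda v, m=m: m @ v for m in expert_mats]

y = topk_moe_layer(x, gate_w, experts, k=8)
```

With 8 of 384 experts selected per token, a forward pass touches only a small slice of the expert weights, which is how a 1T parameter model gets away with 32B active parameters.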

The native multimodal training matters. Most models that claim multimodal capability bolt a vision adapter onto a text-only base. Kimi K2.5 saw interleaved image and text tokens throughout pretraining, which is why it punches well above its weight on vision benchmarks.

Benchmarks That Beat GPT-5.2

Across several categories, Kimi K2.5 outperforms the latest closed frontier model:

  • MMMU Pro (multimodal understanding): 78.5%
  • BrowseComp (agentic web tasks): 74.9%
  • AIME 2025 (competition math): 96.1%, compared to roughly 88% for GPT-5.2

The even more interesting result comes from Moonshot's Agent Swarm configuration, where multiple Kimi K2.5 instances collaborate on hard problems. Agent Swarm scored 50.2% on Humanity's Last Exam, at 76% lower cost than Claude Opus 4.5. That is a genuine "frontier reasoning on a budget" story.

Modified MIT License

Kimi K2.5 weights are available on Hugging Face under a Modified MIT license. Commercial use is allowed with standard attribution requirements. You can deploy on your own infrastructure using vLLM, SGLang, or KTransformers for efficient serving. Moonshot also runs a hosted API for teams that want zero operational overhead.
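If you serve the weights with vLLM, its OpenAI-compatible server exposes a standard chat completions endpoint. Here is a minimal standard-library sketch of calling a locally served copy; the model id and port are assumptions for illustration, not confirmed values from Moonshot.

```python
import json
import urllib.request

def build_chat_payload(prompt, model="moonshotai/Kimi-K2.5", max_tokens=256):
    """OpenAI-style chat completions body, as accepted by vLLM's server.

    The model id here is a hypothetical placeholder; use the actual
    repository name from the Hugging Face release.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def send(payload, base_url="http://localhost:8000"):
    """POST the payload to a local vLLM endpoint and return the first reply."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the endpoint mirrors the OpenAI API shape, existing agent stacks can usually point at a self-hosted deployment by changing only the base URL and model name.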

Why Self-Host a Trillion Parameter Model

For most teams, running a 1T parameter model sounds absurd. But because Kimi K2.5 activates only 32 billion parameters per forward pass, the compute cost per token is closer to that of a 32B dense model than a trillion parameter giant. The full weights still have to be resident in memory, roughly 1 TB at 8-bit precision, so in practice that means 8-GPU servers with H200 or MI300X hardware. If you were already serving a 70B dense model on a multi-GPU node, K2.5 is a realistic next step.
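A back-of-the-envelope check on those numbers, using the parameter counts from this article. This is a rough sketch for intuition, not sizing guidance: all expert weights must be resident, but per-token compute scales with the active parameters only.

```python
def moe_memory_gb(total_params, active_params, bytes_per_param=2):
    """Rough memory figures for a sparse MoE model.

    weight_gb: weights that must be resident across the GPU fleet
    active_gb: parameters actually touched per forward pass (compute cost proxy)
    """
    weight_gb = total_params * bytes_per_param / 1e9
    active_gb = active_params * bytes_per_param / 1e9
    return weight_gb, active_gb

# BF16 (2 bytes/param): ~2000 GB resident, but only ~64 GB touched per token.
resident_bf16, touched_bf16 = moe_memory_gb(1e12, 32e9)
# 8-bit quantization (1 byte/param) halves the resident footprint to ~1000 GB.
resident_fp8, _ = moe_memory_gb(1e12, 32e9, bytes_per_param=1)
```

That 1 TB 8-bit footprint is why an 8-GPU node of H200s (141 GB each) or MI300Xs (192 GB each) is the practical floor for self-hosting, while throughput per token behaves more like a 32B dense model.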

Get Started

  1. Download the weights from Hugging Face.
  2. Deploy with vLLM using the sample scripts in the official GitHub repo.
  3. Benchmark locally against your current agent stack before deciding on production.
  4. Read the tech blog at kimi.com/blog/kimi-k2-5 for training and evaluation details.

Moonshot AI is not yet a household name in the West, but Kimi K2.5 is arguably the strongest open-weight reasoning and agentic model released so far in 2026. For anyone serious about building autonomous AI systems on open infrastructure, it deserves a close look.

 
