Kimi K2.7 Code: Moonshot's coding model that thinks 30% less to cut agent costs

License Modified MIT

TL;DR

Moonshot AI's coding-specialized Kimi K2.7 Code: a 1T-parameter MoE (about 32B active) with a 256K context, Modified MIT licensed.
Uses roughly 30 percent fewer thinking tokens than K2.6 at similar quality, cutting the cost of long agent runs.
Benchmarks are first-party only and disputed by independent testers; needs a multi-GPU node to self-host.

System Requirements

RAM	data-center
GPU	8x H200
VRAM	~500GB+ (INT4)

Moonshot AI shipped Kimi K2.7 Code on June 12, 2026, and the pitch is refreshingly un-flashy: roughly the same coding quality as before, but it thinks about 30 percent less. In an era where every agent run bills you for thousands of "thinking" tokens, a model that reaches the same answer with fewer of them is a cost story, not a benchmark story. That is the honest way to read this release, because the benchmarks here are all Moonshot's own, and at least one outlet's testers say they do not hold up. Here is what it is, what to believe, and how to run it.

What it is

Kimi K2.7 Code is a coding-specialized variant built on the K2.6 base from Kimi's maker, the Beijing lab Moonshot AI. It is a big sparse Mixture-of-Experts model: 1 trillion total parameters with about 32 billion active per token, 384 experts (eight routed per token plus one shared), and a 256K-token context window. (Mixture-of-Experts means only a slice of the network runs per token, so you pay storage for the whole thing and compute for a fraction.) It takes text, image, and video input, reasoning is always on, and it ships under a Modified MIT license that allows commercial use, with a clause to credit Kimi if you cross 100 million monthly users or 20 million dollars a month in revenue.
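To make "storage for the whole thing, compute for a fraction" concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration of the general MoE idea, not Moonshot's router: the parameter counts and the 384/8 expert split come from the figures above, while the random router scores and the function names are made up for the example.

```python
import heapq
import math
import random

TOTAL_PARAMS = 1_000_000_000_000  # 1T total parameters (figure from the article)
ACTIVE_PARAMS = 32_000_000_000    # ~32B active per token (figure from the article)
NUM_EXPERTS = 384                 # expert count from the article
ROUTED_PER_TOKEN = 8              # routed experts per token from the article


def active_fraction(active=ACTIVE_PARAMS, total=TOTAL_PARAMS):
    """Share of the weights that actually run for one token."""
    return active / total


def route_token(router_scores, k=ROUTED_PER_TOKEN):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Generic top-k routing, assumed here for illustration only.
    """
    top = heapq.nlargest(k, range(len(router_scores)), key=router_scores.__getitem__)
    peak = max(router_scores[i] for i in top)
    exps = [math.exp(router_scores[i] - peak) for i in top]
    norm = sum(exps)
    return {i: e / norm for i, e in zip(top, exps)}


rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
weights = route_token(scores)

print(f"active fraction per token: {active_fraction():.1%}")
print(f"experts used for this token: {len(weights)} of {NUM_EXPERTS}")
```

The other 376 experts do nothing for this token, but they still have to sit in memory because the next token may route to them.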

The actual story: fewer thinking tokens

The headline number Moonshot stands behind is the 30 percent cut in thinking tokens versus K2.6, and it is confirmed on the official model card. Why care? A long agentic coding run is a loop: the model reads, plans, calls a tool, reads the result, plans again, dozens or hundreds of times. Each thinking step burns output tokens you pay for. Trim 30 percent off the reasoning at the same answer quality and you have cut the cost and the latency of every step, which compounds across a multi-hour job. For anyone running agents at scale, that matters more than one more point on a leaderboard.
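A quick back-of-the-envelope sketch shows how that plays out over a loop. The 30 percent cut is Moonshot's figure; the step count and per-step token counts below are hypothetical numbers picked for illustration, not measurements of either model.

```python
def output_tokens(steps, thinking_per_step, answer_per_step, thinking_cut=0.0):
    """Total output tokens for an agent loop, with an optional cut to thinking tokens."""
    thinking = thinking_per_step * (1 - thinking_cut)
    return steps * (thinking + answer_per_step)


# Hypothetical run: 200 tool-calling steps, 1,500 thinking + 500 answer tokens each.
STEPS, THINKING, ANSWER = 200, 1500, 500

before = output_tokens(STEPS, THINKING, ANSWER)
after = output_tokens(STEPS, THINKING, ANSWER, thinking_cut=0.30)
saving = 1 - after / before

print(f"output tokens before: {before:,.0f}")
print(f"output tokens after:  {after:,.0f}")
print(f"overall output saving: {saving:.1%}")
```

Note that the overall saving depends on how much of your output is thinking. In this made-up run thinking is three quarters of the output, so a 30 percent cut to thinking trims about 22.5 percent off the total; a run that thinks less per step saves less.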

Benchmarks, with a real warning

Here is where you need to be careful. Moonshot published only its own bespoke benchmarks. The standard public ones builders trust, SWE-Bench Verified, SWE-Bench Pro, Terminal-Bench, LiveCodeBench, are not on the model card. The numbers below are first-party.

Benchmark (Moonshot's own)	K2.7 Code	K2.6
Kimi Code Bench v2	62.0	50.9
MLS Bench Lite	35.1	26.7
MCP Atlas (agentic)	76.0	69.4
MCP Mark Verified (agentic)	81.1	72.8

All figures self-reported by Moonshot on bespoke benchmarks; standard SWE-Bench and Terminal-Bench scores were not published. A SWE-Bench Verified figure of 78.2 circulates on aggregator sites but appears in no primary source. Treat everything here as vendor-reported and unverified.

And the skepticism is on the record: VentureBeat reported that practitioners who tried it say the benchmarks "do not check out," noting the model was not submitted to any independent leaderboard. So the credible claim is the token reduction (cost), not a coding crown.

Limitations and gotchas

It is not local. A 1T-parameter MoE keeps all experts resident; even native INT4 weights are roughly 500GB or more, so realistically you need a multi-GPU node (think 8x H200). The "runs on 2x A100" claims floating around are wrong.
Benchmarks are first-party only and disputed. Verify on your own tasks before trusting any number.
Reasoning is always on; there is no fast non-thinking mode, though making that thinking cheaper is the whole point of this release.
No first-party Ollama tag yet; community GGUF quants exist (Unsloth), but at this size they need serious memory.
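The memory claim in the first point is easy to sanity-check. This is a rough sketch that counts weight storage only; it ignores KV cache, activations, and runtime overhead, so treat the result as a lower bound. The per-GPU capacities (141 GB for an H200, 80 GB for the larger A100) are the published hardware figures, and the helper names are made up for the example.

```python
GB = 1e9  # decimal gigabytes, matching how the article quotes sizes


def weight_gb(params, bits_per_weight):
    """Storage for the weights alone, in GB."""
    return params * bits_per_weight / 8 / GB


def fits(weights_gb, gpus, gb_per_gpu):
    """True if the weights alone fit in the combined GPU memory."""
    return weights_gb <= gpus * gb_per_gpu


int4_gb = weight_gb(1e12, 4)  # 1T parameters at 4 bits each

print(f"INT4 weights: ~{int4_gb:.0f} GB")
print(f"8x H200 (141 GB each): fits = {fits(int4_gb, 8, 141)}")
print(f"2x A100 (80 GB each):  fits = {fits(int4_gb, 2, 80)}")
```

Two A100s offer 160 GB between them, less than a third of what the INT4 weights need before a single token of context is stored.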

Who should use it

Use it if you run long, tool-heavy coding agents and your bill is dominated by reasoning tokens; the cheaper thinking is a real, compounding saving, and the Modified MIT license lets you self-host or fine-tune. Most teams will hit it through Moonshot's API (about $0.95 per million input tokens and $4.00 per million output tokens, with cheap cache hits) rather than standing up a 1T model. If you want a model with independently verified coding scores, wait for third-party numbers or reach for GLM-5.2, which at least sits on public leaderboards.
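If you want to put a dollar figure on a run, the arithmetic is simple. This sketch uses the approximate API prices quoted above and ignores cache-hit pricing, since no figure for it is given here; the token counts in the example are hypothetical.

```python
INPUT_PRICE_PER_M = 0.95   # USD per million input tokens (approximate, from the article)
OUTPUT_PRICE_PER_M = 4.00  # USD per million output tokens (approximate, from the article)


def run_cost(input_tokens, output_tokens):
    """Estimated API cost in USD, with no cache discount applied."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    )


# Hypothetical agent run: 2M input tokens read, 400K output tokens generated.
cost = run_cost(2_000_000, 400_000)
print(f"estimated cost: ${cost:.2f}")
```

Output tokens cost roughly four times as much as input tokens at these prices, which is why trimming thinking tokens moves the bill more than trimming prompts does.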

Run it in about 10 minutes

Realistically, the API is the 10-minute path; self-hosting a 1T model is not.

# Fastest: the OpenAI-compatible Moonshot API. Set MOONSHOT_API_KEY first.
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"kimi-k2.7-code","messages":[{"role":"user","content":"Refactor this function and add tests."}]}'

# Self-host the native INT4 weights on a multi-GPU node with vLLM
vllm serve moonshotai/Kimi-K2.7-Code --tensor-parallel-size 8

Point it at a coding harness like OpenCode and give it a real multi-step task, then watch the token counter. The whole reason to pick this model over its siblings is what that counter does, so that is the thing to measure.
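If your harness does not show a token counter, you can keep your own. The sketch below reuses the endpoint and model name from the curl example and assumes the response carries the usual OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens`; whether Moonshot reports thinking tokens as a separate field is not stated here, so this only tracks the totals. `TokenCounter`, `build_request`, and `send` are names made up for the example.

```python
import json
import os
import urllib.request

API_URL = "https://api.moonshot.ai/v1/chat/completions"  # from the curl example
MODEL = "kimi-k2.7-code"                                 # from the curl example


def build_request(prompt, api_key):
    """Build the same POST request as the curl example."""
    body = json.dumps(
        {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def send(prompt):
    """Send one prompt and return the parsed JSON response (needs MOONSHOT_API_KEY)."""
    request = build_request(prompt, os.environ["MOONSHOT_API_KEY"])
    with urllib.request.urlopen(request) as response:
        return json.load(response)


class TokenCounter:
    """Running totals over many responses, assuming an OpenAI-style usage object."""

    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def add(self, response):
        usage = response.get("usage", {})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)


# Usage inside an agent loop (not executed here, since it calls the network):
#   counter = TokenCounter()
#   counter.add(send("Refactor this function and add tests."))
#   print(counter.completion_tokens)
```

Run the same multi-step task against K2.6 and K2.7 Code and compare the completion totals; that is the comparison the release is asking you to make.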

Sources and further reading

Tested on: not independently tested. Kimi K2.7 Code is a 1T-parameter MoE that needs a multi-GPU node even at INT4, beyond our bench. Every benchmark here is Moonshot-reported on bespoke tests; independent reviewers (VentureBeat) are skeptical, and standard SWE-Bench and Terminal-Bench numbers were not published.
Date checked: 2026-06-26
