
DeepSeek-V4

DeepSeek-V4 Preview ships two MIT-licensed MoE variants with native 1M context: V4-Pro at 1.6T params and V4-Flash at 284B, both built for agents.

License: MIT
TL;DR
  • Two MoE variants: V4-Pro 1.6T (49B active) and V4-Flash 284B (13B active)
  • Native 1M context; 27% of the FLOPs and 10% of the KV cache of V3.2 at 1M tokens
  • MIT-licensed Preview; leads open-weights models on the GDPval-AA agentic score at 1554
System Requirements
  • RAM: 96GB
  • GPU: H200 or 2x H100
  • VRAM: 80GB+ (Flash 4-bit)
  • Ollama: ✓ supported

On April 24, 2026, DeepSeek open-sourced DeepSeek-V4, a Preview release that ships two MIT-licensed Mixture-of-Experts variants and a native one-million-token context window. DeepSeek-V4-Pro packs 1.6 trillion total parameters (49 billion active) and posts 80.6% on SWE-Bench Verified. DeepSeek-V4-Flash brings the same long-context architecture down to 284 billion parameters (13 billion active) at API pricing that Simon Willison called "the cheapest of the small models." It is the loudest open-weight drop of the spring, and the first time DeepSeek has explicitly framed a release around agentic coding.

Two Variants, One Architecture

DeepSeek released both base and instruct checkpoints for the Pro and the Flash, plus a Technical Report PDF on the Hugging Face collection. Everything is under the MIT license, so commercial use, fine-tuning, and redistribution are all unrestricted.

| Model | Total params | Active params | Context | Focus |
|---|---|---|---|---|
| DeepSeek-V4-Pro | 1.6T | 49B | 1M tokens | Frontier-class reasoning, coding, agents |
| DeepSeek-V4-Flash | 284B | 13B | 1M tokens | Cost-efficient long-context inference |

The Pro is 61 layers deep with a fresh MoE topology, both variants train on roughly 32 trillion tokens with the Muon optimizer, and post-training runs a two-stage recipe (domain experts first, then unified consolidation). The mixed-precision recipe is aggressive: most weights in FP8, MoE expert weights in FP4, and RoPE dimensions kept in BF16.

Hybrid Attention: CSA Plus HCA

The architectural headline is a brand-new hybrid attention scheme. DeepSeek-V4 alternates two layer types:

  • CSA (Compressed Sparse Attention) compresses the KV cache 4x using softmax-gated pooling, then reads through a top-k FP4 "lightning indexer" to keep retrieval sparse and cheap.
  • HCA (Heavily Compressed Attention) compresses the KV cache 128x and runs dense attention over the compressed blocks, with a sliding-window branch on the side to preserve recency.

Layers 0 and 1 are HCA, layers 2 through 60 alternate CSA and HCA, and the final Multi-Token-Prediction block is sliding-window only. The MoE feed-forward blocks use DeepSeekMoE with manifold-constrained hyper-connections replacing standard residual connections. The Pro card also exposes three reasoning modes: Non-think, Think High, and Think Max (Think Max recommends a 384K-or-larger context window).
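
To make that layout concrete, here is a small Python sketch of the layer schedule (an illustration only, not DeepSeek's configuration format; the release text leaves the alternation's starting phase ambiguous, so CSA-on-even-layers below is an assumption):

# Sketch of the attention-layer schedule described above.
# Layers 0-1 are HCA, layers 2-60 alternate CSA and HCA (starting phase
# assumed), and the MTP block after layer 60 is sliding-window only.

def attention_kind(layer: int) -> str:
    if layer in (0, 1):
        return "HCA"
    return "CSA" if layer % 2 == 0 else "HCA"

schedule = [attention_kind(i) for i in range(61)]
print(schedule[:8])  # ['HCA', 'HCA', 'CSA', 'HCA', 'CSA', 'HCA', 'CSA', 'HCA']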

1M Context With 90% Less KV Cache

Long-context inference has been the painful trade-off of the 2025 generation. DeepSeek-V4 attacks it head-on. At 1M tokens, V4-Pro reports 27% of the per-token inference FLOPs and 10% of the KV cache memory compared to DeepSeek-V3.2. V4-Flash pushes the KV cache reduction even further (the Hugging Face blog cites 7%, the Flash model card says 10%; treat the gap as still settling).
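
A back-of-envelope sketch shows how those compression factors could add up to a cache that small. Only the 4x and 128x ratios and the layer split come from the release; the per-layer KV width and byte size below are placeholder assumptions:

# Rough KV-cache estimate at 1M tokens (illustrative numbers only).
TOKENS = 1_000_000
KV_BYTES_PER_TOKEN_PER_LAYER = 1024  # assumed width, 1 byte per value (FP8)
CSA_LAYERS, HCA_LAYERS = 30, 31      # layers 0-1 HCA, layers 2-60 alternating

dense = TOKENS * (CSA_LAYERS + HCA_LAYERS) * KV_BYTES_PER_TOKEN_PER_LAYER
compressed = TOKENS * KV_BYTES_PER_TOKEN_PER_LAYER * (
    CSA_LAYERS / 4 + HCA_LAYERS / 128  # 4x and 128x compression
)

print(f"dense:      {dense / 1e9:.1f} GB")
print(f"compressed: {compressed / 1e9:.1f} GB ({compressed / dense:.0%} of dense)")
# ~13% with these placeholder dims, in the same ballpark as the quoted ~10%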

The practical consequence is real. On long-context retrieval (MRCR-1M), V4-Pro scores 83.5 MMR. Eight-needle MRCR stays above 0.82 through 256K tokens and only drops to 0.59 at the full 1M. CorpusQA-1M lands at 62.0 ACC. Translation: the context window is not a marketing number; the model actually uses it.

Benchmarks: Frontier-Adjacent on Code and Agents

DeepSeek-V4-Pro lands inside the closed-frontier band on most agentic and coding benchmarks. All numbers below come from the official Pro model card:

| Benchmark | V4-Pro | Notes |
|---|---|---|
| SWE-Bench Verified | 80.6% | Statistical tie with Opus 4.6 Max (80.8) and Gemini 3.1 Pro (80.6) |
| SWE-Bench Pro | 55.4% | Real-world repo bug-fix benchmark |
| Terminal-Bench 2.0 | 67.9 | Trails GPT-5.4-xHigh (75.1) |
| Toolathlon | 51.8 | Beats K2.6 (50.0) and GLM-5.1 (40.7) |
| LiveCodeBench | 93.5% | Codeforces rating: 3206 |
| HMMT 2026 Feb | 95.2% | Competition math |
| GPQA Diamond | 90.1% | Graduate-level reasoning |
| MMLU-Pro | 87.5% | General knowledge |
| HLE | 37.7% | Humanity's Last Exam |

Third-party Artificial Analysis ranks V4-Pro-Max at 52 on its Intelligence Index, second among open-weights models behind Kimi K2.6 (54). On the GDPval-AA real-world agentic score, however, V4-Pro takes the open-weights crown at 1554, ahead of GLM-5.1 (1535) and Kimi K2.6 (1484). That dual reading (slightly behind on raw intelligence, ahead on agentic execution) matches the way DeepSeek positions the model: built for agents.

V4-Flash is no slouch either. The Flash model card lists 79.0% SWE-Bench Verified, 91.6% pass@1 on LiveCodeBench, 86.2% MMLU-Pro, and 88.1% GPQA Diamond. For a 13B-active model, those numbers are absurd.

Pricing: Cheapest Frontier-Class API

The DeepSeek platform pricing (per 1M tokens) is the other big story:

| Model | Input (cache miss) | Input (cache hit) | Output |
|---|---|---|---|
| V4-Flash | $0.14 | $0.0028 | $0.28 |
| V4-Pro (75% promo) | $0.435 | $0.003625 | $0.87 |

The 75% Pro promo extends through 2026-05-31 15:59 UTC. After that, expect roughly $1.74 input / $3.48 output per million tokens, still well below Claude Opus 4.7 territory. On Hacker News the cost-vs-Claude-Haiku-4.5 comparison was blunt: "3.3x cheaper input, 10x cheaper output." Third-party hosts (DeepInfra, Together, Fireworks, Novita, SiliconFlow) carry V4-Pro at $1.74 to $2.67 per blended million.
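
If you want to sanity-check the economics against your own workload, the arithmetic is straightforward. The sketch below multiplies the listed per-million-token prices by token counts for a hypothetical long-context agent run (the counts and the cache hit/miss split are made up for illustration):

# Cost estimate from the listed per-1M-token prices (promo-period Pro rate).
PRICES = {  # USD per 1M tokens: (input cache miss, input cache hit, output)
    "v4-flash": (0.14, 0.0028, 0.28),
    "v4-pro-promo": (0.435, 0.003625, 0.87),
}

def job_cost(model, miss_tokens, hit_tokens, output_tokens):
    miss, hit, out = PRICES[model]
    return (miss_tokens * miss + hit_tokens * hit + output_tokens * out) / 1e6

# Hypothetical agent loop: 800K fresh input, 4M cached re-reads, 200K output
print(f"Flash: ${job_cost('v4-flash', 800_000, 4_000_000, 200_000):.2f}")      # ~$0.18
print(f"Pro:   ${job_cost('v4-pro-promo', 800_000, 4_000_000, 200_000):.2f}")  # ~$0.54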

Run It Today

The fastest path is the hosted API. Switch your model name and the rest of your stack stays the same:

curl https://api.deepseek.com/v1/chat/completions \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Refactor this code for 1M context."}]
  }'
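
The same request from Python goes through any OpenAI-compatible client pointed at the DeepSeek base URL (a minimal sketch; the model name comes from the curl example above and the exact endpoint path may differ for the Preview):

# Minimal OpenAI-compatible client call against the DeepSeek API.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Refactor this code for 1M context."}],
)
print(resp.choices[0].message.content)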

For self-hosting, vLLM and SGLang both shipped Day-0 recipes with native CSA plus HCA, FP4 MoE backends, and MTP speculative decoding. NVIDIA NIM offers official Blackwell endpoints for V4-Pro, and OpenRouter load-balances across multiple providers if you want one API key for everything. Local inference works through the Hugging Face transformers path or community quantizations:

# Pull a community Flash GGUF (preview-stage, expect breakage)
ollama pull deepseek-v4-flash:q4_k_m

# Or test the official weights via transformers
huggingface-cli download deepseek-ai/DeepSeek-V4-Flash
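
If you want to poke at the downloaded weights directly, a minimal transformers sketch looks like the following. Treat it as an outline only: the Preview repo will likely require trust_remote_code, far more memory than a consumer GPU offers, and (since no chat template ships, see below) a raw completion prompt rather than chat messages:

# Rough transformers outline for the Flash checkpoint (not a tested recipe).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V4-Flash"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # Preview repos typically ship custom modeling code
    device_map="auto",
    torch_dtype="auto",
)

inputs = tok("def fibonacci(n):", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))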

The Flash card already lists 23 community quantization variants. The Pro card lists six, but no aggressive sub-Q4 GGUF has landed yet; the 860 to 900 GB native footprint plus FP4-QAT experts leaves little headroom for further compression.
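
A quick estimate shows why: at 1.6 trillion parameters, even a weight file dominated by FP4 experts is enormous. The precision split below is an assumption for illustration, not a figure from the card:

# Back-of-envelope weight-file size for V4-Pro (assumed precision split).
TOTAL_PARAMS = 1.6e12
FP4_SHARE, FP8_SHARE = 0.90, 0.10  # assumed: most parameters sit in MoE experts

size_gb = TOTAL_PARAMS * (FP4_SHARE * 0.5 + FP8_SHARE * 1.0) / 1e9
print(f"~{size_gb:.0f} GB")  # ~880 GB, in line with the quoted 860-900 GB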

Where the Preview Tag Bites

This is officially a Preview, not a final release, and a few sharp edges show:

  • Hallucination rates on Artificial Analysis's AA-Omniscience evaluation come in at 94% for V4-Pro and 96% for V4-Flash, notably higher than peers. If your workflow is fact-heavy, ground the model with retrieval.
  • V4-Pro is consumer-unrunnable. Even at FP4 the model wants multi-node H100 or H200 servers. V4-Flash on a single H200 with quantization is the practical local play.
  • GGUF quality is unsettled. Community sub-Q4 quants of V4-Flash exist but quality is "unverified," and dynamic 1-2 bit V3-era tricks do not translate cleanly to V4's FP4-trained experts.
  • No Jinja chat template ships with the model; you need DeepSeek's Python encoder to build prompts correctly. Several day-one users tripped on this.
  • Frontier gap. Simon Willison's read of the technical report puts V4-Pro "marginally short of GPT-5.4 and Gemini-3.1-Pro," roughly three to six months behind the closed frontier on raw quality.

Why It Matters

For anyone building agents on open weights, DeepSeek-V4 is the new baseline to beat. It tops the open-weights GDPval-AA agentic leaderboard, it ships under MIT, it gives you a real 1M-token context with KV-cache numbers that make long-horizon agent loops actually affordable, and it does this at API pricing that undercuts every closed competitor. V4-Flash is the immediate winner for builders: fast, cheap, deployable, and frontier-adjacent on code. V4-Pro is the model to benchmark your stack against for the rest of 2026.

If you have spent the past year working around context limits with chunking heuristics, retrieval shims, or expensive prompt-cache games, spend an afternoon throwing a 500K-token job at V4-Flash. The economics of long-context agents just changed.

Tested on: DeepSeek API (V4-Flash and V4-Pro Preview) | Date tested: 2026-05-01

 
