DeepSeek V4
Open-weight MoE model family with a 1M context window, MIT license, and frontier benchmarks at 14% of Claude's price.
About
DeepSeek shipped V4-Pro and V4-Flash on April 24, 2026. Open weights. MIT license. SWE-bench score within 0.2 points of Claude Opus 4.6. Output tokens at $3.48 per million versus Anthropic's $25. That pricing is not a discount. It is a category break. And it happened on Huawei chips, not Nvidia.

What You Are Actually Getting

V4-Pro is a 1.6-trillion-parameter Mixture-of-Experts model with 49 billion active parameters per token. It was pre-trained on 33 trillion tokens and ships with a 1,048,576-token context window. Benchmarks: 80.6% on SWE-bench Verified, 67.9% on Terminal-Bench 2.0 (higher than Claude Opus 4.6), 93.5% on LiveCodeBench, and a Codeforces rating of 3,206 that puts it in the top fraction of 1% of competitive programmers.

V4-Flash is the smaller, faster sibling at $0.28 per million output tokens. It is the model you put on the autocomplete path, the embedding-generation path, and any agent loop where you are generating millions of tokens a day.

Both ship with open weights on Hugging Face under the MIT license. You can download them, run them, fine-tune them, and serve them commercially with no royalty owed. That is the first time a frontier-tier model has shipped with a real open-source license.

The Pricing Math That Breaks Cost Models

Run a coding agent that generates 10 million output tokens per day. On Claude Opus 4.6 you pay $250/day. On GPT-5.5 you pay $300/day. On DeepSeek V4-Pro you pay $34.80/day. Over a year, per agent, that is a $78,000-$97,000 delta.

The closed-frontier counter-argument is reliability, support, data residency, and real-world task completion where benchmarks underestimate quality. All of that has truth. None of it is worth 14x the price.

Where It Falls Short

Two honest caveats. First, the DeepSeek API routes through Chinese infrastructure. If you serve EU customers under GDPR or enterprise customers under SOC 2, the hosted API may not clear your compliance review. Self-hosting the open weights solves that, but only if you have serious GPU capacity.

Second, the 1M context window is real for the first 150K-200K tokens and marketing after that. Every 1M-context model drops below 70% needle accuracy past 200K and loses the middle 40% of prompts past 500K. Treat the extra headroom as a buffer, not a product feature.

How to Use It Today

Easiest path: point your existing OpenAI-compatible SDK at DeepSeek's API endpoint and swap the model name. The API surface is drop-in compatible. You can have V4-Pro serving production traffic in an hour (a minimal sketch appears under Get Started below).

Harder path: self-host the weights from Hugging Face. You will need 8x H100s or equivalent for V4-Pro inference. V4-Flash is manageable on a 4x H100 node.

Smartest path: put V4-Flash on your high-volume tier, keep Claude Opus 4.6 or GPT-5.5 on your critical path, and route between them with a simple cost-vs-quality policy (a routing sketch follows the Use Cases list).

Verdict

DeepSeek V4 is the most important open-weight model release since Llama 3. Benchmarks tie Claude. Pricing is 14% of Claude's. The license is MIT. If you ship AI products, this belongs in your stack by next Friday — at minimum on the high-volume tier. The frontier is not closed anymore.

Related Resources

- Article: DeepSeek just open-sourced a Claude-tier model — the full pricing and benchmark breakdown.
- Article: GPT-5.5 just shipped — the closed-frontier launch DeepSeek V4 just undercut by 86%.
- Repo: HKUDS Nanobot — 21K-star Python agent framework you can wire to DeepSeek V4 in under 10 lines of config.
- MCP server: Vercel Next.js DevTools MCP — pair V4's agentic mode with real Next.js runtime telemetry.
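The pricing math above is plain arithmetic, and it checks out. A quick sanity check in Python, using only the per-million rates quoted in this article:

```python
# Per-million output-token rates quoted in the About section (USD).
RATES = {"claude-opus-4.6": 25.00, "gpt-5.5": 30.00, "deepseek-v4-pro": 3.48}

TOKENS_PER_DAY = 10_000_000  # the 10M-output-token/day coding agent example

for model, rate in RATES.items():
    daily = rate * TOKENS_PER_DAY / 1_000_000
    print(f"{model}: ${daily:,.2f}/day, ${daily * 365:,.0f}/year")

# Annual delta vs. DeepSeek V4-Pro:
#   Claude Opus 4.6: (250.00 - 34.80) * 365 ≈ $78,548
#   GPT-5.5:         (300.00 - 34.80) * 365 ≈ $96,798
# i.e. the $78,000-$97,000 per-agent delta claimed above.
```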
Key Features
- V4-Pro: 1.6 trillion total parameters, 49 billion active per token via Mixture-of-Experts
- V4-Flash: smaller sibling optimized for high-throughput agentic loops at $0.28 per million tokens
- 1,048,576-token context window (Pro and Flash both)
- Open weights on Hugging Face under MIT license — self-host, fine-tune, and serve commercially
- Trained on 33 trillion tokens, entirely on Huawei Ascend chips (no Nvidia)
- 80.6% SWE-bench Verified, 67.9% Terminal-Bench 2.0, 93.5% LiveCodeBench, Codeforces rating 3,206
- OpenAI-compatible API surface — drop-in replacement at the SDK level
- Native function calling and structured JSON output built into both Pro and Flash (see the sketch after this list)
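On that last point: since the article says the API surface is OpenAI-compatible, function calling on V4 presumably uses the standard OpenAI tools convention. A minimal sketch; the model name `deepseek-v4-pro`, the endpoint, and the `get_ci_status` tool are assumptions for illustration, not confirmed by this release:

```python
from openai import OpenAI

# Assumptions: DeepSeek keeps its OpenAI-compatible endpoint for V4, and the
# model is exposed as "deepseek-v4-pro" (hypothetical name).
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_DEEPSEEK_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_ci_status",  # hypothetical tool, for illustration only
        "description": "Fetch the CI status for a git branch.",
        "parameters": {
            "type": "object",
            "properties": {"branch": {"type": "string"}},
            "required": ["branch"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Is CI green on main?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```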
Use Cases
- Teams hitting six-figure monthly API bills on Claude or GPT-5.5 who need frontier quality at a fraction of the price
- Agentic coding stacks (Cursor, Cline, Claude Code) that want to swap in DeepSeek V4 on the high-volume tier (routing sketch after this list)
- Self-hosted RAG and chatbot deployments where open weights + commercial license matter for compliance
- High-throughput autocomplete, embedding, and first-pass summarization work that lands on V4-Flash
- Research labs evaluating open-weight MoE architectures against closed-frontier baselines
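The "smartest path" described in the About section is a policy, not a framework. A minimal sketch of cost-vs-quality routing, assuming hypothetical model names and a per-request criticality flag you define yourself; the thresholds are placeholders, not recommendations:

```python
# Hypothetical routing policy: cheap open-weight model for bulk traffic,
# closed frontier model for the critical path.
def pick_model(critical: bool, est_output_tokens: int) -> str:
    if critical:
        return "claude-opus-4.6"    # critical path stays on the incumbent
    if est_output_tokens > 2_000:
        return "deepseek-v4-flash"  # high-volume tier: $0.28/M output
    return "deepseek-v4-pro"        # default: frontier quality at $3.48/M

print(pick_model(critical=False, est_output_tokens=5_000))
# -> deepseek-v4-flash
```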
Pros
- Ties Claude Opus 4.6 on SWE-bench Verified (80.6% vs 80.8%) and beats it on Terminal-Bench and LiveCodeBench
- Output tokens cost 14% of Claude Opus 4.6 and 12% of GPT-5.5 — the single biggest frontier-pricing cut of 2026
- Open weights under MIT license means no per-seat fees, no rate limits, and real self-host freedom
- 1M context window matches Gemini 3.1 Pro and doubles Claude Opus 4.7's default mode
- Huawei Ascend training proves frontier-grade AI does not require Nvidia — multi-vendor future starts here
Cons
- DeepSeek's hosted API routes through Chinese infrastructure — may not clear GDPR, SOC 2, or HIPAA reviews
- 1M context is real up to ~200K tokens — needle accuracy drops below 70% and lost-in-the-middle effect kicks in past 500K
- Self-hosting a 1.6T-parameter MoE model requires serious GPU infrastructure (8x H100 class minimum; see the vLLM sketch below)
- Slightly behind Claude on long-horizon planning and multi-turn reasoning in early community real-world tests
- The documentation and support ecosystem is thinner than Anthropic's or OpenAI's — expect to read code, not tickets
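For scale on the self-hosting con: here is roughly what serving the smaller V4-Flash could look like with vLLM on the 4x H100 node mentioned in the About section. The Hugging Face model ID is a guess; the article confirms the weights are on Hugging Face but not the exact repo name, so check before copying this:

```python
from vllm import LLM, SamplingParams

# Hypothetical model ID -- the exact Hugging Face repo is not given in the
# article. tensor_parallel_size=4 matches the "4x H100 node" sizing above.
llm = LLM(
    model="deepseek-ai/DeepSeek-V4-Flash",
    tensor_parallel_size=4,
    trust_remote_code=True,
)

outputs = llm.generate(
    ["Write a haiku about open weights."],
    SamplingParams(max_tokens=64, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```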
Get Started
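The fastest on-ramp, per the "easiest path" in the About section: keep your existing OpenAI SDK and swap the base URL. A minimal sketch; the model name is an assumption, so confirm the identifier against DeepSeek's live model list:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
    api_key="YOUR_DEEPSEEK_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-pro",  # assumed name; check the live model list
    messages=[{"role": "user", "content": "Refactor this function to be pure."}],
)
print(resp.choices[0].message.content)
```

From there, swapping in V4-Flash for the high-volume tier is a one-line model-name change.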
Details
- Category
- other
- Pricing
- Freemium — open weights are free to self-host under the MIT license; the hosted API is pay-per-token ($3.48/M output for V4-Pro, $0.28/M for V4-Flash)