Mistral Forge
Build frontier-grade AI models trained on your proprietary data — no cloud lock-in
About
Mistral Forge is an enterprise platform that lets organizations build custom AI models from their own data. Not fine-tune an existing model. Not plug into an API. Actually pre-train a foundation model on proprietary datasets.

The platform bundles Mistral's own training recipes (the same ones used to build its flagship models) into a licensable product. It supports dense and mixture-of-experts (MoE) architectures, handles multimodal inputs (text, code, documents), and runs on the customer's GPU clusters. Mistral charges a license fee rather than billing for compute.

What makes Forge different from fine-tuning services like OpenAI's or Google's Vertex AI: you're not tweaking an existing model's behavior. You're building a new model from scratch using data mixing strategies, pre-training, post-training, and RLHF, the full training pipeline that Mistral uses internally.

The platform also comes with an unusual add-on: forward-deployed AI scientists. Mistral embeds researchers directly with customer teams to guide training runs, debug data pipelines, and optimize architectures. Think of it as a consulting engagement wrapped around a software license.

Early customers include ASML (semiconductors), ESA (space), Ericsson (telecom), and several defense organizations. The common thread: industries where data can't leave the building and generic models don't understand the domain.

Pricing
Mistral Forge operates on a license-based model. The platform license covers the training stack itself. Compute is BYO: you run training on your own GPU clusters, so Mistral doesn't charge for inference or training cycles. Optional add-ons include data pipeline services (custom data mixing and synthetic data generation) and forward-deployed AI scientists for hands-on support. All pricing is custom and requires contacting sales.
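Forge's actual data mixing interface isn't public, so the following is only a rough illustration of the underlying idea: a mixing strategy assigns each corpus a weight, and training batches sample documents in proportion to those weights. The corpus names and weights below are invented for the sketch.

```python
import random

# Hypothetical corpus weights -- Forge's real mixing configs are not public.
MIX = {"proprietary_docs": 0.5, "code": 0.3, "public_text": 0.2}

def sample_corpus(rng, mix):
    """Draw a corpus name with probability proportional to its weight."""
    names = list(mix)
    weights = [mix[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Simulate which corpus each of 1000 training examples is drawn from.
rng = random.Random(0)
batch = [sample_corpus(rng, MIX) for _ in range(1000)]
```

Over a long run, roughly half the training examples come from the proprietary corpus, which is how a mixing strategy steers what the model learns without hand-curating every batch.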
Who Should Use It
Forge is built for organizations with three things: proprietary data worth training on, GPU infrastructure to run training on, and a use case where generic models fall short. If you're a startup fine-tuning GPT-4 on a few hundred examples, this isn't for you. If you're a defense contractor building classified language models, it probably is.
Key Features
- Pre-train custom foundation models on proprietary data using Mistral's battle-tested training recipes
- Full training pipeline: pre-training, post-training, and RLHF in one platform
- Supports dense and mixture-of-experts (MoE) architectures
- Multimodal input support for text, code, and documents
- Data mixing strategies and synthetic data generation pipelines
- Distributed computing optimizations for large-scale training runs
- Forward-deployed AI scientists who embed with customer teams
- Runs on customer's own GPU clusters — no data leaves your infrastructure
- Cloud-agnostic: works on AWS, Azure, GCP, or on-prem
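Forge's MoE implementation isn't documented publicly, but the architecture the feature list names can be illustrated with the standard gating step: for each token, a router scores every expert, keeps only the top-k, and normalizes their weights, so only a small subset of experts runs per token. A minimal sketch (all names are illustrative, not Forge's API):

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts for one token and
    softmax-normalize their gate weights over just those k."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    # Returns (expert_index, weight) pairs; weights sum to 1.
    return [(i, e / total) for i, e in zip(top, exps)]

# One token, four experts: only experts 2 and 0 are activated.
routes = top_k_route([1.0, -0.5, 2.0, 0.3], k=2)
```

The payoff is that a model can hold many experts' worth of parameters while each token pays the compute cost of only k of them, which is why MoE is attractive for large pre-training runs.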
Use Cases
- Defense and security organizations building classified AI models that can't touch third-party APIs
- Semiconductor companies (like ASML) training domain-specific models on proprietary chip design data
- Telecommunications providers building network optimization models on internal telemetry
- Space agencies creating specialized scientific models for mission-critical applications
- Healthcare organizations training HIPAA-compliant models on medical records
- Consulting firms building proprietary knowledge models from decades of internal reports
Pros
- Full pre-training capability — not just fine-tuning — gives much deeper model customization
- No cloud vendor lock-in: runs on any GPU infrastructure you own
- Includes Mistral's actual training recipes, not a watered-down version
- Forward-deployed scientists reduce the expertise gap for organizations new to model training
- License model means predictable costs (no per-token or per-GPU-hour surprises)
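The predictable-cost point can be made concrete with back-of-envelope arithmetic. Every figure below is invented for illustration; Forge's actual license fees are custom and unpublished.

```python
# All figures are hypothetical -- Forge pricing is custom and not public.
tokens_per_month = 500_000_000_000      # assumed 500B tokens of internal usage
per_token_api_rate = 2e-6               # assumed $2 per 1M tokens on a metered API

metered_bill = tokens_per_month * per_token_api_rate  # grows linearly with usage

annual_license = 12_000_000             # assumed flat platform license, $/year
flat_monthly = annual_license / 12      # constant regardless of token volume
```

At this assumed volume the two come out equal at $1M/month; double the usage and the metered bill doubles while the license stays flat. That is the budgeting argument for a flat license at high volume, and the argument against it at low volume.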
Cons
- Requires significant existing GPU infrastructure — not accessible to smaller teams
- Enterprise sales process with custom pricing makes it impossible to evaluate cost upfront
- No self-serve option: you need Mistral's sales team involved from day one
- Pre-training from scratch requires massive datasets — small data shops won't see value
- Still a new product (launched March 2026) — limited track record in production deployments
Details
- Category: code
- Pricing: enterprise