Madrona partner Jon Turow. (Madrona Photo)
Editor’s note: This post first appeared on Jon Turow’s Substack newsletter.
The AI community is rightfully buzzing about the new model DeepSeek R1 and is racing to digest what it means.
DeepSeek, a Chinese AI startup that emerged from the High-Flyer hedge fund, has built a flagship model that performs comparably to models in OpenAI’s o1 series on key reasoning benchmarks, while its distilled 7B model arguably outperforms larger open-source models.
But beyond the immediate excitement about democratization and performance, DeepSeek hints at something more profound: a new path for domain experts to create powerful specialized models with modest resources.
This breakthrough has three major implications for our industry. Yes, application developers get powerful new open-source models to build on. And yes, major labs will likely use these efficiency innovations to push even larger models further.
But most intriguingly, DeepSeek’s approach suggests how deep domain expertise might matter more than raw compute in building the next generation of AI models and intelligent applications.
Beyond Raw Compute: The Rise of Smart Training
What makes DeepSeek R1 particularly interesting is how it achieved strong reasoning capabilities. Instead of relying on expensive human-labeled datasets or massive compute, the team focused on two key innovations:
First, they generated training data that could be automatically verified — focusing on domains like mathematics where correctness is unambiguous. Second, they developed highly efficient reward functions that could identify which new training examples would actually improve the model, avoiding wasted compute on redundant data.
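To make this concrete, here is a minimal sketch of the kind of rule-based, automatically verifiable reward DeepSeek describes for R1-Zero: an accuracy reward that mechanically checks a final answer against known ground truth, plus a small format reward for reasoning tags. The \boxed{} extraction details and the specific reward values are illustrative assumptions, not DeepSeek's exact implementation.

```python
import re

def extract_final_answer(completion: str) -> str | None:
    """Pull the final answer out of a \\boxed{...} span. Assumes the
    model was prompted to box its final answer; this convention is an
    illustrative assumption."""
    match = re.search(r"\\boxed\{([^{}]+)\}", completion)
    return match.group(1).strip() if match else None

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Rule-based reward: 1.0 if the extracted answer matches the
    known-correct answer, else 0.0. No human labeler or learned reward
    model is needed; correctness is checked mechanically."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == ground_truth else 0.0

def format_reward(completion: str) -> float:
    """Small bonus for emitting reasoning inside <think>...</think>
    tags, mirroring the format reward described for R1-Zero."""
    return 0.5 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

# Example: score a batch of sampled completions for one problem.
ground_truth = "42"
completions = [
    "<think>6 * 7 = 42</think> The answer is \\boxed{42}.",
    "The answer is \\boxed{41}.",
]
rewards = [accuracy_reward(c, ground_truth) + format_reward(c) for c in completions]
print(rewards)  # [1.5, 0.0]
```

Because the check is a cheap string comparison rather than a human judgment or a second model, it can score millions of sampled completions at negligible cost, which is what makes reinforcement learning without labeled data practical.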
The results are telling: On the AIME 2024 mathematics benchmark, DeepSeek R1-Zero achieves 71.0% accuracy, compared to o1-0912’s 74.4%. More impressively, their distilled 7B model reaches 55.5% accuracy — surpassing the 50.0% achieved by QwQ-32B-Preview despite having far fewer parameters. Even their 1.5B parameter model achieves a remarkable 28.9% on AIME and 83.9% on MATH-500, showing how focused training can achieve strong results in specific domains with modest compute.
A Gift to Application Developers
The immediate impact of DeepSeek’s work is clear: their open-source release of six smaller models — ranging from 1.5B to 70B parameters — gives application developers powerful new options for building on capable reasoning models. Their distilled 14B model in particular, which outperforms larger open-source alternatives on key benchmarks, is an attractive foundation for developers who want to focus purely on application development without diving into model training.
Accelerating the Leaders
For major AI labs, DeepSeek’s innovations in training efficiency won’t slow the race for bigger models — they’ll accelerate it. These techniques will likely be used multiplicatively with massive compute resources, pushing the boundaries of general-purpose models even further. The compute race at the top will continue, just with better fuel.
A New Path for Domain Experts
But the most interesting implications may be for teams with deep domain expertise. The industry narrative has largely suggested that startups should focus on building applications on top of existing models rather than creating their own. DeepSeek shows there’s another way: applying deep domain expertise to create highly optimized, specialized models at a fraction of the usual cost.
It’s telling that DeepSeek emerged from High-Flyer, a hedge fund where the reward function is crystal clear — financial returns. It’s reasonable to imagine they’re already applying these techniques to financial modeling, where automated verification of predictions against market data could drive highly efficient training.
This pattern could extend to any domain with clear success metrics. Consider teams with deep expertise in:
Code generation, using application performance, commit histories, and verification/testing for feedback
Financial modeling, using market data for verification
Medical diagnostics, with clinical outcomes as ground truth
Legal analysis, where case outcomes provide verification
Industrial operations, where real-world performance data creates feedback loops
With DeepSeek’s techniques, such teams could:
Generate synthetic training data that can be automatically verified against domain rules (sketched for the code-generation case after this list)
Create reward functions that efficiently identify high-value training examples
Focus compute resources on the specific capabilities that matter most for their domain
Vertically integrate specialized models with domain-specific applications
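Here is that sketch, applied to the code-generation domain: candidate solutions are verified by actually running unit tests, and only problems at the edge of the model's current ability are kept, a crude stand-in for a reward function that flags high-value examples. The generate_candidate hook, the data layout, and the selection heuristic are all illustrative assumptions, not DeepSeek's method.

```python
import subprocess
import sys
import tempfile
import textwrap

def passes_tests(solution_code: str, test_code: str, timeout: float = 5.0) -> bool:
    """Automatic verification: run the candidate against its unit tests
    in a subprocess. Correctness is unambiguous: the tests pass or they don't."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def build_verified_dataset(problems, generate_candidate, n_samples=8):
    """Sample candidate solutions and keep only the verified ones.
    `generate_candidate` is a hypothetical hook for the team's own model."""
    dataset = []
    for problem in problems:
        candidates = [generate_candidate(problem["prompt"]) for _ in range(n_samples)]
        verified = [c for c in candidates if passes_tests(c, problem["tests"])]
        # Problems the model always or never solves carry little signal;
        # partially solved ones are where new training data matters most.
        if 0 < len(verified) < n_samples:
            dataset += [{"prompt": problem["prompt"], "solution": c} for c in verified]
    return dataset

# Example problem with machine-checkable tests.
problems = [{
    "prompt": "Write a function add(a, b) that returns a + b.",
    "tests": textwrap.dedent("""\
        assert add(2, 3) == 5
        assert add(-1, 1) == 0
    """),
}]
```

The same loop works for any domain in the list above: swap the unit-test runner for a market-data backtest, a clinical-outcome lookup, or a case-outcome check, and the verification stays fully automatic.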
The power of this approach is evident in DeepSeek’s distillation results. Their 32B parameter model achieves 72.6% accuracy on AIME 2024 and 94.3% on MATH-500, significantly outperforming previous open-source models. This demonstrates how focused training can overcome raw parameter count.
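The distillation step itself is, per the R1 paper, straightforward: sample reasoning traces from the big model and supervised-fine-tune a smaller open base model on them. Below is a minimal sketch of that recipe using Hugging Face transformers; the student model name, the toy one-example dataset, and the hyperparameters are illustrative assumptions, not DeepSeek's actual settings.

```python
import torch
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

# Reasoning traces previously sampled from the large teacher model
# (ideally filtered with the same automatic verification as above).
traces = [
    {"prompt": "What is 6 * 7?",
     "response": "<think>6 * 7 = 42</think> \\boxed{42}"},
]

model_name = "Qwen/Qwen2.5-1.5B"  # illustrative student, not DeepSeek's exact base
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

def tokenize(example):
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    tokens = tokenizer(text, truncation=True, max_length=2048)
    # Standard causal-LM SFT loss; trains on prompt + response for simplicity.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = Dataset.from_list(traces).map(tokenize,
                                        remove_columns=["prompt", "response"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilled-student", num_train_epochs=2,
                           per_device_train_batch_size=1, learning_rate=1e-5),
    train_dataset=dataset,
)
trainer.train()
```

No reinforcement learning is needed at this stage; the reasoning behavior is already in the teacher's traces, which is why a 7B student can inherit capabilities that previously required far larger models.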
The Future of Model Development
Looking ahead, we’re likely to see model development stratify into three tracks:
Application developers building on increasingly powerful open-source foundations
Major labs using efficiency techniques to push general-purpose models further
Domain experts creating highly optimized, specialized models with modest compute budgets
This third track — domain experts building their own models — is the most intriguing. It suggests a future where the most interesting AI developments might come not from who has the most compute, but from who can most effectively combine domain expertise with clever training techniques.
We’re entering an era where smart training may matter more than raw compute — at least for those wise enough to focus on the right problems. DeepSeek has shown one path forward. Others will follow, but with their own domain-specific twists on these fundamental innovations.