04.26.2025 | By: Ashu Garg
In January, DeepSeek surprised the tech industry by releasing a model – R1 – that matched the reasoning performance of OpenAI’s o1 at a fraction of the cost. The initial reaction was loud, but for anyone paying close attention, the release shouldn’t have come as a surprise.
The breakthroughs behind it – like model sparsity, memory optimization, algorithmic innovation, and multi-stage post-training – were already in motion. DeepSeek simply pushed them further and faster, proving that efficiency and SOTA performance could coexist much sooner than expected.
Clearly, DeepSeek didn’t “change everything.” But it did confirm that building at the frontier isn’t just about scale anymore. Efficiency, strategic openness, product depth, pricing strategy, and domain expertise have all become just as important – and, in many cases, more so.
This month, I’m stepping back to look at what’s actually shifted – not in the headlines, but on the ground, in what and how the founders I talk to are building, where funding is flowing, and what’s starting to separate the startups that succeed from the ones that fizzle out.
Here are five shifts I’m watching most closely.
DeepSeek’s launch catalyzed a new conversation around capital efficiency in AI. While headlines fixated on a “$6M” training run, that figure only captured the GPU costs for a single phase of training. The true cost of developing DeepSeek V3 – including R&D, model experimentation, and infrastructure – was far higher, with hardware investments alone estimated at over $500M.
Still, DeepSeek proved something important: with the right architectural choices and training strategies, it’s possible to replicate frontier-level capabilities with a fraction of the compute previously required.
In response, frontier labs are both scaling smarter and scaling bigger.
On one front, they’ve invested aggressively in optimization – tactical refinements like compressing existing models, distilling outputs into smaller footprints, and accelerating inference to improve performance without fundamentally altering their core approach. At the same time, they’re embracing deeper architectural and algorithmic innovations focused on efficiency. These strategies include mixture-of-experts architectures (which selectively activate only a fraction of a model’s parameters per query); post-training RL guided by synthetic reasoning traces; and advanced GPU orchestration techniques that extract significantly more throughput from existing compute resources.
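To make the mixture-of-experts idea concrete, here’s a minimal sketch of top-k gating in Python. The expert count, routing rule, and dimensions are all illustrative – this isn’t any lab’s actual architecture, just the core mechanism of activating only a few experts per input:

```python
import numpy as np

def moe_forward(x, experts, gate, k=2):
    """Route input x through only the top-k of len(experts) experts."""
    scores = gate @ x                      # one gating score per expert
    top_k = np.argsort(scores)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(scores[top_k])
    weights /= weights.sum()               # softmax over the winners only
    # Only k experts execute; the rest stay idle - that's the compute saving.
    return sum(w * experts[i](x) for w, i in zip(weights, top_k))

# Toy setup: 8 experts, only 2 active per input.
rng = np.random.default_rng(0)
d, n = 16, 8
experts = [(lambda W: (lambda v: W @ v))(rng.normal(size=(d, d))) for _ in range(n)]
output = moe_forward(rng.normal(size=d), experts, rng.normal(size=(n, d)), k=2)
```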
On the other front, they’re ramping up to train the next wave of models using clusters with over 100,000 H100 GPUs, pushing toward training runs in the 3e26-FLOP range – roughly a 10× jump over GPT-4, which itself represented a 10× leap over its predecessor. Amazon, Microsoft, Alphabet, and Meta are projected to spend more than $320B on AI infrastructure in FY2025 alone. Microsoft reaffirmed its $80B commitment, while Amazon is channeling the bulk of its $100B+ CapEx into AI initiatives.
To sum up: efficiency and scale are not opposing strategies – they’re complementary. DeepSeek didn’t end the AI capex arms race – it made it smarter.
As with its early models, DeepSeek released R1 with open weights. Two months later, OpenAI announced it would release its own open-weight model for the first time since GPT-2 – a notable shift from its historically closed stance. Sam Altman admitted that staying closed “put us on the wrong side of history.”
What DeepSeek demonstrated – and OpenAI now appears to recognize – is that it’s no longer enough to monetize tightly-controlled APIs. Value increasingly flows from ecosystem gravity: an open model creates surface area for toolchains, fine-tuning frameworks, evaluation stacks, hosting services, and safety layers. Each forked repo and app built atop your model raises switching costs and strengthens your centrality in the developer ecosystem.
Indeed, as open models have rapidly caught up in performance, the rationale for keeping models closed has weakened. According to Stanford HAI’s 2025 report, the benchmark performance gap between top open-weight and closed-weight models has shrunk from 8% to just 1.7% over the past year. Open-weight models can now demonstrably deliver competitive performance at a fraction of the operating cost.
Meta played a key role in setting this expectation. By releasing its Llama models with open weights and permissive licenses, it normalized the idea that powerful base models should be portable, modifiable, and openly distributed.
In this context, OpenAI’s move toward openness isn’t just about broadening access – it’s a way to reassert narrative control over the open ecosystem. Its upcoming open-weight model will reportedly include an optional “cloud handoff” feature, allowing developers to run it locally for most tasks but escalate complex queries to OpenAI’s proprietary frontier models – thus tying openness back to OpenAI’s premium ecosystem.
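Because the feature is still only reported, any implementation details are speculative. But the basic shape of such a handoff is easy to sketch: try the local open-weight model first, and escalate only when a query looks hard. Here, run_local_model is a placeholder for however the local model ends up being served, the length check is a crude stand-in for a real difficulty classifier, and the remote call uses OpenAI’s existing chat completions API:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def run_local_model(prompt: str) -> str:
    # Placeholder: in practice, call a locally hosted open-weight model here.
    raise NotImplementedError("wire this to your local inference server")

def answer(prompt: str, complexity_threshold: int = 500) -> str:
    """Hypothetical cloud-handoff router: local first, frontier model for hard queries."""
    if len(prompt) < complexity_threshold:     # toy difficulty heuristic
        return run_local_model(prompt)
    response = client.chat.completions.create(
        model="gpt-4o",                        # any hosted frontier model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```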
Openness, done strategically, unlocks major advantages beyond ecosystem growth. Open-weight models can penetrate tightly regulated sectors – finance, healthcare, government – where self-hosting is critical. They provide a regulatory hedge by offering greater transparency, appealing to policymakers concerned with opaque AI systems. They also generate valuable telemetry, allowing labs to gather insights from diverse, real-world deployments.
Frontier labs will undoubtedly continue investing in advanced, closed models to push the limits of AI capabilities. Yet openness has become an essential complement. The cutting edge in AI is no longer defined solely by who builds the strongest models, but by who can build the broadest, most vibrant ecosystems around them.
While openness accelerates ecosystem growth, vertical integration is becoming key to ecosystem control. Frontier labs increasingly want to own not just the core models, but the workflows, experiences, and data those models power. To do that, they’re climbing the stack into applications, developer platforms, agent frameworks, and full-stack ecosystems.
OpenAI is a clear example. Once focused solely on APIs, it’s now building an end-to-end platform around its models. ChatGPT, once a simple conversational demo, has evolved into a richly featured product with memory, file analysis, tool use, and function calling. It’s also rolling out developer-facing tools with its Responses API, Agents SDK, and forthcoming low-code Agent Builder – all aimed at making OpenAI the go-to platform for building agent-based applications.
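The Agents SDK in particular shows how little code OpenAI now wants an agent to require. The snippet below follows the SDK’s published quickstart pattern; the agent’s name, instructions, and task are invented for illustration:

```python
# pip install openai-agents
from agents import Agent, Runner

agent = Agent(
    name="Support triager",  # illustrative
    instructions="Classify the incoming ticket and draft a first reply.",
)

# Runner drives the agent loop (tool calls, handoffs) to a final answer.
result = Runner.run_sync(agent, "My invoice total doesn't match my order.")
print(result.final_output)
```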
Perhaps the most telling sign of OpenAI’s ambitions is its reported $3B bid to acquire Windsurf, a fast-growing AI coding assistant embedded across thousands of enterprise teams. While still under negotiation, the move would give OpenAI direct distribution into developer workflows, along with telemetry on how users interact with code-generation models at scale.
By absorbing Windsurf, OpenAI wouldn’t just own the model; it would own the interface, the usage data, and the innovation cycles driven by user feedback. It’s a clear move to consolidate power at the application layer, where differentiation and defensibility will increasingly reside.
Similar dynamics are playing out across the industry. Anthropic is emphasizing interoperability and open standards with MCP, which standardizes how AI agents plug into external tools and databases. Meanwhile, Meta is advancing Llama Stack – a full-stack toolkit that turns its open-weight Llama models into production-ready solutions.
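The core idea behind MCP is that a tool server advertises typed functions which any compliant agent can discover and call. Here is a minimal sketch using the FastMCP helper from the official Python SDK; the lookup_order tool and its canned response are hypothetical:

```python
# pip install mcp
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-tools")  # server name advertised to connecting agents

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Return the status of an order."""
    # Hypothetical backend; production code would query a real system.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    mcp.run()  # serve the tool over MCP's standard transport
```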
Though strategies differ, the underlying motives are the same. As foundational model capabilities converge, the labs that merely produce great models will cede ground to those that own the full stack.
Post-DeepSeek, it’s increasingly clear that how AI companies charge – not just what they build – is becoming a critical competitive advantage.
In the early stages of the current AI wave, most startups defaulted to usage-based pricing: per-token, per-message, or per-API-call. This approach was simple, familiar (like AWS), and easy to implement. But as inference costs plummet – and they’re plummeting quickly – usage-based pricing is becoming a race to the bottom. Moreover, customers don’t care about tokens or API calls. They care about outcomes and business value.
The smartest companies are adapting by moving away from consumption-based pricing and toward pricing that aligns more tightly with delivered value. A few approaches are now taking shape:
- Workflow-based pricing: charging for each completed workflow or task, rather than for the raw tokens or API calls consumed along the way.
- Outcome-based pricing: tying fees directly to delivered results – a resolved support ticket, a processed claim, a qualified lead.
- Agent-based pricing: charging per deployed AI agent, akin to a seat license for a digital worker.
The further a company moves along this spectrum, the closer it gets to aligning price with delivered value rather than consumed resources – and the stickier the customer relationship becomes.
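To see why the spectrum matters, compare hypothetical bills for the same month of work under consumption pricing versus outcome pricing. Every rate and volume below is invented for illustration:

```python
def usage_bill(tokens: int, rate_per_1k: float) -> float:
    """Consumption pricing: revenue tracks tokens, so it shrinks as inference gets cheaper."""
    return tokens / 1_000 * rate_per_1k

def outcome_bill(outcomes: int, fee_per_outcome: float) -> float:
    """Outcome pricing: revenue tracks results, regardless of tokens consumed."""
    return outcomes * fee_per_outcome

# Hypothetical month: 40M tokens of inference resolve 500 support tickets.
print(usage_bill(40_000_000, rate_per_1k=0.002))  # $80 of token revenue
print(outcome_bill(500, fee_per_outcome=2.00))    # $1,000 of outcome revenue
```

The same work product yields very different revenue, and only the second number holds up as per-token prices keep falling.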
For founders, this means that pricing strategy is product strategy. It should be designed into the product from the beginning, not tacked on later. Early experimentation with workflow-based, outcome-based, or agent-based pricing may add complexity to the GTM motion, but it can dramatically strengthen retention, upsell potential, and customer trust over time.
In the consumer AI space, digital advertising is poised to become a key monetization strategy. Global advertising revenue surpassed $1 trillion in 2024, with digital channels accounting for nearly 75% of that total. As AI-powered search and chat platforms gain traction, they’re moving fast to capture a share of this massive market.
Perplexity, for instance, is building its own browser, Comet, to capture detailed user behavior beyond its core app – laying the foundation for hyper-personalized ad targeting. Incumbents like Google (Gemini) and Meta (Meta AI), with decades of advertising expertise, are similarly well-positioned to define what AI-specific ad monetization will look like. We’re seeing early experiments with sponsored answers and sponsored links embedded in chat. Longer term, the real prize is influencing the AI agents that will increasingly act on users’ behalf.
Ultimately, some of the biggest winners in this AI wave may not be those with the flashiest models, but those who figure out how to capture a fair share of the value they create. From now on, smart monetization will matter as much as smart technology.
The idea that one general-purpose model could rule them all is now behind us. The most compelling AI app companies today aren’t trying to match frontier labs on model scale. They’re out-executing them in specificity. Rather than spreading themselves thin, they go deep – becoming world-class at solving one hard, urgent, and valuable use case.
Importantly, specialization doesn’t mean small ambition. Some of today’s most promising “niche” AI companies are quietly tackling massive industries – legal services, logistics, insurance, medical documentation – spaces where early traction leads to durable moats and multi-product expansion. These aren’t edge cases; they’re beachheads. And startups that win them often find themselves expanding their TAM dramatically by growing from a position of earned trust and demonstrated value.
For founders, this means relentlessly focusing. Don’t aim for vague, horizontal categories like “AI for sales” or “AI for operations.” Pick one high-friction, high-value workflow, and solve it better than anyone else.
Take inspiration from companies like Tennr, which automates the messy administrative backbone of healthcare – managing referrals, verifying insurance eligibility, and coordinating clinical workflows that traditional EHR software barely touches. Or ConverzAI, which builds AI-powered virtual recruiters for staffing agencies, streamlining high-volume hiring processes that are too fragmented and operationally complex for big tech platforms to bother with.
In both cases, the magic isn’t just applying a model to a problem – it’s instrumenting the solution, building proprietary feedback loops, and creating data assets that compound over time.
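One concrete form those feedback loops take: log every model output next to the human correction it received, so each interaction adds a training example you own. A minimal sketch, with an invented schema, file path, and healthcare-flavored example:

```python
import json, time
from pathlib import Path

FEEDBACK_LOG = Path("feedback.jsonl")  # hypothetical dataset location

def record_feedback(prompt: str, model_output: str, correction: str) -> None:
    """Append one (input, draft, correction) triple to a JSONL dataset."""
    with FEEDBACK_LOG.open("a") as f:
        f.write(json.dumps({
            "ts": time.time(),
            "prompt": prompt,
            "model_output": model_output,
            "correction": correction,  # the human fix is the training signal
        }) + "\n")

record_feedback(
    "Extract the referring physician from this referral fax...",
    "Dr. A. Smith",
    "Dr. Alan Smith, NPI 1234567890",
)
```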
Use foundation models when they fit – but don’t tether yourself to any single provider. Keep your architecture modular and adaptable, ready to swap in new engines as better models emerge. And don’t shy away from the places big tech avoids. Regulated industries, messy internal workflows, underserved verticals – these aren’t too small, they’re too nuanced. Which makes them exactly the places where startups can win.
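Keeping the architecture modular can be as simple as routing every model call through one narrow interface your application owns. A minimal sketch; the backend class is a stub standing in for whichever provider you wire up:

```python
from typing import Protocol

class CompletionBackend(Protocol):
    """The single seam the application depends on."""
    def complete(self, prompt: str) -> str: ...

class StubBackend:
    """Placeholder; a real version would wrap a hosted frontier model
    or a self-hosted open-weight model behind the same method."""
    def complete(self, prompt: str) -> str:
        return f"[stub completion for] {prompt}"

def run_workflow(backend: CompletionBackend, ticket: str) -> str:
    # Application code never names a vendor, so swapping engines
    # is a constructor change, not a rewrite.
    return backend.complete(f"Summarize and triage: {ticket}")

print(run_workflow(StubBackend(), "Referral fax received, page 2 missing"))
```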
The model infrastructure is in place, and the platform layer is becoming more mature and opinionated. But the critical last mile – translating AI’s capabilities into high-value products – is still wide open.