12.27.2024 | By: Ashu Garg
For me, the story of 2024 in technology can be summed up in a single number: 1000x.
That’s the factor by which the cost of machine intelligence has fallen in just three years – from $60 per million tokens with GPT-3 in 2021 to $0.06 with Meta’s Llama 3.2. To my knowledge, this represents the most rapid democratization of any technological capability in human history. Intelligence, once humanity’s most precious and scarce resource, is becoming ubiquitous, abundant, and essentially free.
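A quick sanity check of that figure, using only the two prices quoted above:

```python
# Back-of-the-envelope check of the 1000x cost decline.
gpt3_price = 60.00     # $ per million tokens, GPT-3 (2021)
llama32_price = 0.06   # $ per million tokens, Llama 3.2 (2024)

decline_factor = round(gpt3_price / llama32_price)
print(f"{decline_factor}x")  # 1000x
```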
This development has rekindled a techno-optimism that many thought had been extinguished by the 2022 downturn. The markets have certainly taken notice: AI-linked companies now account for roughly half of the S&P 500’s market cap, with influence reaching far beyond the Magnificent Seven into sectors as diverse as industrials and utilities.
What’s even more remarkable is how quickly we’ve normalized AI’s expanding capabilities. Tasks that would have seemed impossible two years ago – sophisticated reasoning, end-to-end task completion, saturation of our most advanced benchmarks – now barely merit mention. The frontier for what counts as “AI” keeps advancing, even as our understanding of how to achieve it evolves (more on that below).
As someone who has spent decades investing in and advocating for Silicon Valley’s founders, it’s been particularly vindicating to watch the region defy a decade of premature obituaries. The “death of Silicon Valley” narrative has proven spectacularly wrong. Instead, the Valley has emerged stronger than ever as AI’s global nerve center, with a concentration of leading AI labs and startups that I expect to intensify in 2025.
Throughout 2024, my team and I have both chronicled and anticipated many of these shifts – from the evolution of “software as a service” to “service as software,” the emergence of “compound AI systems” and “systems of agents,” and the rise of alternatives to transformer-based models.
As we look ahead to 2025, here are ten developments that I see coming:
As I chronicled last month, the assumed progress of scaled pretraining has hit three walls: data, compute/energy, and model architectures. But, in 2025, these walls won’t limit AI’s advance – they’ll redirect it toward new frontiers.
One of the most promising frontiers is “reasoning” – where models don’t just recall patterns from training but actively work through problems during inference. Take OpenAI’s o3 model: rather than producing instant answers, it generates detailed reasoning paths tailored to each task, much like a mathematician methodically working through a proof.
The reported results are striking: o3 has achieved 87.5% on the ARC-AGI prize and 25% on FrontierMath (a specialized math test written by Fields Medalists, where previous models peaked at 2%). To put this leap in perspective: it took four years for performance on ARC-AGI to inch from 0% with GPT-3 to 5% with GPT-4. According to François Chollet, the prize’s founder and veteran AI researcher, o3 represents a fundamental breakthrough in AI’s capacity to handle novel situations.
This inference strategy comes at a cost: o3’s top-performing version demands 172x more compute than its baseline, costing over $3,400 per answer. But if the past three years have taught us anything, it’s that these costs tend to plummet. The convergence of more efficient training and sophisticated reasoning suggests that AI progress in 2025 may accelerate beyond even this year’s impressive gains.
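To make the cost relationship concrete, the numbers above imply a baseline configuration costing roughly $20 per answer (a back-of-the-envelope calculation from the quoted figures, not a disclosed price):

```python
# Implied baseline cost: o3's top configuration uses 172x the compute
# of the baseline and costs over $3,400 per answer.
top_cost_per_answer = 3400   # $ per answer, high-compute o3
compute_multiplier = 172     # relative to the baseline configuration

baseline_cost = top_cost_per_answer / compute_multiplier
print(f"${baseline_cost:.2f} per answer")  # $19.77 per answer
```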
This evolution in how we think about test-time compute brings me to my next point: the future belongs not to those with the biggest models, but to those who architect the most performant AI systems.
An AI model in isolation is just bits on a disk. Even the simplest output requires at least three components to work in concert: a prompt, a sampling method to generate outputs, and a verification strategy to evaluate results. What we perceive as “intelligence” and “reasoning” emerge from the thoughtful orchestration of these elements with external tools and APIs. When we marvel at a model like o3’s ability to solve problems, we’re really watching a careful choreography among multiple specialized components: one to generate possible solutions, another to verify them, and others to refine and improve the results.
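The choreography described above can be sketched as a minimal generate-verify-refine loop. Everything here is a toy stand-in: the `generate`, `verify`, and `refine` functions are placeholders for model calls in a real compound system, not a description of how o3 actually works.

```python
import random

def generate(prompt: str, n: int) -> list[str]:
    """Sample n candidate solutions (dummy strings here; model calls in practice)."""
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def verify(candidate: str) -> float:
    """Score a candidate; a real verifier might be a second model or a checker."""
    random.seed(candidate)  # deterministic toy score per candidate
    return random.random()

def refine(candidate: str) -> str:
    """Improve the best candidate; a real refiner would edit its content."""
    return candidate + " (refined)"

def solve(prompt: str, n_samples: int = 8) -> str:
    """Orchestrate the components: sample widely, keep the best, then polish."""
    candidates = generate(prompt, n_samples)
    best = max(candidates, key=verify)
    return refine(best)

print(solve("prove the lemma"))
```

The "intelligence" lives in the loop, not in any single component: swapping in a stronger verifier or a wider sampling budget changes the system's behavior without retraining anything.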
While the last four years were defined by the race for scale, 2025 will be shaped by researchers and builders who master this systems-level architecture. The breakthroughs we’ll see won’t come simply from training larger models, but from finding more elegant and effective ways to combine multiple, smaller models and software components. This move from “model-centric” to “system-centric” thinking will start to erode incumbents’ capital advantages and benefit startups who can move quickly and experiment.
2025 will see AI companies break free from traditional software budgets as they target the vastly larger services market – a roughly 10x expansion in TAM. They’ll succeed by selling actual work completion rather than just workflow enablement.
This shift to outcomes-based pricing presents a classic innovator’s dilemma for incumbents. Their revenue models, sales incentives, and GTM strategies are optimized for selling seats and licenses. This opens a significant opportunity for startups building business models native to AI’s capabilities.
Meanwhile, AI is upending software’s core assumption that marginal costs approach zero at scale. Currently, each step toward higher model performance demands exponentially more resources. A chatbot that’s right 90% of the time might cost $10 per user, but achieving 99.9% accuracy could justify $1,000 per user given the underlying compute costs.
We’re already seeing this pricing structure emerge with OpenAI’s latest tiers, which reach $200/month for its pro plan, with discussions of a $2,000/month tier for business users. While these figures might seem steep compared to the initial $20/month offering, they’re modest when measured against human expertise.
Looking ahead, as models like o3 push toward extended reasoning times of “hours, days, even weeks,” the subscription model itself may become obsolete – creating another advantage for AI-native startups over incumbents wedded to traditional pricing models.
The story of AI hardware in 2024 was largely the story of NVIDIA – their near-monopoly on AI chips drove the company to a $3.3T valuation. But 2025 will write a different narrative, driven by mounting competition and a shift in how AI systems consume computational resources.
The challenge that cemented NVIDIA’s dominance – pretraining – is fundamentally a throughput problem. It requires massive clusters of chips running at full capacity for months, processing enormous batches of data in parallel. NVIDIA excelled by building an integrated stack of hardware and software optimized for these concentrated, predictable workloads. Yet inference presents a different set of challenges: workloads are spiky and unpredictable, latency matters more than raw throughput, and computation needs to happen at the edge, rather than in centralized data centers.
The AI infrastructure landscape of 2025 will likely become more distributed and heterogeneous, optimized for different tradeoffs than today’s massive GPU farms. While NVIDIA isn’t standing still, there’s a significant opening for competitors – both from custom silicon designed by tech giants (Apple, AMD, Microsoft, Meta, Google, Amazon, and Tesla are all contenders) and from innovative startups. The question isn’t whether NVIDIA remains a major player (they certainly will), but whether they can maintain their near-monopolistic position.
2025 will witness the rise of a new generation of enterprise software giants. These won’t be traditional systems with AI features bolted on, but AI-native platforms that reimagine how software works. Enterprise AI software spending has already jumped to $4.6B (up from $600M in 2023) – and this is just the beginning as our “service as software” paradigm takes hold.
Consider what’s happening in CRM, traditionally one of enterprise software’s most entrenched markets. Today’s systems of record – Salesforce, Hubspot, etc. – were built around structured representations of data in text-based formats. An AI-native sales platform doesn’t just add features to this aging model: it reimagines the core system as a multimodal brain that processes and acts on text, image, voice, and video.
Incumbents’ distribution moats – often an insurmountable barrier for startups – matter less when the underlying technology shift is this profound. Sales teams aren’t adopting AI-native platforms because they’re incrementally better, but because they eliminate entire categories of work – from lead research to call preparation to collateral creation.
While cloud and mobile each produced about twenty startups with $1B+ revenues, those companies had to find narrow vertical niches to compete. AI’s advances now enable startups to launch frontal attacks on nearly every major category of enterprise software – from sales and marketing automation to ERP and financial planning. Any form of legacy software that’s anchored on a structured-data, text-dominant paradigm could become obsolete.
The opportunity to rebuild these products from the ground up with AI-native architectures represents one of the largest value creation opportunities in enterprise software history.
By the end of 2025, the simple chatbox that defined early AI products will feel as dated as the command line. We’ll see specialized UIs emerge for different kinds of work: interactive dashboards for monitoring AI processes, visual tools that make AI reasoning transparent and debuggable, and intuitive interfaces for creative collaboration. These new interfaces will acknowledge that AI isn’t just a back-and-forth question-answering system, but a complex tool we need to guide, monitor, and collaborate with in more advanced ways.
Early signs of this evolution are already here. Anthropic’s Artifacts and OpenAI’s Canvas recognize AI outputs as starting points for iteration rather than final products, while Google’s NotebookLM offers multimodal interaction that seamlessly blends text and voice.
As model capabilities converge, the key differentiator may become UX – not surface-level design choices, but in the deeper sense of how effectively humans can partner with AI and harness its capabilities. The winners in this space won’t just build powerful models; they’ll build interfaces and experiences that make AI’s power accessible and controllable.
Google’s famous “10 blue links” have defined how we access information online and shaped the architecture of the modern web. But in 2025, we’ll witness the beginning of the end for this decades-old paradigm, as AI-native information access renders traditional search results obsolete.
The shift is already underway. Platforms like Perplexity and ChatGPT (which recently added web search) demonstrate how direct, synthesized answers are superior to scrolling through ad-laden link lists. More importantly, they’re training a new generation of internet users who instinctively “chat” rather than “Google” their questions.
Meta’s potential entry into search could hasten Google’s fall. Their social graph offers something Google’s index lacks: real-time understanding of how information flows through human networks. By one estimate, Meta has access to 100x more data than what’s on the public internet – provided they can navigate the complex compliance requirements of using it. A Meta search product could combine traditional web content, social signals, and AI synthesis in ways that make Google’s current offering feel static and disconnected.
Google’s ad revenue is the economic engine that built the modern web. But protecting this revenue while simultaneously shipping its replacement may prove impossible. The DOJ’s antitrust scrutiny, including discussions of forced search index licensing, further complicates Google’s ability to leverage existing advantages in this new AI era.
The history of technology is littered with pioneers – Yahoo, Netscape, MySpace – who catalyzed revolutions but failed to capture their value. In 2025, OpenAI may join their ranks. Despite the impressive technical achievements of models like o3, its recent $157B valuation increasingly looks like it prices in permanent dominance of a market that’s becoming more competitive by the day.
Google’s Gemini has already surpassed GPT-4 on key industry benchmarks, while Meta’s open-source strategy delivers comparable capabilities at half the cost. When Llama 3 powers free AI features across Facebook, Instagram, and WhatsApp that reach 4B users, ChatGPT’s 10M paying users start to look less like market dominance and more like a vulnerable early lead.
Open source progress is equally striking – these models now match their closed counterparts on nearly every benchmark that matters. To give just one data point, Llama 3.1 405B sits just a hair behind Claude 3.5 Sonnet and GPT-4 Turbo on MMLU.
Enterprise spending patterns back up the evals. Data from Ramp shows that OpenAI’s share of AI spend among customers on their platform has dropped from 90% to 76% this year. Businesses are adopting a multi-model strategy and building infrastructure to easily switch between providers. Excellence in model development alone, it turns out, doesn’t create customer lock-in.
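The multi-model strategy the Ramp data points to usually takes the form of a thin routing layer: one interface, many interchangeable providers. A minimal sketch (the provider names and the `complete` signature are illustrative, not any real SDK):

```python
from typing import Callable

# Registry mapping provider names to completion functions.
# Real entries would wrap vendor SDK calls; these are stubs.
PROVIDERS: dict[str, Callable[[str], str]] = {
    "openai": lambda prompt: f"[openai] {prompt}",
    "anthropic": lambda prompt: f"[anthropic] {prompt}",
    "llama": lambda prompt: f"[llama] {prompt}",
}

def complete(prompt: str, provider: str = "llama") -> str:
    """Route a prompt to the chosen provider; switching is one argument."""
    try:
        return PROVIDERS[provider](prompt)
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None

print(complete("summarize Q3 revenue", provider="anthropic"))
```

Once an abstraction like this is in place, no single vendor’s model quality translates into lock-in – which is exactly the dynamic the spending data suggests.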
Other dynamics to watch in 2025: OpenAI’s planned restructuring as a for-profit company and its relationship with Microsoft. Per their current agreement, Microsoft has full rights to OpenAI’s IP until AGI is achieved. This creates fascinating incentive structures around if and when OpenAI might declare they’ve reached this milestone – especially given the company’s stated mission of pursuing AGI above all else.
2025 will see Meta’s Llama architecture become for AI what Linux became for servers – the standard that defines how AI systems are built and deployed. Building on Llama enables developers to tap into an entire ecosystem that’s being optimized around it, from hardware and devtools to training and deployment pipelines.
This marks a break with the AI development paradigm of 2022-2023, when entering the model space meant raising hundreds of millions just for initial training runs. The question wasn’t “What innovative approach can we take?” but “Can we access a 100,000-GPU cluster?” In 2025, with continued advances in open source and model distillation, small teams will increasingly compete against the bigger labs – particularly for specific vertical and “last mile” use cases where specialized knowledge matters more than scale.
In 2025, self-driving cars will transform public trust in AI by making machine intelligence visible and undeniable. While we can debate a chatbot’s abilities, there’s no arguing with an AI that navigates the road more safely than humans. This everyday, physical proof of AI’s societal benefits will do more to win public confidence in this technology than any model breakthrough.
As of mid-2024, Waymo’s robotaxis have logged over 22 million miles of autonomous driving, including 5.9 million in San Francisco, where their white Jaguar SUVs have become fixtures of the urban landscape. Last weekend, I watched one navigate a crowded parking lot – a scenario that gives even the most experienced human drivers pause.
Seeing our driverless future continue to come to fruition – with all its positive implications for safety, accessibility, urban design, human productivity, and overall quality of life – is one of the developments I’m most looking forward to in 2025.
2024 saw the AI ecosystem become much friendlier to startups – with open models matching closed ones, small models making rapid gains, inference strategies mattering more than raw scale, and enterprises embracing AI-native solutions. 2025 will see this trend accelerate, as systems overtake models, interfaces evolve beyond chatboxes, and startups launch frontal attacks on industry giants.
The deeper we venture into this AI revolution, the less certain I’ve become about what’s possible and what isn’t. That’s precisely what makes this moment so exciting – the limits lie less in what AI can do than in what builders can imagine doing with it. We’re moving from an era defined by technical barriers to one constrained only by human creativity and ambition.
For those of us who’ve spent our careers supporting early-stage founders, there’s never been a more exciting time to be in tech.
Onward to 2025 ✨