Episode 61

The case for context graphs

With Aaron Levie (Co-founder & CEO, Box)

02.20.2026 | By: Ashu Garg

Aaron Levie has been on the podcast twice before. After we published our context graphs thesis, he wrote a response, so we invited him on to continue the conversation.

A context graph is institutional memory for how an organization actually makes decisions: not how the process doc says it should, but how it works in practice. Enterprise software is very good at recording outcomes – the final price, the approved discount, the escalated ticket – but not the reasoning behind them. Which exceptions applied? What precedent mattered? Who approved what, and why?

We call these missing records decision traces. Over time, they accumulate into a context graph: a living, queryable map of how an enterprise actually makes decisions, stitched across systems and time so precedent becomes searchable. We think the companies that capture that layer will define the next generation of enterprise software.

Aaron read the piece and joined us to push it further. We get into how the services-as-software opportunity unfolds as agents scale, and what it actually takes to move them out of the sandbox and into production.

What we cover:

00:00 Intro: Aaron’s third time on the podcast!
00:43 What is a context graph?
01:35 Aaron’s take: this won’t be zero-sum
04:10 Why systems of record may become more valuable in a world with 100x more agents
05:26 The difference between data and context
06:28 The moat incumbents have: workflow wiring, permissions, access controls
10:35 Which functions are most vulnerable to disruption
15:43 The trillion-dollar greenfield: high headcount, exception-heavy workflows
16:20 Ops as a wedge: RevOps, DevOps, SecOps, and glue work between systems
19:27 How PlayerZero is building context graphs for real engineering workflows
20:48 Is permissioned inference possible?
21:27 “Agents can’t keep secrets”: why access controls are so important
27:45 What Box is building
29:33 Multimodal context: screenshots, audio, and video
32:50 Why vibe coding won’t change the software industry’s power-law
35:03 Aaron’s advice to founders

Read the transcript:

Ashu: Aaron, thank you so much for being on the B2BaCEO podcast. Jaya and I are excited to have you for the third time.

Aaron: I hear it’s a record for the podcast.

Ashu: You’re popular, what can we say!

Aaron: Or you just ran out of guests. Maybe it’s a sign of where enterprise software is.

Ashu: Well, given what’s happened to the markets this week.

Aaron: There’s nobody left to do your podcast.

Ashu: My portfolio is looking really bad. At what point do I need to start selling my Atherton house?

Aaron: Your post was a little too powerful. Maybe you should have timed the market a little differently.

Ashu: Maybe you can start by describing what a context graph is and why it matters, and we’ll take it from there.

Jaya: For sure. If you take a step back, 2025 was supposed to be the year of AI agents. The models did get better, but in enterprises, agents still don’t act the way you’d expect them to.

The reason is pretty simple. Agents can read data and take action, but they still don’t know why decisions get made. That reasoning is what we call a decision trace. Those decision traces are scattered across tools, buried in Slack, and sometimes they don’t get recorded at all.

The winners of the future will be the companies that can capture those decision traces and turn them into context graphs.

Ashu: Aaron, you’ve been blogging prolifically from the early days on AI, and more recently on context graphs, so I’d love to get your take.

Aaron: I thought your post was great. Very provocative. I didn’t agree with 100% of it, but the core notion felt spot on.

Agents only work when they have context. That context often doesn’t live in the outcomes. It lives in how people made a decision or the guidelines they follow to make future decisions. Many systems of record contain output information, but they don’t contain everything that was in people’s heads that went into the decision.

That seemed like a core part of the thesis, and I totally agree with it. I think there was also a strong conclusion that this will require an entirely new set of companies for all use cases, and as venture capitalists, I would only expect you to make that claim.

Ashu: It’s a little self-serving.

Aaron: It’s totally appropriate for that to be the emphasis. I do think this is a disruptive concept that will cause many incumbents to fail and miss the window.

At the same time, there are categories where the workflow a company is involved in is a natural launchpad for capturing context. So it becomes a race across software: who can get the best context first, per category, per job function, per line of business, per workflow.

It’s hard for me to say generically who wins and loses from first principles, because a lot comes down to execution. How good is their tech stack? How well can they upgrade their products in this way?

But I also think this is one of those markets where you could bet on every incumbent and still have the market size double for new startups, because categories will get much larger and many new categories will exist.

You’ll need context graphs for all kinds of workflows, and that will create the next set of $10B, $50B, and $100B software companies.

Ashu: I totally agree. If we go back to our work on services-as-software, and you and I have talked about this before, in established categories there’s a 5x to 25x expansion just from automating human activity.

At the same time, I want to come back to your point on incumbent systems of record and get a point of view from both you and Jaya.

We see incumbents struggling to incorporate these new data types because you need event stream data, not just outcomes. That event stream has to be stateful because sequence matters.

Each event has to be associated with data and metadata around documents and conversations. When a pricing decision was made, it might have happened on a Zoom call or in an email. You have to associate the state of the decision with the underlying documentation.

That kind of data fabric does not exist. For us, that’s the context graph: the ability to stitch all of those things together.
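To make the idea concrete, here is a minimal sketch of a stateful decision trace: an ordered stream of events per decision, each linked to the documents or conversations behind it. The names (`DecisionEvent`, `ContextGraph`, the `zoom:` and `email:` artifact identifiers) are hypothetical and chosen for illustration; nothing here is taken from a real product discussed in the episode.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class DecisionEvent:
    """One step in a decision, with the reasoning and its source material."""
    decision_id: str                 # which decision this event belongs to
    timestamp: datetime              # sequence matters, so every event is timed
    actor: str                       # who acted
    action: str                      # e.g. "requested", "approved"
    reason: str                      # the "why" that outcome-only records lose
    artifacts: Tuple[str, ...] = ()  # hypothetical links to calls, emails, docs


class ContextGraph:
    """Toy in-memory store that stitches events to decisions and artifacts."""

    def __init__(self):
        self._events = defaultdict(list)        # decision_id -> events
        self._by_artifact = defaultdict(set)    # artifact -> decision_ids

    def record(self, event):
        self._events[event.decision_id].append(event)
        for artifact in event.artifacts:
            self._by_artifact[artifact].add(event.decision_id)

    def trace(self, decision_id):
        """Return the events for one decision in chronological order."""
        return sorted(self._events.get(decision_id, []),
                      key=lambda e: e.timestamp)

    def decisions_citing(self, artifact):
        """Return the decisions that reference a given call, email, or doc."""
        return sorted(self._by_artifact.get(artifact, set()))


graph = ContextGraph()
# Events may arrive out of order from different systems.
graph.record(DecisionEvent("deal-42", datetime(2026, 1, 5, 10, 0), "vp_sales",
                           "approved", "matches an earlier enterprise precedent",
                           ("email:thread-9",)))
graph.record(DecisionEvent("deal-42", datetime(2026, 1, 3, 9, 0), "rep",
                           "requested", "competitive deal, customer asked on a call",
                           ("zoom:call-7",)))

for event in graph.trace("deal-42"):
    print(event.timestamp.date(), event.actor, event.action, "-", event.reason)
```

The sketch keeps the reason and the source artifacts on every event, so a later query can recover both the order in which a decision unfolded and the material it was based on.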

Jaya: Aaron, you made a really good point that there’s a correlation between a software company’s moat and the amount of data they house.

It made me think: is it the volume of data or the type of data that matters for agents? And is there a difference between data and context?

Incumbents have the most data. Salesforce has 20 or 25 years of customer data. SAP has decades of transaction data. Workday has all the information about onboarding and everything else.

Maybe this is self-serving, but what’s more important is the why behind the what.

That reasoning lives outside systems of record today. If organizational reasoning becomes the most valuable asset for agents, then where do the incumbents end up? Salesforce owns Slack, and a lot of reasoning happens there, but I don’t have a strong take yet on the others.

Aaron: I don’t know if your piece was written this way, but the internet deals in binary outcomes, so people saw it as zero-sum, like this clearly means the end of systems of record and traditional platforms.

I’ve been doing Box for 20 years and watching software for about 26. It’s more boring, but the outcome is usually more nuanced and more positive-sum.

With on-prem versus cloud, almost every on-prem enterprise software company from the 90s, unless it got acquired or shut down, is bigger today.

Oracle is a $500 billion company. SAP is a $300 billion company. Cloud companies were disruptive, served new customers who could never have bought enterprise software, and solved new use cases. At the same time, the enterprise software players kept evolving and modernizing.

The analogy isn’t perfect, but on context graphs, I agree: not all data is treated equally for generating useful context for agents.

If I’m an enterprise IT buyer and I’ve already implemented Workday for HR data, with my org chart and my understanding of the business built into it, the question becomes: how hard is it for Workday to add the incremental context an agent needs, versus how hard is it for a new company to emerge and recreate all the data Workday already has?

Each category will have its own race like that. In some cases it will be a hybrid: a new agent pulls data from Workday. It’s in Workday’s interest to provide that data because it reinforces the system’s value.

In that case, some value from the context graph shows up in a new startup, but the purpose and value of the system of record doesn’t really go down. That’s the hybrid state we’ll see in a lot of categories.

My general mental model is: if you had 100x more agents in an enterprise, would those agents reinforce the value of core systems of record, or reduce it?

If I’m running mission-critical transactions on an ERP system powering my supply chain, I can’t get anything wrong. A hundred times more agents means the system that traffic-cops those agents becomes more valuable. Guardrails, what agents can access, and what they can execute within the supply chain all matter more.

I haven’t found many categories where that doesn’t happen. The X factor is whether incumbents respond. Will they build real, useful agents in their platforms, or at least provide powerful APIs without closing them off so much that it becomes economically impossible for external systems to use them?

If those two things are true, I think the marginal value of a CRM or HR system goes up. The value of the core system increases because of what’s being executed on top of it.

Ashu: I also think there are functions where multiple systems of record are still in play.

In most companies, you have one system of record for finance and associated finance data. You might use SAP if you’re large, or NetSuite if you’re an emerging business, but you’re usually standardized on one.

Replacing that is visceral for CFOs. They’ll say, “I hate it, but I’m not going to replace it.” Those systems are in decent shape, though their relative value will change depending on how companies respond.

Two functions are different. In go-to-market systems, despite Salesforce’s dominance in core CRM, most companies use half a dozen tools: CRM, demand gen, a sales engagement platform, customer support tickets. Very few standardize on one platform.

That’s partly because each team prefers best-of-breed, and partly because it’s a newer category.

HR is similar. Core HRIS can be different from your applicant tracking system, which can be different from performance management. Those are the three big systems in HR. Fewer than go-to-market, but still multiple systems.

Both functions will see more change because incumbents don’t have all the data and don’t stitch it together.

They’re also functions where more data is unstructured, and incumbent systems don’t capture it by definition. It will vary function by function, and it depends on how incumbents react.

Salesforce, ServiceNow, and Workday are remarkably strong companies. I wouldn’t ignore them for a second. I was buying more Salesforce stock yesterday, even as I say this.

Aaron: Yeah. Maybe one or two other incumbent advantages, and then maybe it’s worth switching to the greenfield areas, because for what it’s worth I think that will be a trillion dollars of new opportunity.

I’m a little biased because this is what we see all day from the Box side. Once you’ve wired up your organization with things like access controls and permissions and who has access to what data and what workflows they’re involved in, that ends up also being a bit of a moat. It’s correlated to the workflow piece, but people underestimate how many years of organizational wiring have now been incorporated into software.

The challenge with agents, as we all know, is they would love to tell you the answer to anything the moment you ask. Unless the agent knows what you have access to and can only do things on your behalf that you’re actually allowed to execute on, you have a massive security challenge.

That’s why to some extent you see agents using these systems as tools, because it’s a natural proxy for what the user has access to. It’s a very convenient relationship. The agent can only do things in the tools using the permissions I already have in that tool, which ensures that you maintain the same security and access control levels you previously had.

If you have a bunch of new agents that don’t know what the permission levels are because you’re trying to recreate a new graph outside of the existing systems, you end up with a huge uphill battle of how to give the agent all the appropriate knowledge about any given workflow if it’s no longer in the system that already has that workflow built out.

So that’s another incumbent advantage where that dynamic is relevant.
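The pattern Aaron describes, where an agent uses an existing system as a tool and inherits the user's permissions there, can be sketched roughly as follows. `DocumentStore`, `AgentTool`, and the sample users and documents are hypothetical stand-ins, not the API of Box or any other product.

```python
class PermissionDenied(Exception):
    """Raised when the acting user lacks access to a document."""


class DocumentStore:
    """Stand-in for an existing system that already holds its own ACLs."""

    def __init__(self, docs, acl):
        self._docs = docs   # doc_id -> text
        self._acl = acl     # doc_id -> set of user ids allowed to read

    def read(self, user_id, doc_id):
        # The check is deterministic and lives in the system of record,
        # not in the agent's prompt.
        if user_id not in self._acl.get(doc_id, set()):
            raise PermissionDenied(f"{user_id} cannot read {doc_id}")
        return self._docs[doc_id]


class AgentTool:
    """Tool handed to an agent; every call runs as the human it acts for."""

    def __init__(self, store, acting_user):
        self._store = store
        self._acting_user = acting_user

    def read_document(self, doc_id):
        # The agent has no identity of its own here: it can only read
        # what the acting user could already read.
        return self._store.read(self._acting_user, doc_id)


store = DocumentStore(
    docs={"contract-1": "Master services agreement ..."},
    acl={"contract-1": {"alice"}},
)

print(AgentTool(store, acting_user="alice").read_document("contract-1"))

try:
    AgentTool(store, acting_user="bob").read_document("contract-1")
except PermissionDenied as err:
    print("denied:", err)
```

Because the tool never bypasses the store's own check, the agent cannot see more than the person it is working for, which is the security property the existing wiring provides.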

Ashu: I think you’re spot on. There’s a class of companies in communication and collaboration. Box for sure. Microsoft. Google with their suite. Okta, which doesn’t do communication and collaboration but is often the platform for access control.

These companies all have a very unique strategic position in the market, and how powerful that is over the next decade will depend on how it evolves. But that’s not true for all incumbents, and that’s part of why the story will be mixed.

Jaya, I want to come back to you. What are one or two spaces you’re most excited about? Where do you believe a large or disproportionate chunk of the trillion-dollar opportunity will be captured?

Jaya: I think it will probably be in a few different places, and you have to come up with creative wedges.

One is high-headcount workflows. Where you see 50 people doing a process manually, it’s because the decision logic is too complex to automate with traditional tooling. There’s too much judgment and too many exceptions. That’s one signal.

Another is exception-heavy decisions. Places like deal desk, underwriting, compliance reviews, and escalation management.

The third is the glue functions. You mentioned this earlier. RevOps, DevOps, SecOps, FinOps: anything with “ops” at the end. Those roles exist because there is no single system of record. They’re the glue between two or three different systems and the bridge between different functions.

Today their function is carrying context that software doesn’t capture. Those are interesting places. They have a bad rap because software hasn’t been sold successfully to them. One of the challenges will be the buyer. Will the CRO get excited about it if it’s RevOps? How do startups push through to the buyers? There will be challenges, but there are interesting wedges.

Aaron: I really like the ops point you made in the piece.

From a software standpoint, this is exactly where the agent opportunity creates new markets. The total headcount in X Ops is usually two or three people. If you were trying to sell marketing ops software before, there were only a couple of seats. You could never underwrite building a large software company for that.

But if ops becomes unlimited agents, you can get deployed because you’re bringing real capacity to the work.

That’s a huge area where you’ll see a lot happen. There’s no incumbent. It never worked as a software category before because the TAM was too small. Now you have entirely new use cases to automate.

On the 50-headcount point, Paul Graham tweeted something yesterday that fits my thinking. Anywhere there is a large body of things you need to process or produce is a good candidate.

There are many workflows constrained by how much data people can understand, interpret, review, analyze, and turn into a useful decision. Previously we could store and log the data, but we didn’t do anything with it. Agents now let you do something with it for the first time.

I don’t know exactly how this market plays out, but code review is a good example. There wasn’t a software category called code review, and now there are three, four, or five companies.

Jaya: PlayerZero is one of our companies. They’re building a production engineer and solving three big problems at once.

One is support engineering workflows: tier-three technical support issues. They’re also doing QA and thinking about AI SRE workflows.

Ashu: What’s interesting about that example is their context graph combines code, support tickets, and observability data. It takes three functions that have been independent in an organization and collapses them.

Aaron: So the question is how many PlayerZero markets there are where you need to bring together three or five different functions and a new type of player is the only way to do that. That’s a classic innovator’s dilemma.

For the most part, I think of an agent as an extension of what your current worker does that gives you 100x capacity.

But there are use cases where an agent should do something that was 10% of 30 people’s jobs and connect them all together. There’s no incumbent that owns that data stream because no human did. So there’s nobody who has the access controls or permissions to do it.

Jaya: I have a question because we’re bringing up access controls a lot.

In regulated industries like law firms, banks, and healthcare, there are strict conflicts. With document retrieval it’s easy to filter at query time based on permissions.

But as agents learn patterns across documents, it’s harder. If an agent learns something from a document a user can’t access, can the agent still apply what it learned when answering that user’s question?

Have you seen anything around permissioned inference?

Aaron: Yeah. So this is what makes my life much more pragmatic than if I were a 19-year-old entrepreneur again. I wouldn’t even have to think about any of this stuff. But now I’m this old man and I think about access controls a lot.

Ashu: Old is relative, Aaron.

Aaron: All right, fine. Compared to maybe Jaya, I don’t know how old you are. But I don’t know of a good way, and I’m sure somebody has a research paper out there, where an agent can know something and then ensure that it never exposes that to somebody else.

With any degree of prompt injection and prompt engineering, you can almost always extract anything out of the context window. So our general thesis at Box is: agents can’t keep secrets.

There’s nothing you can give an agent that it will ever be able to fully keep track of in its context window that it’s not supposed to share with somebody else. Which basically means you have to rely on deterministic systems to control access levels.

Again, somebody out there probably has amazing research with some kind of inference permissioning model. I haven’t seen it. I’m sure somebody’s working on it. Maybe you know something, please tell me right now and we’ll go look into it. But it’s not something we’ve run into that would be practical at scale.

Ashu: It’s a very hard problem to solve. A couple of different companies actually deal with this problem, where they handle it at the context-layer level. They determine what aspects of the graph an agent can access, using good old-fashioned access control, but applied to the graph.
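One way to picture access control applied to the graph is a traversal that only visits nodes the requesting principal is allowed to read, so anything hidden never reaches the agent's context window. This is a minimal sketch under assumed names (`visible_subgraph`, the sample nodes and roles); it does not describe how any particular company implements it.

```python
from collections import deque


def visible_subgraph(edges, acl, principal, start):
    """Breadth-first walk from `start`, visiting only nodes `principal` may read.

    edges: node -> list of neighbouring nodes
    acl:   node -> set of principals allowed to read that node
    """
    if principal not in acl.get(start, set()):
        return []
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor in seen:
                continue
            if principal not in acl.get(neighbor, set()):
                continue  # hidden node: never visited, never expanded
            seen.add(neighbor)
            order.append(neighbor)
            queue.append(neighbor)
    return order


# Hypothetical context graph around one deal.
edges = {
    "deal": ["pricing_memo", "legal_review"],
    "legal_review": ["board_note"],
}
acl = {
    "deal": {"rep", "counsel"},
    "pricing_memo": {"rep"},
    "legal_review": {"counsel"},
    "board_note": {"counsel"},
}

print(visible_subgraph(edges, acl, "rep", "deal"))
print(visible_subgraph(edges, acl, "counsel", "deal"))
```

In this sketch a node that can only be reached through a hidden node is also left out, which errs on the side of exposing less. The filtering happens before any text is handed to a model, so it does not depend on the agent keeping a secret.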

Aaron: Yeah, that totally makes sense. The gotcha is why this matters.

How do you give the access control to the right people? It’s great that there are APIs for all these things, and it’s great that agents can do that, but as an end user I still need to give something to my lawyer. I still need to give something to my investment banker.

That’s why you’re still probably going to rely on systems you’re familiar with. I still want to expose that folder, and I’m biased, in a Box environment, or open something up in a Slack channel.

You don’t really extend permissions and access controls to people through lots of systems all throughout your day. You rely on three or five or ten different things, because otherwise your brain explodes. There are too many permutations to manage for all the people you’re working with.

You need a complete system, one where a human can give access to somebody else and an agent can also access that same thing, if you’re going to deliver on the graph that will drive the automation.

One of the untold reasons why AI coding has taken off with zero friction is you almost never worry about access controls. If you’re an engineer on a team, you get access to everything.

Ashu: You get access to everything. You’re always over-provisioned.

Aaron: Exactly. So we’ve been in this free-lunch zone of AI automation for the past couple of years, which is: I have an agent look at some code and write more of it.

I can’t do that if I’m on a team working with contracts. I can’t do that if I’m an investment banker working on deals. I can’t do that if I’m in life sciences working on an FDA drug trial with clinical information that has PII. I don’t have that free lunch that the engineer has.

This is the real world that’s going to cause incumbents to catch up, or have enough time, or just cause some spaces to look a little different than what we’ve seen in the first era of AI agents.

Ashu: And they will absolutely look different. Aaron, I want to come back to this: if you had to pick one or two areas where you think startups have a real advantage or opportunity, which would those be?

Aaron: The areas I get most excited about when looking at startups are where there is work that everybody would agree needs more of it done, but a company can rarely hire enough people to do it, or they’re bottlenecked by that work.

It’s an imprecise framework, but I think about it even as we build Box. Where are the places where if I could hire 10 or 50 more people in that area, that would be great? It would only be net positive. We’re constrained by some resource allocation process that makes that very hard to do.

Those are usually the areas where you’ll see a lot of tailwind because companies will say, for the first time, for a tenth of the price of a person, I can now do more of that work.

And it’s not replacing anybody because we could not hire enough of those people previously. That’s where some of the biggest opportunities are. Companies already have budgets ready to go. It’s a relief to them that agents can now do those things. That’s tended to be my investment pattern on the startup side.

We’ll see which spaces play out. The only other related point is: where are things that people could never have done? Even if you had hired enough people, it wouldn’t have been possible.

I’m sure PlayerZero is doing this, but the idea is: you can’t throw enough bodies at SRE work for the one weird remote event that happens that takes down your site.

I tend to like those spaces. Use AI as a new form of abundance to solve new problems. I’m a little less excited by just replacing a 10-person team and making it a 7-person team.

Ashu: Aaron, I want to transition to some of the things you’re doing at Box. You’re clearly a very successful incumbent, and at the same time you’ve been very open about reinventing the company in this new AI world. Are there one or two things you can share?

Aaron: The slightly snarky thing I wanted to do when you guys did that post is I wanted to quote-tweet it and say, this is what we’re building at Box.

I didn’t want to create a massive flame war with everybody else who was probably going to quote-tweet that.

We’re obsessed with the idea of context. You’re going to have superintelligence in the form of a generic model, and the only way it can be useful is if you give it context. The only way it’s useful for your workflow is if it knows about your business.

Every single thing in that piece, I fully agreed with. Then it becomes: who can give the agent context?

What we get excited about is that a lot of that context does exist in corporate information already. Not all of it, but 80% to 90% of the key things people need to know exist in unstructured data.

It’s things like: who do I talk to for this problem? What’s the timeline on this roadmap? How do I have an agent answer an RFP?

So it becomes a big problem: how do I give the right information to the agent and not have it get confused by the wrong data or out-of-date information? How does it work on the task effectively? We’re building a lot of capabilities to solve that problem.

Jaya: I’m curious how Box thinks about multimodal data. A lot of the data associated with reasoning is screenshots, diagrams, video calls, recordings, conversations. How do you think about that in the new AI world?

Aaron: That data is becoming vastly more valuable because now we can process it at scale.

You can crack open an image. You can listen to an audio transcript. You can process a video. That can factor into the same context as text-based data.

We think of it as another data type. We don’t treat it differently than text data. It’s more unstructured data that agents can now understand. Then it’s up to the customer how they want to use it to make an agent better or automate a workflow.

Ashu: Makes sense. Historically, data in Box has never had a time component or a sequence component. How does that play out in what you’re doing, or in what you’re seeing other companies do?

Aaron: We aren’t doing anything novel on that front. We have all of that data as exhaust from the system. Any customer could leverage that information using our API. If any context-graph agent company wants to use that data, by all means. We’ll pump it to you and you can learn from whatever that workflow is.

To me, the jury is still out on how often you will systematically look at that data, and for what use cases, versus cases where you want somebody to take an hour and write out the best practice for something.

For responding to system incidents across a trillion events on your platform, you’re going to look at system data and rely on it.

For best practices on discounting in different industries, I still think there’s a human-in-the-loop element where you want someone to apply judgment and say, yeah, this is what we do.

In that case, it might end up as a Markdown file in a folder an agent has access to, and that helps your workflow, as opposed to relying on every event that’s ever happened across time.

Different workflows will have different patterns in terms of what data informs the agent.

Ashu: Aaron, I know we’re running into time constraints. I’m going to do two or three quick lightning-round questions. What is the one assumption about the future of software that you feel most certain about, at a time when everything is changing?

Aaron: The thing I feel most certain about is that the power-law dynamics of software don’t change as a result of vibe coding.

You’ll have classic market-share dynamics in every category of software. The top player will have 70% of the market. The second will have 20%. The third will have 6%. The fourth will have 2%, and so on.

Take an average enterprise in any industry and ask what its HR system, ERP system, CRM system, or email provider is. The idea that it would say, “We built it ourselves,” or that every company would be using a different vendor because there are 100x more vendors that vibe-coded their products: I don’t believe that outcome.

It’s a misunderstanding of why companies buy software and where the real cost of running software is. It’s in the running of it. It’s in the maintaining of it. It’s in the support.

Ashu: It’s in the training people to use it. It’s the extent to which you can rely on it. Absolutely spot on.

Aaron: Those things don’t go away.

Ashu: Next question. Context graphs is a trend you’ve been watching closely, and participating in actively. What is the other major thematic trend you’re watching most closely?

Aaron: The context stuff is taking 90% of my headspace at the moment.

I think about the future of agents as a race to who can provide them with the best context. That will be the defining characteristic of the winners and losers of AI over the next decade.

Ashu: It sounds like context graphs will be the most important thing in your life in 2026.

Aaron: I think it will be. The sister to context graphs is context length and context rot. Can we make sure the model can actually make use of all the context we’re giving it, versus not being able to execute on that data? Those are things I’m thinking about.

Ashu: Last question. What advice would you give to a founder starting a new company today?

Aaron: Back to where I would bet on opportunities, I would go after spaces that are not the AI version of the incumbent.

I would build AI products where there is no incumbent. Go after categories where there was no good software because it wasn’t a real software market before. It was things people did manually. Now agents can do them for the first time and you can sell them to the people doing them manually.

The people doing it manually will get 10x greater output. That pattern will work for hundreds of markets over the next few years.

Ashu: Thank you so much, Aaron, for joining us today.

Published on January 16, 2025
Written By Ashu Garg
