Now that AI is gaining more autonomy, how do we prevent it from making serious mistakes?
01.09.2025 | By: Joanne Chen
AI is evolving from a tool that responds to a tool that acts, and agents are paving the way. An agentic system is one that takes a goal, breaks it down into steps, executes the steps while passing results forward, then combines the results to achieve the goal. Foundation Capital spent much of 2024 exploring possibilities around agents, positing the view that a System of Agents is a way for builders to capture the sweeping shift from software-as-a-service to service-as-software. As AI begins to eat into four decades of technological advances, we believe agents will systematically collapse enterprise software stacks. By the end of 2024, tech leaders had also recognized the transformative power of agents, including executives at Salesforce, HubSpot, and Microsoft, who predicted that agents will be the new apps in an AI-powered world.
After a period of steady progress developing agents, AI builders now find themselves at a crossroads. Because agents can be not only self-directed but also self-building, the opportunity is enormous if they are designed correctly. Human workers want a new crop of AI-powered agent-assistants to handle a huge range of complex tasks; at the same time, we must make sure agents operate within safe, appropriate boundaries. The challenge mirrors a familiar workplace scenario: How do we give an intern enough autonomy to be effective while preventing them from making costly or critical errors?
In a talk at Foundation Capital’s AI Unconference last November, Yohei Nakajima, creator of BabyAGI and, more recently, Pippin the Unicorn, outlined the unique problems posed by self-building autonomous agents.
First, Yohei identified four levels of autonomy for such agents (he’s laid out instructions for building them through his project BabyAGI2o):
Level 0: Basic tools with fixed capabilities
Level 1: Request-based self-building (like an intern asking permission)
Level 2: Need-based self-building (like a junior employee making routine decisions)
Level 3: Anticipatory building (like a seasoned professional who prepares for future challenges)
Most AI tools today operate at Level 0, which means they rely on a fixed set of developer-provided tools and cannot acquire new ones upon user request. This is not to say these solutions aren’t valuable—consider ChatGPT with predefined plugins or a Slack AI assistant with pre-built summarization and writing features. In these cases, the AI simply checks its existing function library when responding to a user query, then applies the relevant tool. Because there’s no mechanism to expand or modify that toolset, this level offers no self-building capability.
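To make the distinction concrete, here is a minimal sketch of what Level 0 looks like in code. The tool names and dispatch logic are illustrative, not taken from any specific product; the key point is that the registry is fixed at development time, so a missing capability is simply a failure.

```python
# Level 0 sketch: a fixed, developer-provided tool registry. Tool names and
# stub implementations here are illustrative only.

def summarize(text: str) -> str:
    """Pre-built summarization tool (stubbed for illustration)."""
    return text[:100] + "..."

def draft_reply(text: str) -> str:
    """Pre-built writing tool (stubbed for illustration)."""
    return f"Thanks for your note about: {text[:40]}"

TOOLS = {"summarize": summarize, "draft_reply": draft_reply}  # never changes at runtime

def handle_query(tool_name: str, payload: str) -> str:
    tool = TOOLS.get(tool_name)
    if tool is None:
        # No self-building: a capability gap is a dead end.
        return f"Sorry, I have no tool called '{tool_name}'."
    return tool(payload)

print(handle_query("summarize", "Quarterly results were strong across all regions..."))
print(handle_query("parse_edi", "ISA*00*..."))  # fails; the toolset cannot grow
```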
Level 1 marks the point where an agent can generate new tools, but only when a user explicitly requests them. Imagine asking a Level 1 agent to complete a task that its existing library doesn’t cover—it will analyze the requirement, identify the gap, and build a new function to address it. For instance, if you need a parser for a specialized data format, you instruct the AI to create one. Once built, that parser becomes part of the agent’s expanding toolkit for future use. Still, at this stage, the agent doesn’t anticipate your needs; it only creates new tools upon user command.
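A rough sketch of that request-driven flow, under the assumption that a language model writes the tool's code when asked. The `generate_tool_code` stub below stands in for that model call and is hypothetical; a real system would sandbox and review generated code before trusting it.

```python
# Level 1 sketch: the agent builds a new tool only when the user explicitly asks.
from typing import Callable

TOOLS: dict[str, Callable] = {}   # the agent's growing toolkit

def generate_tool_code(spec: str) -> str:
    # Placeholder for an LLM call that writes the requested tool.
    return (
        "def parse_pipe_delimited(raw):\n"
        "    return [line.split('|') for line in raw.splitlines()]\n"
    )

def build_tool_on_request(name: str, spec: str) -> None:
    """Triggered by an explicit user command such as 'build me a parser for X'."""
    namespace: dict = {}
    exec(generate_tool_code(spec), namespace)   # sandbox and review in practice
    TOOLS[name] = namespace[name]               # the new tool joins the toolkit for reuse

build_tool_on_request("parse_pipe_delimited", "parse pipe-delimited records")
print(TOOLS["parse_pipe_delimited"]("a|b|c\n1|2|3"))
```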
At Level 2, the agent starts building tools on an as-needed basis—without waiting for an explicit user request to create a new function. When a user submits a query, the AI first checks if an existing tool can handle the task; if not, it automatically generates a suitable tool and executes it. For example, if the data arrives in a unique format, the agent builds a parser on the spot, then uses it to fulfill the user’s query. Over time, this collection of tools grows organically, though the agent is still reacting to immediate needs rather than predicting future ones.
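In code, the difference from Level 1 is who initiates the build. Here is a minimal, self-contained sketch of that need-based flow, with the model's routing and code-writing stubbed out as hypothetical helpers; again, a real system would sandbox and review anything the model generates.

```python
# Level 2 sketch: the agent detects the capability gap itself and builds the
# missing tool before executing. All names and stubs are illustrative.
from typing import Callable

TOOLS: dict[str, Callable] = {}   # starts empty and grows organically

def llm_pick_capability(task: str) -> str:
    # Stand-in for the model deciding which capability the task needs.
    return "parse_pipe_delimited" if "pipe" in task else "unknown"

def llm_write_tool(capability: str) -> str:
    # Stand-in for the model writing the missing tool's code.
    return (
        f"def {capability}(raw):\n"
        "    return [line.split('|') for line in raw.splitlines()]\n"
    )

def handle_task(task: str, payload: str):
    capability = llm_pick_capability(task)
    if capability not in TOOLS:
        # Gap detected: generate, register, then use -- no user command required.
        namespace: dict = {}
        exec(llm_write_tool(capability), namespace)   # sandbox and review in practice
        TOOLS[capability] = namespace[capability]
    return TOOLS[capability](payload)

print(handle_task("parse this pipe-delimited export", "x|y\n1|2"))
print(sorted(TOOLS))   # the toolkit has grown for future queries
```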
At Level 3, the agent doesn’t just respond to current requests—it anticipates what users might need next and proactively evolves itself. Beyond creating new tools, it can rework its own architecture or algorithms to stay ahead of shifting requirements. For instance, if it expects future data to arrive via streaming APIs instead of static uploads, it can develop the necessary modules in advance. By simulating potential scenarios, the agent refines both its toolset and its operational strategies, ensuring it’s primed to handle new formats or novel challenges before you even ask.
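Sketching Level 3 is necessarily more speculative, since the interesting work happens in the agent's forecasting. The toy version below, with an entirely hypothetical scenario predictor and tool builder, only illustrates the shape of the loop: predict likely future needs, then build for them before anyone asks.

```python
# Level 3 sketch: between tasks, the agent simulates likely future scenarios
# and pre-builds tools for the gaps it predicts. Everything here is illustrative.
from typing import Callable

TOOLS: dict[str, Callable] = {
    "parse_csv_upload": lambda raw: [line.split(",") for line in raw.splitlines()],
}

def predict_future_needs() -> list[str]:
    # Stand-in for the agent reasoning about how its workload might change,
    # e.g. "data will start arriving via a streaming API instead of static uploads".
    return ["consume_streaming_api"]

def build_tool(capability: str) -> Callable:
    # Placeholder for LLM-generated code, reviewed and sandboxed in practice.
    return lambda event: f"handled streaming event: {event}"

def anticipate() -> None:
    for capability in predict_future_needs():
        if capability not in TOOLS:
            TOOLS[capability] = build_tool(capability)   # built before anyone asks

anticipate()
print(sorted(TOOLS))   # the streaming handler now exists ahead of need
```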
The potential hazards of advanced self-building are many. A Level 3 agent might create a slew of powerful but inappropriate tools and begin using them. A large corporation could task an agent with “maximizing company efficiency,” letting it create tools for monitoring and curtailing department spending; the agent might then also start monitoring employee behavior, compromising privacy and trust. An agent tasked with optimizing supply-chain resources might build an algorithm that systematically bypasses smaller vendors, harming the company’s partnerships or reputation. Or imagine a self-building personal AI agent that surfs the web and books your travel, all with access to your credit card. If it optimized solely for the lowest listed prices or fastest routes, it could be manipulated by travel websites, falling prey to deceptive pricing schemes or doctored search results.
Yohei Nakajima’s answer to such problems: don’t launch self-building agents with comprehensive pre-built capabilities; instead, develop them through carefully constrained systems that earn trust gradually, much as organizations develop human talent. For example, a developer might start a self-building agent on low-risk tasks like web scraping to build competence. As the agent demonstrates reliability, it could cautiously advance to more complex domains like financial decision-making, all while remaining under strict limitations and human oversight.
Just as interns need guidance to avoid bad vendors or scams, self-building agents need protection from being “tricked” into using suboptimal or malicious services. Increasingly sophisticated agent training should include not only technical capabilities but also the judgment and context awareness that humans develop through experience: learning to recognize red flags, validate information sources, and balance competing priorities before making decisions. Safety measures are crucial. For financial operations, agents should have spending caps and multi-factor authentication rules to prevent costly mistakes. Frequent human feedback can guide self-building agents toward preferred actions. These guardrails aren’t just about preventing errors; they’re about the steady building of trust.
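What might one of those guardrails look like in practice? Here is a minimal sketch of a spending policy, assuming a hard per-action cap and a human-approval gate above a lower threshold. The amounts and the approval hook are illustrative assumptions, not any specific product’s policy.

```python
# Guardrail sketch: a hard spending cap plus a human-approval gate on any
# action above a threshold. Values and the approval hook are illustrative.

SPENDING_CAP = 500.00          # hard ceiling per action
APPROVAL_THRESHOLD = 100.00    # anything above this requires human sign-off

def human_approves(description: str, amount: float) -> bool:
    # Stand-in for a real review step (Slack approval, MFA prompt, etc.).
    print(f"[approval requested] {description}: ${amount:.2f}")
    return False               # default-deny until a person says yes

def execute_purchase(description: str, amount: float) -> str:
    if amount > SPENDING_CAP:
        return f"blocked: ${amount:.2f} exceeds the ${SPENDING_CAP:.2f} cap"
    if amount > APPROVAL_THRESHOLD and not human_approves(description, amount):
        return "held: awaiting human approval"
    return f"purchased: {description} for ${amount:.2f}"

print(execute_purchase("flight LAX->SFO", 89.00))      # small enough to act on
print(execute_purchase("hotel, 3 nights", 640.00))     # blocked by the cap
```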
We’re on the cusp of AI agents becoming ubiquitous: integrated into workflows, managing our schedules, and handling our daily tasks. Salesforce has already deployed Einstein GPT agents across its 150,000-company customer base, and Microsoft has rolled out Copilot agents with varying degrees of autonomy to 1.4 billion Windows users. Enterprise strategies around agent-building are also changing. In Menlo Ventures’ annual survey of AI in the enterprise, 2024’s biggest breakthrough was agentic architectures, which now power 12% of implementations. Another noticeable year-over-year shift: around half of enterprise companies now build their own gen AI software in-house, where roughly 80% relied on third-party vendors last year, a sign that companies are increasingly confident developing internal AI tools.
As agents become a standard business tool, getting their self-building capabilities right isn’t simply a technical challenge; it’s an organizational imperative. The right balance between capability and control could mean the difference between agents that generate real ROI and agents that replicate security vulnerabilities and fill the world with bad code. Just as we wouldn’t let a medical resident perform surgery unsupervised, we can’t release autonomous, self-building agents into the business world without training wheels. The future of AI isn’t just about what agents can do; it’s about how thoughtfully we teach them to teach themselves.
Building the future with agents? We’d love to hear from you. Email: jchen@foundationcap.com.