05.24.2024 | By: Joanne Chen
In this post, I share learnings from my conversation with Chi Wang, a principal researcher at Microsoft and the creator of AutoGen.
Agents have been a cornerstone of human-computer interaction for decades, from the friendly Clippy of Microsoft Office fame to auto-suggestions in Google Docs and NPCs in video games. While these early agents hinted at the potential for personalized, goal-oriented interactions, they were limited in their ability to handle higher-level tasks. It’s only with the recent advent of LLMs that the true potential of agents has begun to be realized.
As LLM-powered agents have moved from research experiments into production, they’ve enabled increasingly sophisticated applications for both consumers and enterprises. But even the most advanced standalone agents still struggle with multi-step tasks that require navigating different contexts and managing dependencies.
This is where multi-agent systems come in. By breaking down complex problems into discrete subtasks that are handled by specialized agents, these systems offer a modular, flexible, and resilient approach to automating tasks that were previously considered beyond software’s reach. Leading multi-agent frameworks like Microsoft’s open-source AutoGen are currently powering a wide range of academic and enterprise use cases, including synthetic data generation, code generation, and pharmaceutical data science.
To better understand multi-agent systems—both their potential and their present-day limitations—I spoke with Chi Wang, a principal researcher at Microsoft and the creator of AutoGen. In this post, I’ll share some of my key learnings from our conversation.
ICYMI: This essay is a part of my new series, AI in the Real World, where I have in-depth conversations with leading AI researchers about how state-of-the-art AI is being applied in enterprises. Check out our previous conversations here.
Building reliable standalone AI agents is an open challenge. So why introduce more agents into the equation?
To answer this question, it’s helpful to revisit the origins of multi-agent cognition, which trace back to Marvin Minsky’s classic 1986 book, The Society of Mind. Here, Minsky proposed that human cognition arises from the interaction of numerous “agents”: simple entities designed to perform certain functions, such as recognizing a shape or processing emotions. He posited that by combining these agents in specific ways (into networks or “societies”), intelligent behavior could arise, a phenomenon he termed the “Society of Mind.” Minsky’s key insight was that thousands of modular minds working in concert could outperform a single monolithic mind.
Today’s multi-agent systems, with their abilities to learn, adapt, and coordinate, are the direct descendants of Minsky’s vision. By training groups of agents to collaborate and compete in pursuit of shared goals, developers can create systems that dramatically exceed the capabilities of any single agent: the same “1 + 1 = 3” effect that Minsky saw as central to human cognition.
As Chi explains, multi-agent systems offer three main benefits:
Distributing complex tasks across specialized agents makes the overall system more modular. This modularity simplifies development, testing, and maintenance, as capabilities can be added or tweaked without revamping the entire system. Troubleshooting is also streamlined, as issues can often be isolated to individual agents.
Think of multi-agent systems as teams of experts, each contributing unique knowledge and abilities to collectively tackle difficult problems. Tasks are broken down into components and assigned to the agent best equipped to handle them. As each agent processes its part of the task and passes information to the next, the output is progressively refined and improved. Through such specialization, the resulting systems can achieve results that generalist agents struggle to match.
This approach is conceptually similar to techniques like prompt chaining, where a human user breaks down an intricate task into a series of subtasks and iterates toward a desired outcome through conversation with the model.
Chi offers the example of a multi-agent system tasked with analyzing data and providing insights and recommendations. In this scenario, each agent focuses on a different aspect of the task: some specialize in data retrieval and presentation, others in deep analysis and insight generation, and others in planning and decision-making. This division of labor allows each agent to work on what it does best, leading to faster, more accurate outcomes.
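To make this division of labor concrete, here is a minimal sketch in Python. It is not AutoGen code: the three agent functions, the sample numbers, and the 10% growth threshold are all hypothetical stand-ins for LLM-backed agents, chosen only to show how each specialist adds its piece to a shared context before handing off to the next.

```python
# Hypothetical stand-ins for LLM-backed agents. Each one is a plain function
# that reads the shared context and returns an enriched copy of it.

def retrieval_agent(ctx):
    # A real agent would query a database or API; these numbers are made up.
    return {**ctx, "monthly_sales": [120, 135, 160, 150]}

def analysis_agent(ctx):
    sales = ctx["monthly_sales"]
    growth = (sales[-1] - sales[0]) / sales[0]  # change from first to last month
    return {**ctx, "growth": growth}

def planning_agent(ctx):
    # The 10% cutoff is an arbitrary illustrative rule, not a recommendation.
    action = "increase inventory" if ctx["growth"] > 0.10 else "hold steady"
    return {**ctx, "recommendation": action}

def run_pipeline(agents, ctx):
    """Pass the context through each specialist in order."""
    for agent in agents:
        ctx = agent(ctx)
    return ctx

result = run_pipeline(
    [retrieval_agent, analysis_agent, planning_agent],
    {"question": "How are sales trending?"},
)
print(result["recommendation"])
```

Because each agent only touches its own keys in the context, any one of them can be swapped out or tested in isolation, which is the modularity benefit described earlier.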
In multi-agent systems, the interactions among individual agents can give rise to solutions that exceed what any single agent could achieve in isolation. By allowing agents to work together, critique one another, and share their insights, the system can develop a more comprehensive understanding of the problem at hand. This is especially valuable when dealing with complex, multifaceted issues that no single agent has the breadth of knowledge or skills to fully address.
The beauty of collaborative learning lies in its ability to generate creative solutions that might elude a more homogeneous system. As agents converse and build on each other’s ideas, they can explore a wider range of possibilities and uncover approaches that individual agents might overlook. These synergies are the key to unlocking the full potential of multi-agent systems. As inference techniques improve, such inter-agent exchanges will only become faster and more efficient.
To illustrate this concept, Chi describes a multi-agent framework with one GPT-4 agent and several GPT-3.5 agents. In this setup, the GPT-4 agent serves as an expert “teacher” or “mentor” to the GPT-3.5 “students.” By engaging with their more advanced peer, the GPT-3.5 agents can quickly master specific tasks without the need for extensive individual training. As each agent improves through this collaborative learning process, the system’s overall capabilities grow.
How can builders best design applications using multi-agent systems? Chi shares some helpful insights.
Choosing the right architecture is critical, as multi-agent systems introduce myriad complexities around coordination, consistency, and coherence that single-agent setups avoid. For straightforward, narrowly defined tasks, a lone agent may be the simpler, more efficient choice. Factors such as response speed, decision-making frequency, inter-agent communication needs, latency, and bandwidth all influence the decision between single and multi-agent architectures.
Start simple, then scale. By deploying one or two agents initially and incrementally scaling up, developers can validate the core design and interaction patterns before introducing additional complexity. This approach also streamlines debugging and optimization, as issues can be more easily traced back to individual agents.
In multi-agent systems, specialization breeds strength. Developers should adopt a divide-and-conquer approach, allowing each agent to focus on its area of expertise. This goes beyond simple prompt engineering: agents can be equipped with task-specific resources and tools, such as access to databases and specialized software, along with clearly defined rules and constraints that guide them toward desired outcomes. Effective design involves mapping out the subtasks required to achieve the overall objective, understanding their interdependencies, and assigning agents accordingly based on their specialties and the system’s evolving needs.
Seamless communication between agents is crucial, and both static and dynamic topologies have their merits. In static setups, the communication channels linking agents are predefined and unchanging. This approach prioritizes simplicity and predictability, making the system easier to understand, analyze, and debug.
Dynamic topologies, by contrast, allow agents to create and modify communication links on the fly, thus enabling them to adapt to shifting circumstances and requirements. Imagine a disaster response scenario where agents represent different emergency services. Within a dynamic topology, these agents can fluidly connect and coordinate based on real-time data like incident locations and resource needs. This adaptability enables the system to mount a more effective and targeted response to evolving crisis conditions—yet it also makes analyzing and overseeing the system more difficult.
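The contrast between the two topologies can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any framework's API: the `Topology` class, the `on_incident` helper, and the agent names are all hypothetical. A frozen topology models the static case, while an unfrozen one lets links form as incidents arrive.

```python
from collections import defaultdict
from itertools import combinations


class Topology:
    """Undirected communication links between named agents (hypothetical)."""

    def __init__(self, links=(), frozen=False):
        self._links = defaultdict(set)
        self._frozen = False
        for a, b in links:
            self.connect(a, b)
        self._frozen = frozen  # a frozen topology models the static case

    def connect(self, a, b):
        if self._frozen:
            raise RuntimeError("static topology: links are fixed at design time")
        self._links[a].add(b)
        self._links[b].add(a)

    def neighbors(self, agent):
        return sorted(self._links.get(agent, ()))


# Static: every channel is declared up front and never changes.
static = Topology([("dispatcher", "fire"), ("dispatcher", "medical")], frozen=True)

# Dynamic: channels appear as incidents come in.
dynamic = Topology()


def on_incident(topology, incident):
    """Link every pair of services that the incident requires."""
    for a, b in combinations(incident["needs"], 2):
        topology.connect(a, b)


on_incident(dynamic, {"location": "5th and Main", "needs": ["fire", "police"]})
print(static.neighbors("dispatcher"))  # ['fire', 'medical']
print(dynamic.neighbors("fire"))       # ['police']
```

The static graph can be read and audited before the system ever runs. The dynamic one can only be understood by replaying the events that shaped it, which is the oversight cost noted above.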
Striking the right balance between agent autonomy and control is an ongoing challenge. Too little autonomy can result in a rigid, limited system, while too much autonomy may lead to unstable or unexpected behaviors. Adjustable autonomy, which allows for dynamic, context-dependent changes in the level of control exerted over agents, is an active area of research.
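One simple way to picture adjustable autonomy is as a risk threshold that moves with context. The sketch below is a toy model rather than a published method: the `AutonomyPolicy` class, the linear adjustment rule, and the 0.1 to 0.9 clamp are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AutonomyPolicy:
    """Hypothetical policy: actions riskier than `threshold` need a human."""

    threshold: float = 0.5

    def adjust(self, recent_error_rate: float) -> None:
        # Illustrative rule: tighten control as agents make more mistakes,
        # loosen it when they have been reliable. Clamped to [0.1, 0.9].
        self.threshold = max(0.1, min(0.9, 0.9 - 2 * recent_error_rate))

    def decide(self, risk: float) -> str:
        return "execute" if risk <= self.threshold else "ask_human"


policy = AutonomyPolicy()

policy.adjust(recent_error_rate=0.0)   # agents have been reliable
print(policy.decide(risk=0.5))         # execute

policy.adjust(recent_error_rate=0.3)   # agents have been erring
print(policy.decide(risk=0.5))         # ask_human
```

The same action gets a different answer depending on how the agents have performed recently, which is the context-dependent shift in control that this line of research is after.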
Most multi-agent systems involve human users to some degree, which means that innovative interaction design is essential. Agents need effective mechanisms for conveying relevant information to human stakeholders, soliciting input and direction as needed, and modifying their behaviors in response to feedback.
A primary design consideration is whether to present the multi-agent system to users as a unified, monolithic entity or as a collection of distinct, interacting agents. In the former case, users might interact with the system through a single interface, regardless of the number and diversity of agents operating behind the scenes. In the latter, users would need to communicate with multiple agents individually, potentially using different interfaces and interaction patterns for each.
Emerging HCI paradigms are exploring a range of possibilities for human-agent collaboration. Some envision multi-agent systems as sophisticated but essentially passive tools for executing well-defined tasks under human direction. Others treat agents as proactive collaborators: dynamic, autonomous partners that can engage in creative problem-solving alongside their human users.
Because multi-agent systems are modular, their individual components can be isolated, evaluated, and optimized, allowing developers to continuously refine the system’s performance. To support this process, Chi encourages builders to implement mechanisms for monitoring agent performance, identifying issues, and iterating on system design. One approach is to use dedicated agents whose sole purpose is to evaluate and benchmark the performance of other agents in the system. These specialized agents can analyze operational data (like logs), extract relevant evaluation criteria, and automatically score the performance of other agents.
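As a rough illustration of an evaluator that scores other agents from their logs, consider the Python sketch below. The log schema, the agent names, and the two-second latency budget are hypothetical. A production evaluator would likely be an LLM-backed agent that derives its own criteria, whereas this version hard-codes a single rule so the scoring logic stays visible.

```python
from collections import defaultdict

# Hypothetical operational logs; the field names are assumptions.
logs = [
    {"agent": "retriever", "ok": True,  "latency_s": 0.8},
    {"agent": "retriever", "ok": True,  "latency_s": 1.1},
    {"agent": "analyst",   "ok": True,  "latency_s": 1.5},
    {"agent": "analyst",   "ok": False, "latency_s": 0.9},
    {"agent": "analyst",   "ok": True,  "latency_s": 3.2},
    {"agent": "analyst",   "ok": True,  "latency_s": 1.0},
]


def evaluate(entries, max_latency_s=2.0):
    """Score each agent as the share of runs that succeeded within budget."""
    stats = defaultdict(lambda: {"runs": 0, "passed": 0})
    for entry in entries:
        s = stats[entry["agent"]]
        s["runs"] += 1
        if entry["ok"] and entry["latency_s"] <= max_latency_s:
            s["passed"] += 1
    return {agent: s["passed"] / s["runs"] for agent, s in stats.items()}


scores = evaluate(logs)
print(scores)  # {'retriever': 1.0, 'analyst': 0.5}
```

Per-agent scores like these point directly at the component that needs attention, which a single end-to-end metric cannot do.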
Multi-agent systems present distinct safety and security challenges. The high degree of interdependence between agents means that failures or vulnerabilities in one part of the system can quickly cascade.
One common failure mode arises from conflicts between the “world models” of different agents: the core assumptions, beliefs, and representations that each agent relies on to understand its environment and objectives. If these world models fall out of sync, the system can become unstable as agents start to work at cross-purposes. A multi-agent retail forecasting system, for example, could be compromised if one agent assumes rising demand while another expects a decrease, leading to faulty inventory decisions.
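A lightweight guard against this failure mode is to compare agents' stated assumptions before acting on them. The Python sketch below assumes each agent can expose its beliefs as a flat dictionary. That is a strong simplification, and the agent names and keys are hypothetical.

```python
def find_conflicts(beliefs):
    """Return every key on which agents hold differing values.

    `beliefs` maps agent name -> {assumption name -> value}.
    """
    by_key = {}
    for agent, model in beliefs.items():
        for key, value in model.items():
            by_key.setdefault(key, {})[agent] = value
    # Keep only the assumptions where more than one distinct value appears.
    return {k: v for k, v in by_key.items() if len(set(v.values())) > 1}


beliefs = {
    "pricing_agent":   {"demand_trend": "rising",  "season": "holiday"},
    "inventory_agent": {"demand_trend": "falling", "season": "holiday"},
}

conflicts = find_conflicts(beliefs)
print(conflicts)
# {'demand_trend': {'pricing_agent': 'rising', 'inventory_agent': 'falling'}}
```

In the retail example, a check like this would surface the disagreement over demand before any inventory decision is made, so the system can pause or escalate instead of acting on contradictory premises.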
The distributed structure of multi-agent systems also expands the attack surface for malicious actors. Each agent is a potential point of entry that could be breached and exploited to manipulate the broader system, using its highly interconnected nature to rapidly propagate an attack. A hacked agent could be used to feed false data to its peers, skewing their world models and triggering destructive feedback loops. Imagine a swarm of autonomous drones that is suddenly fed contradictory location data by a corrupted agent, causing them to collide in mid-air.
To defend against these threats, multi-agent systems need robust security measures at both the individual agent and network levels. Techniques like multi-factor authentication, end-to-end encryption, and hardware-based trusted execution environments can help harden agents against intrusion. Anomaly detection systems can be used to identify suspicious behavior patterns that might indicate an ongoing attack.
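Anomaly detection can start very simply. The sketch below flags an agent whose message rate deviates sharply from its own history, using a z-score. The choice of metric, the sample history, and the threshold of 3 are assumptions made for illustration, and real deployments would typically monitor many signals with more robust statistics.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` if it lies more than `z_threshold` standard deviations
    from the mean of `history` (which needs at least two points)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


# Hypothetical messages-per-minute counts for one agent.
history = [10, 12, 11, 9, 10, 11, 12, 10]

print(is_anomalous(history, 11))  # False: within the normal range
print(is_anomalous(history, 60))  # True: a sudden burst worth investigating
```

A flagged agent could then be quarantined or have its outputs held for review before its peers consume them, which limits how far a compromised agent can spread false data.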
Multi-agent systems hold immense promise for enabling more sophisticated, capable AI applications. As this field continues to advance, researchers are focusing on several key areas to more fully realize the potential of this exciting paradigm.
Fortunately, active research efforts are already making headway in this crucial domain. Techniques like multi-agent debate, which pits agents against each other to stress-test ideas and surface potential flaws, and recursive reward modeling, which refines agent objectives through iterative human feedback, are showing promising results.
As researchers like Chi continue to push the boundaries of what’s possible with multi-agent AI, it’s clear that we’re only beginning to scratch the surface of what this technology can achieve. From automating complex tasks to tackling multifaceted problems that have long confounded traditional software approaches, the applications for multi-agent systems are vast.
Stay tuned for upcoming installments of “AI in the Real World,” where we’ll continue to explore the cutting edge of generative AI, including alternative model architectures, inference strategies, and software-hardware co-design.
Published on May 24, 2024
Written by Foundation Capital