12.06.2024 | By: Joanne Chen, Jaya Gupta
As the AI revolution transitions into evolution, the question shifts from “What can AI do?” to “How will AI systems fundamentally transform work?” This was the line of inquiry that brought together more than 100 of the world’s top founders, researchers, and builders for Foundation Capital’s AI Unconference on November 7. Their goal: nothing short of figuring out how to build the future of AI.
The day’s discussions were built around four themes.
We’re sharing our top five insights from the day, chosen carefully from off-the-record discussions—without revealing too much. Stay tuned for forthcoming in-depth blogs on each theme.
AI agents are emerging as a backbone of next-generation intelligent systems. As they gain sophistication and autonomy, they also raise hard problems: how should agents choose the right tools for a given task, and how can they be guarded against malicious influence? This is particularly true for what BabyAGI creator Yohei Nakajima calls “self-building” autonomous agents. In his talk at the AI Unconference, he outlined four levels of autonomy for such agents, sketched in code after the list below, along with the tools he has been developing to build them through his project BabyAGI2o:
Level 0: Basic tools with fixed capabilities
Level 1: Request-based self-building (like an intern asking permission)
Level 2: Need-based self-building (like a junior employee making routine decisions)
Level 3: Anticipatory building (like a seasoned professional who prepares for future challenges)
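To make the progression concrete, here is a minimal Python sketch of how those levels might gate an agent’s ability to create new tools. The names (`AutonomyLevel`, `may_build_tool`) and the policy itself are illustrative assumptions, not code from BabyAGI2o.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """Illustrative encoding of the four self-building autonomy levels."""
    FIXED_TOOLS = 0    # Level 0: basic tools with fixed capabilities
    REQUEST_BASED = 1  # Level 1: asks permission before building (the intern)
    NEED_BASED = 2     # Level 2: builds routine tools on its own (the junior employee)
    ANTICIPATORY = 3   # Level 3: builds ahead of anticipated needs (the seasoned pro)


def may_build_tool(level: AutonomyLevel,
                   human_approved: bool = False,
                   immediate_need: bool = True) -> bool:
    """Decide whether the agent may create a new tool at a given autonomy level."""
    if level == AutonomyLevel.FIXED_TOOLS:
        return False              # never self-builds
    if level == AutonomyLevel.REQUEST_BASED:
        return human_approved     # only with explicit human sign-off
    if level == AutonomyLevel.NEED_BASED:
        return immediate_need     # builds only when the current task requires it
    return True                   # ANTICIPATORY: may also build speculatively
```

In practice, the interesting design work is in the gate itself: what counts as a “routine” decision at Level 2, and who audits the tools an anticipatory agent builds unprompted.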
Developing self-building agents safely and responsibly requires a strategic, incremental approach. By beginning with low-risk tasks such as web scraping, developers can progressively build the system’s competence while designing in protective measures. As the agent demonstrates reliability, it can cautiously advance to more complex domains like financial decision-making, always under strict limitations and human oversight. This scaffolding is crucial for mitigating risk in environments where agents are vulnerable to manipulation, such as the open internet, where an ad or a targeted ploy could persuade an agent to adopt a suboptimal corporate tool, much as an intern must learn to see through sales tactics and other pressures.
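One way to picture that scaffolded escalation is a risk-tiered allowlist in which an agent unlocks a riskier domain only after a track record of reliable, human-reviewed runs at its current tier. The domains, thresholds, and class below are hypothetical; the point is the shape of the control, not its specifics.

```python
# Hypothetical risk tiers: the agent starts at tier 0 and must earn its way up.
DOMAIN_RISK = {"web_scraping": 0, "report_drafting": 1, "financial_decisions": 2}


class Scaffold:
    def __init__(self, promotion_threshold: int = 20):
        self.successes = {domain: 0 for domain in DOMAIN_RISK}
        self.max_allowed_risk = 0               # begin with the lowest-risk domain only
        self.promotion_threshold = promotion_threshold

    def can_act(self, domain: str, human_approved: bool = False) -> bool:
        """Allow actions only in unlocked domains; frontier-tier actions need a human."""
        risk = DOMAIN_RISK[domain]
        if risk > self.max_allowed_risk:
            return False                        # domain not yet unlocked
        if risk == self.max_allowed_risk:
            return human_approved               # newest tier stays under human oversight
        return True                             # well-proven tiers may run autonomously

    def record_success(self, domain: str) -> None:
        """Track reliable runs; promote only after sustained success at the frontier."""
        self.successes[domain] += 1
        if (DOMAIN_RISK[domain] == self.max_allowed_risk
                and self.successes[domain] >= self.promotion_threshold):
            self.max_allowed_risk = min(self.max_allowed_risk + 1,
                                        max(DOMAIN_RISK.values()))
```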
The idea of scaling inference-time compute comes from Stanford professor and AI Unconference speaker Azalia Mirhoseini, who is also a Senior Staff Scientist at Google DeepMind. In her July paper Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, Dr. Mirhoseini argued that AI developers should move well beyond current one-pass evaluation techniques and stop expecting a model to be correct on the first try. The focus should instead be on making models more effective in real-world applications by scaling inference methods, strengthening verification, and letting models self-improve through data-driven refinement such as repeated sampling. By increasing compute and repeatedly querying a model, she argues, it’s possible to raise the likelihood of obtaining a correct answer even when the model’s initial response is wrong.
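Below is a minimal sketch of repeated sampling, assuming a stochastic `generate` call to some model and a task-specific `verify` check (unit tests, a solver, an answer key). Both are placeholders rather than anything from the paper’s setup.

```python
import random


def generate(prompt: str) -> str:
    """Placeholder for a stochastic call to a language model (temperature > 0)."""
    return random.choice(["answer_a", "answer_b", "answer_c"])


def verify(candidate: str) -> bool:
    """Placeholder for a task-specific check, e.g. unit tests or an answer key."""
    return candidate == "answer_c"


def solve_with_repeated_sampling(prompt: str, k: int = 100) -> str | None:
    """Draw up to k independent samples and return the first one the verifier accepts.

    If a single sample is correct with probability p, then at least one of k
    samples passes with probability 1 - (1 - p) ** k, which is why coverage
    keeps climbing as inference compute (k) grows.
    """
    for _ in range(k):
        candidate = generate(prompt)
        if verify(candidate):
            return candidate
    return None
```

The catch, of course, is the verifier: repeated sampling pays off only when you can check an answer far more cheaply than you can produce one.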
Advanced AI systems have produced remarkable results, including complex language understanding and abstract reasoning, but these breakthroughs remain largely confined to the digital world rather than the physical one. A notable crop of startups is making progress on AI that perceives, interacts with, and manipulates real-world environments.
We believe startups will thrive here, beating big companies to a “ChatGPT moment” for physical AI. Big tech companies often operate on six-month performance cycles, and their projects require layers of approval, which limits innovation. While access to compute and data remains a challenge for startups, smaller operations can assemble the cross-functional teams that physical AI requires, including roboticists, AI experts, and sensor and hardware designers, and those teams can focus narrowly on problems and prototype solutions rapidly. Promising areas of innovation in physical AI include workplace safety and industrial applications (already an area of interest for AI Unconference attendee Perceptron, which recently emerged from stealth) and fall detection among the elderly (a specialty of our portfolio company SafelyYou).
Cybersecurity teams are at a crossroads: they are severely understaffed (by approximately 4.8 million people, at the latest count). Meanwhile, technologies and software are proliferating rapidly, demanding complex, nuanced security decisions and deep domain knowledge. This pressure-cooker environment forces security professionals into unfortunate trade-offs between rapid deployment and thorough security review, often while using outdated software.
Dimitry Shvartsman, Co-Founder and Chief Product Officer of Prime Security, explained at the AI Unconference that while tools like Static Application Security Testing (SAST) offer partial solutions, AI will be critical to helping cybersecurity teams keep up. The most promising uses of AI for cybersecurity include catching coding errors, assisting with code reviews, generating test cases, summarizing technical documentation, detecting threats and anomalies, and predictive vulnerability management. Overall, AI will become a force multiplier that helps overwhelmed teams scale their capabilities.
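As a rough illustration of the code-review use case, the sketch below asks a model to flag likely issues in a diff against a short vulnerability checklist. The checklist, the prompt, and the `ask_model` function are all hypothetical; this is not a Prime Security workflow.

```python
SECURITY_CHECKLIST = [
    "SQL or command injection",
    "hard-coded secrets or credentials",
    "missing input validation",
    "insecure deserialization",
    "use of outdated or vulnerable dependencies",
]


def ask_model(prompt: str) -> str:
    """Placeholder for a call to whatever LLM the security team has vetted."""
    raise NotImplementedError


def review_diff(diff: str) -> str:
    """Ask the model to flag likely security issues in a code diff for human triage."""
    checklist = "\n".join(f"- {item}" for item in SECURITY_CHECKLIST)
    prompt = (
        "You are assisting a security review. For the diff below, list any likely "
        f"issues in these categories, with line references:\n{checklist}\n\n"
        f"DIFF:\n{diff}\n"
        "Respond with 'no findings' if nothing stands out."
    )
    # The model's output is a triage aid, not a verdict: a human reviewer decides.
    return ask_model(prompt)
```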
2024 was a watershed year for multimodal AI. OpenAI, Google, and Anthropic all rolled out multimodal models. New capabilities for integrating images, video, and audio into AI move it closer to human-like cognition and bring the physical world closer to the digital one. And yet we end the year with plenty of ambiguity about how to train and evaluate multimodal AI. Unlike language AI, where tests can cleanly assess correct answers and reasoning capabilities, there is still no settled way to evaluate multimodal systems.
Vision technologies have mostly been about mastering object detection. Will multimodal AI be graded on understanding that certain objects or actions deserve more attention than others? Should it be expected to understand a whole scene, or several at once? Evaluating multimodal AI creates something of a chicken-and-egg dilemma: the measurement tools and the standards they’re meant to assess are mutually dependent.
Stay tuned for more insights as we explore these and other themes from AI Unconference discussions.