
 Celebrating the launch of dltHub: AI-powered data pipelines for the software 3.0 era


07.16.2025 | By: Steve Vassallo


The story of software is one of layered recursions. First, we wrote explicit code: line-by-line instructions for computers (software 1.0). Next, with the rise of deep learning, we trained neural networks to learn how to achieve objectives directly from data (software 2.0). Today, we use natural-language prompts to instruct AI systems to write code for us, which becomes the basis for further rounds of AI-generated prompts and code (software 3.0). Each step forward represents a new level of abstraction that brings human intent closer to machine execution.

Until now, data engineering has seen little benefit from previous software revolutions, but software 3.0 is changing that. For years, data teams have been burdened by repetitive, tedious tasks: connecting to APIs, handling authentication, mapping JSON to tables, and maintaining fragile ETL pipelines, to name just a few. Today, these tasks can be described in natural language and delegated to AI systems.
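
To make one of those chores concrete, here is a minimal sketch of mapping nested JSON to a table using only Python's standard library. The sample records, the `flatten` and `load_rows` helpers, and the double-underscore column naming are all illustrative assumptions. This is not dlt's implementation, just the kind of glue code data teams have tended to write by hand.

```python
import sqlite3


def flatten(record, parent_key="", sep="__"):
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            # Lists and other nested types are not handled in this sketch.
            flat[column] = value
    return flat


def load_rows(conn, table, records):
    """Create `table` from the union of flattened keys, then insert every row."""
    rows = [flatten(r) for r in records]
    columns = sorted({c for row in rows for c in row})
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        # Keys missing from a record become NULL in that row.
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    return columns


# Hypothetical API response; the second record lacks a zip code.
customers = [
    {"id": 1, "name": "Ada", "address": {"city": "Berlin", "zip": "10115"}},
    {"id": 2, "name": "Grace", "address": {"city": "Palo Alto"}},
]

conn = sqlite3.connect(":memory:")
columns = load_rows(conn, "customers", customers)
result = conn.execute(
    'SELECT id, "address__city", "address__zip" FROM customers ORDER BY id'
).fetchall()
```

Even this toy version has to decide how to name nested fields and what to do with missing keys, which hints at why such scripts become fragile as sources change.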

These shifts are creating major tailwinds for our portfolio company dltHub. dltHub is an open-source product that’s purpose-built for the software 3.0 era. Designed from the start for both AI and human users, dltHub transforms the messy, complex world of data engineering into an accessible, streamlined, and collaborative human-AI workflow.

As a lead investor in dltHub’s seed round, I’ve watched dlt scale dramatically: from 87 companies in production in December 2023 to over 4,000 today, spanning startup unicorns and Fortune 500 firms. Just last month, dlt was downloaded more than 2 million times on PyPI, making it the most popular Python library for moving data and one of the fastest-growing data tooling projects I’ve ever seen.

Even more compelling is the momentum from LLMs and the rise of “vibe coding”: a new way of programming in natural language (concurrent with software 3.0) that’s allowing the long tail of niche data sources to become a shared catalog of pipelines. This June alone, more than 40,000 dlt sources were created by users, demonstrating the power of AI-assisted development at scale.

Two data engineers on a mission

I first met Matthaus Krzykowski, dltHub’s CEO, through my longtime friend and fellow founder Lars Kamp. When we reconnected years later around dlt, Matthaus and Marcin, his co-founder and CTO, had lived the frustrations of building machine-learning data pipelines firsthand for customers of AI agent startup Rasa. They understood deeply the friction points in modern data engineering, especially for fast-scaling, Python-centric teams.

Their hard-earned insights, combined with my own experience as an early investor in several foundational data infrastructure startups including Tabular, MotherDuck, and Mode, set the stage for a data-nerdy reunion at dlt’s Berlin headquarters in late 2023. The more I dug in with Matthaus and Marcin, the more impressed I was with their technical depth, clarity of vision, and genuine passion for democratizing powerful data tools. By the end of our day together, we signed a term sheet to lead their seed round.

Pairing AI with community

dltHub began with the mission of enabling data movement for Python-first data teams. As AI’s capabilities have advanced, their Python-native approach has positioned them perfectly for our software 3.0 world, allowing them to evolve from solving data movement challenges to reimagining how data infrastructure is created, shared, and managed.

With dltHub, you might start by telling an AI agent, “I need to load data from the Stripe API into BigQuery,” and within moments receive a working pipeline script that you can run or refine. This human-AI partnership reflects both the present and the future of programming, and dltHub is among the earliest developer tools to fully embrace this paradigm.

This human-AI collaboration becomes even more powerful when paired with community. Join the dltHub Slack or browse their GitHub, and you’ll find a buzz of activity around building new data connectors. Every day, developers use LLMs (via tools like Cursor, GitHub Copilot, and Claude Code) to spin up pipelines for new APIs and data sources.

This community-driven approach makes dltHub more than a tool: it’s a true platform and ecosystem. Just as GitHub became the home for code and open-source collaboration, dltHub aims to be the home for data connectors and pipelines: a centralized knowledge database where you can search for any data source and find a pre-built pipeline created by a fellow community member, often with AI assistance.

Why we invested: Pythonic, AI-powered, and beloved by developers

At Foundation, we believe the next generation of developer tools will be AI-native and community-driven. dltHub embodies this vision. Our investment centers on several key theses:

  • An exceptional team with deep domain expertise. Matthaus and Marcin have deep roots in data engineering and AI. They anticipated the Python + AI convergence and built the right product at the right time. Under their leadership, dltHub has grown from an idea in 2021 into a thriving community. Their passion for solving real developer pain points was evident from our first meeting: this is a team on a mission to modernize data infrastructure.
  • Python-first as a strategic advantage. Python is the lingua franca of data and AI. dltHub’s decision to go all-in on Python means it plugs directly into the workflows of millions of developers. Their impressive growth over the past year underscores how many Python developers were craving exactly this solution.
  • AI-native workflows from day one. With software 3.0, AI has become a new layer in the software stack. dltHub was ahead of the curve in embracing this shift. From the beginning, the team designed dlt to work seamlessly with code-generation tools and LLMs. They built custom plugins for AI coding assistants and even a CLI (dlt ai) that sets up your environment for vibe coding with one command. We see dltHub as the vanguard of a new generation of developer tools that integrate AI natively into the development lifecycle – not as an add-on, but as a core capability.
  • Transforming long-tail source ingestion. Developers are dealing with an ever-expanding long tail of data sources, from niche SaaS applications and internal APIs to one-off data dumps. Historically, loading these long-tail sources was a manual, ad-hoc process for data engineers. dltHub makes this process programmable and shareable. With dlt, even the most niche source can be formalized into a reusable pipeline module and contributed back for others to use.
  • Programmable, collaborative infrastructure. We’re drawn to software that untangles messy processes and makes them intuitive. dltHub is doing exactly that for data loading by transforming imperative scripts and throwaway code into clean, declarative pipeline definitions under version control. It’s also bridging the gap between data engineers and data consumers. Because dlt pipelines are written in Python, analysts and ML engineers (not just specialized ETL developers) can understand and create them. This code-first, collaborative ethos reminds us of what dbt did for data transformation and what GitHub did for code.  
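
To illustrate the difference between a throwaway script and a declarative pipeline definition, here is a minimal sketch in plain Python. `PipelineSpec`, `run`, `fake_fetch`, and the in-memory `store` are hypothetical names invented for this example and are not part of dlt's API. The point is only that the what (source, table, key, write mode) lives in a small, reviewable object under version control, while the how lives in one shared interpreter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical declarative description of one source-to-table load."""
    source: str
    table: str
    primary_key: str
    write_mode: str = "merge"  # "merge" upserts on primary_key; "append" only adds


def run(spec, fetch, store):
    """Interpret a spec: pull rows via `fetch` and write them into `store`."""
    table = store.setdefault(spec.table, [])
    loaded = 0
    for row in fetch(spec.source):
        if spec.write_mode == "merge":
            # Drop any existing row with the same key so the new one replaces it.
            key = row[spec.primary_key]
            table[:] = [r for r in table if r[spec.primary_key] != key]
        table.append(row)
        loaded += 1
    return loaded


def fake_fetch(source):
    """Stand-in for an API call; returns two versions of invoice 1."""
    return [
        {"id": 1, "status": "open"},
        {"id": 1, "status": "paid"},
        {"id": 2, "status": "open"},
    ]


store = {}
spec = PipelineSpec(source="invoices_api", table="invoices", primary_key="id")
count = run(spec, fake_fetch, store)
```

Because the spec is just data, changing `write_mode` or pointing at a different table is a one-line diff that an analyst can read and review, which is the collaborative property the last bullet describes.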

Join the dltHub community!

Programming is becoming a conversation with our computers. Where we once debugged syntax errors and wrestled with API documentation, we now describe our intentions and watch AI systems translate them into working code in ways that feel truly magical. Software 3.0 means both a new way of building software and a radical expansion of who gets to build it – all happening at a breakneck pace.

dltHub sits at the center of this accelerating trend. Having watched the team develop and scale their vision over the past year, I couldn’t be more excited to see dltHub go live in beta for the broader developer community.

For anyone who works with data, getting started is simple: install the dlt library, explore the docs, and open your favorite IDE or AI coding assistant to see how quickly you can vibe code a pipeline. Be sure to also join the community Slack – you’ll find dltHub team members and users there to help.

We’re proud to back Matthaus, Marcin, and the entire dltHub team in their mission to make data engineering more effortless, intelligent, and community-driven. The age of human-AI collaboration in software is here, and dltHub is helping make it real for every Python engineer.


Published on July 15, 2025
Written by Steve Vassallo
