Celebrating our 3,000th OSS dlt customer as dlt’s momentum accelerates
- Matthaus Krzykowski, Co-Founder & CEO
Milestone announcement
We're excited to share that since the release of dlt version 1.0, we've seen accelerated growth across the board. In just the past six months, dlt has grown:
- From 1,000 open-source customers in production to over 3,000
- From over 500,000 monthly PyPI downloads to more than 1.4 million

This milestone is a testament to the continuous support of our community and the trust you've placed in dlt. We're truly grateful to every one of you who has helped us reach this exciting point. So, where do we go from here?
Reflecting on the journey: what’s fueling dlt's product growth?
Since 2017, the number of Python users has been growing by millions each year. The Python developer community has expanded from 7 million in 2017 to 21.7 million in 2024.
We founded dltHub in 2021 to help developers at Fortune 500 companies get their AI agents into production by enabling them to build modern data pipelines. These developers were typically presented with two conventional options for data ingestion:
- SaaS tools that abstract away the entire data loading process
- Object-oriented frameworks aimed at seasoned software engineers
Neither of these solutions served the new wave of Python-first developers: often data scientists, analysts, and AI engineers. In fact, 95% of the developers we worked with ended up writing custom Python scripts for data ingestion. There was no “Jupyter Notebook, Pandas, NumPy, etc. equivalent for data loading” that matched their workflow and skillset, so we decided to build one.
Why dlt has grown on a product level
dlt has grown for two key reasons:
1. dlt’s coverage of Pythonic workflows has steadily increased and now LLMs can understand and interact with them
When we launched the first version of dlt in February 2023, it was a simple tool designed to help anyone comfortable with Python scripts load and process JSON documents. The 1.0 release integrated key use cases Pythonic data engineers care about directly into the dlt core library (code for database syncs, files, the REST API toolkit, and a SQLAlchemy destination). Thanks to the continuous support of our community, dlt has evolved into a comprehensive Python library for moving data - one that now supports 235 distinct data engineering workflows.
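To make that concrete, here is a minimal sketch of the kind of script dlt was originally built for: loading nested JSON documents with a few lines of Python. The pipeline, table, and field names are illustrative only, and the DuckDB destination assumes dlt is installed with its duckdb extra.

```python
# A minimal sketch, not an official quickstart: loading nested JSON
# documents with dlt. Pipeline, table, and field names are illustrative.
import dlt

docs = [
    {"id": 1, "name": "Ada", "orders": [{"sku": "A-1", "qty": 2}]},
    {"id": 2, "name": "Grace", "orders": [{"sku": "B-7", "qty": 1}]},
]

pipeline = dlt.pipeline(
    pipeline_name="json_docs",
    destination="duckdb",
    dataset_name="raw",
)

# dlt infers the schema, unpacks the nested "orders" list into a child
# table, and loads everything into a local DuckDB database.
load_info = pipeline.run(docs, table_name="customers")
print(load_info)
```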
The existence of a library like dlt is especially meaningful today, as Python has become the dominant language not just for analytics and data workflows, but also for AI. This means developers can now feed dlt-based workflows directly to LLMs, which can attempt to understand, assist with, and even generate them.
Elise writing a dlt script with Claude
2. dlt is modular, interoperable, and built on stable Pythonic product principles
From day one, dlt was designed to be interoperable and modular - qualities that align with how AI and data workloads are evolving in the Python ecosystem. As industry leaders such as Databricks’ Matei Zaharia (“Compound AI Systems”) and Wes McKinney (“Pythonic Composable Data Systems”) have pointed out, modularity is critical in the emerging data stack. dlt also plays well with high-performance Python data libraries like PyArrow, Ibis, and DuckDB, and efficiently processes industrial-scale data at some of the largest companies in the world.
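As one hedged illustration of that interoperability (relying on dlt's documented support for loading PyArrow tables directly), a pipeline can accept Arrow data without converting it to Python dicts first; the pipeline, table, and column names below are placeholders.

```python
# A hedged sketch of the interoperability described above: dlt can accept
# a PyArrow table directly in pipeline.run(), so data prepared with
# Arrow, DuckDB, or Ibis can be loaded without a row-by-row conversion.
import dlt
import pyarrow as pa

measurements = pa.table({
    "sensor_id": [1, 2, 3],
    "reading": [0.12, 0.34, 0.56],
})

pipeline = dlt.pipeline(
    pipeline_name="arrow_interop",
    destination="duckdb",
    dataset_name="telemetry",
)

pipeline.run(measurements, table_name="measurements")
```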
What we’ve observed in conversations with customers is that the growing demand to integrate Python/ML/AI workloads into company data infrastructure is putting pressure on that infrastructure to evolve - to become more modular and interoperable itself. dlt enables this shift. It integrates seamlessly with the Modern Data Stack, meaning teams using Airflow, Dagster, Databricks, dbt, or Snowflake can enhance their in-house platforms simply by adding dlt.
dlt is also growing because we learned to respect the Pythonic developer principles our community demands of us. These principles shine through in how we build, document, and communicate about dlt, and they help developers trust our library for production use. Anyone who joins the dlt team is trained in these principles, because we believe developer trust is earned through consistency, transparency, and a shared philosophy.
Further commentary on our 2025 OSS dlt roadmap
Alongside this announcement, we're also releasing our 2025 open-source roadmap to give the community a clearer view of where we’re headed this year. As always, we welcome your feedback and thoughts.
If you're looking for a more technical perspective, check out Marcin’s post — in his view, dlt has already gone through two major phases, and we're now entering phase three. In this post, I want to zoom in on some of the motivations behind the two themes that will guide us: dlt’s expanded coverage of data engineering workflows and our initial plans for making dlt simpler with the help of LLMs.
1. dlt's coverage of Pythonic workflows will continue to expand
Marcin highlights two industry trends that are especially important to us as we expand dlt’s workflow coverage:
- The shift from cloud warehousing to data lake architectures: We’re seeing a growing move toward more modular, Pythonic data infrastructure - especially with the rise of formats like Apache Iceberg and the increasing role of data catalogs. As we've noted before, dlt is interoperable and modular by design, and these qualities are becoming increasingly valuable as ML and AI workloads pressure traditional data infrastructure to evolve. It's no surprise, then, that more and more community members are using dlt to power pipelines that write directly to filesystems. We’re hearing a growing demand for native support of Pythonic, Iceberg-centric data lake architectures, and we're listening.
- The continued maturity of high-performance Python data libraries: Libraries like duckdb, ibis, and arrow are becoming the foundation of modern data processing in Python. As part of the broader trend toward modular compute and query engines, we’ve seen our users begin to move away from SQL-centric transformation jobs in favor of these more expressive, Python-native tools. We're committed to continuing our investment in this direction, enabling even more powerful workflows that take full advantage of Python’s evolving data ecosystem.
2. dlt will become simpler to use with the help of LLMs
While we're proud of how powerful dlt has become as we continue to expand its capabilities, we also recognize that this growth has made the library more complex for new users to adopt.
This reflects a broader truth: data engineering requires a high level of specialized knowledge. That’s why we’ve invested in teaching these concepts in our Pythonic Data Engineering courses, from how dlt handles incremental loads and backfilling to techniques for optimizing pipeline performance.
The first step: vibe coding dlt sources is about to get even simpler
A year ago, we highlighted that many users choose dlt because it’s the fastest way to create a dataset. Spinning up a pipeline and source with dlt is dramatically simpler than with most other tools - it’s “pip install dlt and go.” Running a custom dlt source in production was also relatively easy, thanks to the built-in handling of many common pipeline maintenance issues.
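As a rough sketch of that “pip install dlt and go” experience, here is what spinning up a source and pipeline with the REST API toolkit can look like; the base URL and resource names are hypothetical placeholders, not a real API.

```python
# A hedged sketch of "pip install dlt and go" using the REST API toolkit
# that ships in dlt core. The base_url and resource names are hypothetical
# placeholders, not a real API.
import dlt
from dlt.sources.rest_api import rest_api_source

source = rest_api_source({
    "client": {"base_url": "https://api.example.com/v1/"},
    # each listed resource becomes a table in the destination dataset
    "resources": ["issues", "comments"],
})

pipeline = dlt.pipeline(
    pipeline_name="example_rest_pipeline",
    destination="duckdb",
    dataset_name="example_data",
)

print(pipeline.run(source))
```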
Since day one, we have believed that dlt must be usable not just by developers, but by code generation tools as well. We’ve always built dlt not only for humans, but also for LLMs.
Over the past six months, we’ve seen our community embrace AI-powered code editors. As a result, contributions of custom dlt sources have increased significantly. Users are building and shipping pipelines faster than ever - and our monthly custom source creation continues to rise.

We’ve seen early adopters like Martin Salo share the Cursor rules they’ve developed for working with dlt. Inspired by the community, we started recreating some of these workflows ourselves. You can see Adrian vibe code various connectors, including Airbyte, Singer, and Java wrappers, into dlt pipelines on our YouTube channel.
We believe the era of static “connector catalogs” is behind us.
To support this shift, we’ve released an initial set of AI assistants and building blocks for custom assistants on the Continue Hub.
Today, we’re also launching two initial MCP servers that developers can run locally and integrate with their preferred AI code editors — including Continue, Claude Desktop, Cursor, and Cline.
Additionally, we’re releasing native support for LLM rules in dlt. Running the CLI command dlt ai setup cursor installs our initial rule set out of the box. You can check out early usage tips here. We expect custom source creation to accelerate even further.
The next step: towards dltHub as a home for pipeline rules and code snippets
It’s already relatively easy to ask an LLM to generate a dlt pipeline for a well-documented source like Stripe, as shown in some recent DeepSeek examples. But we believe these are just the early days of AI workflows in data engineering.
As Marcin has pointed out, there’s still a long way to go. Adrian recently shared early thoughts on a benchmark for AI-generated pipelines, a direction we’re excited to explore further.
Still, this early wave of LLM tooling is bringing us closer to launching the dltHub we’ve always envisioned: a place where hundreds of thousands of pipelines can be created, shared, and deployed. A space where datasets, reports, and analytics can be shared publicly or privately, just like models and datasets on Hugging Face and apps on Streamlit.
We believe a similar experience should exist for data engineering. That’s why we’ve started working on the first version of dltHub. In its early form, it will be a place to find and share pipeline rules and code snippets - easily accessible by both humans and LLMs.