Blog/May 19, 2026/

Product

Introducing dltHub: Claude/Codex/Cursor-native data engineering

Generally available today. 91% of new dlt pipelines are now built by agents. dltHub makes building and running them production-grade for any Python developer.

Matthaus Krzykowski, Co-Founder & CEO

dltHub is the Claude/Codex/Cursor-native platform that makes data engineering accessible to any Python developer, pairing agents that build dlt pipelines with the runtime that ships them to production.

Here’s what that looks like from today:

If you’re an engineer, from today you no longer need to wait for a data team to answer questions about your own product. In 15 minutes you can ingest Sentry errors and PostHog events into a pipeline and build the error-rate-per-feature dashboard yourself, so your team can decide whether to ship the next release tomorrow or pull it back.

If you’re an analyst, from today you no longer need to file a ticket and wait two weeks for engineering to wire up a new data source. In 15 minutes you can ingest customer data from Stripe and Attio into your warehouse with a pipeline you built yourself, so you can run the cohort analysis your CEO asked for this morning before the next leadership meeting.

If you’re fluent in AI plus a little Python, from today you no longer need to wait for engineering to ingest the data behind the decisions you want to make. In 15 minutes you can ingest from the Luma API and build a sign-ups dashboard that shows your wider GTM team how the marketing for one of your events is doing, so they can double down on what’s working before the event closes.

Two trends behind dltHub: agents writing dlt pipelines, laptops running them

Two patterns are reshaping data ingestion.

In January 2025, our community produced about 2,400 dlt pipelines a month. 5% of them were written by agents. In January 2026, that number is 81,000 pipelines a month, 91% written by agents. Total community pipeline volume grew 34× year over year. dlt has become the data ingestion runtime.

Local development, where these pipelines get built, grew with them. Unique DuckDB devices loading via dlt grew 15×, from 3,923 to 58,306 per month.
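
As a quick check on the figures above, each multiple is the later monthly count divided by the earlier one. The sketch below also derives the implied number of agent-written pipelines per month, which is not stated directly and is only as precise as the rounded inputs. All variable names are illustrative.

```python
# Monthly figures quoted in this post; variable names are illustrative.
pipelines_jan_2025 = 2_400
pipelines_jan_2026 = 81_000
agent_share_2025 = 0.05
agent_share_2026 = 0.91

duckdb_devices_before = 3_923
duckdb_devices_after = 58_306

# Growth multiples: later count divided by earlier count.
pipeline_growth = pipelines_jan_2026 / pipelines_jan_2025
device_growth = duckdb_devices_after / duckdb_devices_before

# Derived values (not stated in the post): agent-written pipelines per month.
agent_pipelines_2025 = round(pipelines_jan_2025 * agent_share_2025)
agent_pipelines_2026 = round(pipelines_jan_2026 * agent_share_2026)

print(f"pipeline volume growth: {pipeline_growth:.2f}x")
print(f"DuckDB device growth:   {device_growth:.2f}x")
print(f"agent-written pipelines: {agent_pipelines_2025} -> {agent_pipelines_2026}")
```

Under these inputs, the agent-written share accounts for almost all of the added volume.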

dltHub is what these two trends look like as a product: agents building on the laptop, runtime shipping to production in your cloud data warehouse.

And the pattern holds in production. A little more than a year ago we crossed 3,000 companies running dlt in production. Today, over 10,000 companies run dlt in production.

Built for agents

Tools move data. They don't move context.

Every data tool today is fluent in data and silent on context. Schema knowledge stays at ingest. Join structure stays in transform. Lineage stays in the orchestrator. Runtime state stays in the warehouse. By the time the agent shows up, the context it needs to reason was thrown away three steps earlier.

The platform that wins for agents is the one that continuously produces the highest-quality context - schemas, metadata, traces, runtime logs, semantic annotations - and lets agents compose against it from extract & load through transformations to deployment.

That's not what most platforms are trying to be. Most are trying one of two moves:

1. Stacks built as islands. Ingestion in one tool, transformation in another, orchestration in a third. Each tool sees its slice. The bridges between them are JSON artifacts and ticket queues. Context drops at every boundary.

2. Meaning bolted on at the end. Chatbots on the warehouse. BI add-ons that "talk to your data." Agent sidekicks that promise to reason. All of them trying to give the agent context after the pipeline already threw it away. Too late by construction.

Neither move ends in an agent-readable context layer. They end in a smarter UX on top of the same fragmented stack.

What makes a data engineering platform Claude/Codex/Cursor-native? An agent-readable context layer that every workflow reads from and writes to.

To make that concrete, here’s how dltHub is organized:

Building blocks: atomic Python primitives (dlt, DuckDB, Marimo, Ibis). Python-first, declarative, modular.
Modules: building blocks composed into agent-facing infrastructure, e.g. dltHub transformations.
Toolkits: what users actually see, namely REST API ingestion, exploration, transformation, deployment.
Context layer: the shared substrate all of the above read from and write to. This is the thing that makes the rest work.
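
To illustrate the idea of a shared substrate, here is a minimal sketch in plain Python. It is not dltHub's API: `ContextLayer`, `ingest`, and `transform` are hypothetical names invented for this example. The point is only that ingest records what it learned (the schema) and transform reads it back and adds its own context (lineage), instead of rediscovering it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ContextLayer:
    """Hypothetical shared store; an illustration, not dltHub's actual API."""

    entries: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def write(self, kind: str, key: str, value: Any) -> None:
        self.entries.setdefault(kind, {})[key] = value

    def read(self, kind: str, key: str) -> Any:
        return self.entries[kind][key]


def ingest(ctx: ContextLayer) -> List[Dict[str, Any]]:
    """Load rows and record their schema instead of discarding it."""
    rows = [
        {"customer_id": 1, "plan": "pro"},
        {"customer_id": 2, "plan": "free"},
    ]
    schema = {name: type(value).__name__ for name, value in rows[0].items()}
    ctx.write("schema", "customers", schema)
    return rows


def transform(ctx: ContextLayer, rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count customers per plan, reading the schema recorded at ingest."""
    schema = ctx.read("schema", "customers")
    # Pick the key column from recorded context rather than guessing from data.
    key = next(name for name in schema if name.endswith("_id"))
    ctx.write("lineage", "customers_by_plan", {"source": "customers", "key": key})
    counts: Dict[str, int] = {}
    for row in rows:
        counts[row["plan"]] = counts.get(row["plan"], 0) + 1
    return counts


ctx = ContextLayer()
result = transform(ctx, ingest(ctx))
print(result)
print(ctx.entries)
```

After both steps run, the store holds the schema and the lineage side by side, which is the kind of state a later step, or an agent, could read without re-inspecting the data.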

Why we were ready for this

Five years ago, agents weren’t writing pipelines. We made dlt Python-first, declarative, modular, and context-aware because those were the right choices for a library humans could trust: code-first semantics, easy inspection, no monolith. It turns out those same properties are exactly what lets an agent reason about a pipeline end-to-end.

We’re not alone in noticing this. At PyData SF in March, Prefect’s Jeremiah Lowin called a similar architecture “PyStack”: atomic, composable, simple Python abstractions LLMs can reason about. Last week, a16z’s Seema Amble made a parallel argument from the market side: as software goes headless, defensibility moves out of the UI and into the data, context, and action layers underneath. The data is the context now. Two angles, one conclusion: when agents become the primary builder, what matters is what flows underneath.

Product principles

These codify how we build dltHub:

Transparent, declarative, context-aware. Everything is code; semantics flow as metadata.
Modular and composable. DuckDB, Marimo, Ibis, not a monolith.
Human-in-the-loop guardrails. Agents propose, humans validate; deterministic tooling enforces.
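
The third principle can be sketched as a small gate. This is a hypothetical illustration rather than how dltHub implements guardrails: `SchemaChange`, `ALLOWED_TYPES`, and the rules themselves are assumptions made up for the example. An agent proposes a change, a deterministic check runs first, and nothing is applied without human approval.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical rule set for this example only.
ALLOWED_TYPES = {"text", "bigint", "double", "bool", "timestamp"}


@dataclass
class SchemaChange:
    """A change proposed by an agent (illustrative structure)."""

    table: str
    added_columns: Dict[str, str] = field(default_factory=dict)
    dropped_columns: List[str] = field(default_factory=list)


def deterministic_check(change: SchemaChange) -> List[str]:
    """Return rule violations; an empty list means the proposal passes."""
    violations = []
    if change.dropped_columns:
        violations.append(f"dropping columns is not allowed: {change.dropped_columns}")
    for name, dtype in change.added_columns.items():
        if dtype not in ALLOWED_TYPES:
            violations.append(f"column {name!r} has unsupported type {dtype!r}")
    return violations


def review(change: SchemaChange, approved_by_human: bool) -> str:
    """Deterministic tooling enforces first; a human validates second."""
    violations = deterministic_check(change)
    if violations:
        return "rejected: " + "; ".join(violations)
    if not approved_by_human:
        return "pending human review"
    return "applied"


safe = SchemaChange(table="events", added_columns={"feature": "text"})
risky = SchemaChange(table="events", dropped_columns=["user_id"])

print(review(safe, approved_by_human=False))
print(review(safe, approved_by_human=True))
print(review(risky, approved_by_human=True))
```

Note the ordering: a human approval cannot override a failed deterministic check, so the rules hold even when a reviewer clicks through.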

How it fits together

We started building the LLM-native version of dlt in 2024. Last summer we announced that we were building dltHub as the LLM-native data engineering platform that lets any Python developer build, run, and deliver end-user-ready reports from dlt pipelines, alongside an initial dltHub workflow covering 1,000+ REST API sources. Over the past 2.5 years we’ve assembled the building blocks into modules like dltHub transformations and exposed them through toolkits. At GA, those toolkits cover REST API ingestion, exploration, transformation, and deployment.

Toolkit workflows like annotating two sources are what users see when they interact with the agent. What makes them work is the context layer underneath. dltHub is built so the platform continuously produces that context, and so every workflow composes against it as agents move from extract & load to transformations to deployment.

What’s in the GA release

dltHub is the production runtime paired with the agent-facing toolkits that build, transform, and deploy on top of dlt’s open-source building blocks. It’s built for the smallest team that can run an end-to-end data stack: 1 coder and 2–5 stakeholders.

What ships at GA, in three layers:

Platform: AI Workbench, secrets management, local DuckDB workspace, OTEL telemetry.
Build agents: pipeline building (data engineer + Python engineer territory), exploration & debugging, troubleshooting across all three workflows, validation & transformation (data scientist + analyst + BI), and semantic modeling (data steward + modeler + Python engineer). Each agent collapses work a team would otherwise spread across multiple roles.
Run: managed runtime, deployment agent, observability, and schedules with transform-aware triggers. Humans take over at any point via the dltHub platform.

From a coding agent’s perspective, the workflow is three moves: extract & load (REST API, SQL Database, and Filesystem sources), transform (dltHub transformations, with ontology-based skills), and deploy (managed runtime, observability, and schedules with transform-aware triggers).

All of that ships today. Self-serve sign-up is live at app.dlthub.com with 30 USD in usage credits, US or EU data planes, small-team collaboration, and consumption tracking. For the first two weeks, Pro upgrades and top-up credits are handled manually. Drop us a line and we’ll get you set up.

What’s next

The next six weeks layer on more skills inside the toolkits.

Within two weeks: all of dltHub moves into one source-available repo, alongside an observability UI overhaul, public sharing links, a brighter spotlight on the PII redaction toolkit, and self-serve checkout for Pro upgrades.
By July: transformations move from public preview to GA with workflow improvements and a library of pre-built ontologies; data quality improvements; better organization invite flows; weekly and monthly self-serve plans plus top-up credits.

We’ll keep shipping new toolkits and new skills at this cadence. Each release will make dltHub more valuable.

What it looks like in production

We migrated from a SaaS ETL platform to custom-owned dltHub pipelines with the help of AI, gsheets, Zendesk, HubSpot, Asana, Personio, REST APIs, S3 to Redshift. Five of our senior analysts now author and maintain those pipelines themselves. Data engineering is no longer a bottleneck for our analytics work.

Stefan Szegeny, Senior Data Engineer, Hive

dltHub and Snowflake deliver a simple, end-to-end pathway for financial institutions to transform raw data into governed analytics and AI-ready datasets without needing a full engineering team. Whether you’re an engineer stepping in to answer business questions, an analyst building your own pipelines, or someone fluent in AI with basic Python skills, you can pull data from core banking systems, market feeds, and APIs directly into Snowflake. You deliver outcomes that once required specialised data engineering resources.

Suraj Rajan, Field CTO, Financial Services, Snowflake

I always had good intuition about what data I needed, but never the resources to get it and measure it reliably. Our data was spread across many systems and pulling them together was non-trivial manual effort. Even when we built some automation, we couldn’t bear the cost of reconciling it to existing reporting, let alone build, run, and maintain it. dltHub changes the equation for me. It’s the first product I’ve seen built for operators (and their agents), not just the fully-staffed enterprise data teams. I’m excited to see what this unlocks for my AI-enabled peers in the near future.

Jacob Matson, Developer Advocate, MotherDuck

dltHub lets my agents develop data pipelines locally, test changes quickly and cheaply in CI, and then runs them in the cloud against my largest workloads. It gives them the tools to take care of the knucklehead stuff so that I can get a good night’s sleep.

Josh Wills, Member of Technical Staff, DatologyAI

I am building our internal AI-skills usage leaderboard. dltHub turns this into a cohesive process without messy scripts to dedupe queries or wrangle intermediate tables. Anyone on the team can use agents to easily contribute.

Nate Sesti, CTO/Co-Founder, Continue

Beyond Pro: Scale and Enterprise

Pro is built for the smallest team that can run an end-to-end data stack. As organizations grow, both the context layer and the agent team grow with them. That’s what dltHub Scale and dltHub Enterprise are for.

dltHub Scale, for mid-size companies, will reach GA in August 2026, starting at 1,000 USD/month. Scale extends Pro in two ways. First, a richer context layer: an AI-native data catalog, ontologies, lineage, LLM wikis, multi-team collaboration, and dltHub’s own storage layer for agentic loops, built on Lance. Second, a larger agent team: the same four build agents as Pro, plus operational agents for next-day/change validation, pager duty, daily pipeline health, and dataset health. The product shifts from agents that build your pipelines to agents that build and run them.

dltHub Enterprise will be in GA in early 2027. We’ve supported enterprise customers with custom arrangements for years and will continue to. The formal Enterprise product (dedicated support, custom SLAs, and tailored deployment options) ships next year. Get in touch if you want to talk now.

Start in one command

Install, try it, upgrade when you’re ready.

To get started with dltHub, run this command:

uvx dlthub-start

This will give you two onboarding options:

Starter (recommended): all features, for the full experience.
Minimal: a single pipeline, for the quickest look at how dltHub works.

From there, uvx dlthub-start installs your local dltHub workspace dependencies and sets up your project, with agent configs for Claude, Cursor, and Codex ready to go.

When you’re ready to deploy or share with teammates, sign up at app.dlthub.com. That’s where pipelines, notebooks, and team collaboration live.

Your account starts with 30 USD in usage credits (about 30 hours of pipeline runtime) to explore the platform for 14 days.

Pro plan: 119 USD/month

Includes 50 USD in monthly credits (about 50 hours of runtime on dltHub’s managed infrastructure). Beyond that, runtime is billed at 1 USD/hour.
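
To make the billing arithmetic concrete, here is a small sketch. It assumes credits are consumed at exactly 1 USD per runtime hour (the post says "about"), that overage is linear, and that there is no proration or rounding; the pricing page is the authoritative source. The function and constant names are illustrative.

```python
# Figures from this post; names are illustrative and the billing model is
# simplified (assumes 1 credit USD == 1 runtime hour, no proration).
PRO_BASE_USD = 119.0
INCLUDED_CREDITS_USD = 50.0
RATE_USD_PER_HOUR = 1.0


def pro_monthly_cost(runtime_hours: float) -> float:
    """Estimate a month's Pro bill for a given number of runtime hours."""
    if runtime_hours < 0:
        raise ValueError("runtime_hours must be non-negative")
    usage_usd = runtime_hours * RATE_USD_PER_HOUR
    # Only usage beyond the included monthly credits is billed extra.
    overage_usd = max(0.0, usage_usd - INCLUDED_CREDITS_USD)
    return PRO_BASE_USD + overage_usd


for hours in (0, 40, 50, 80):
    print(f"{hours:>3} h runtime -> {pro_monthly_cost(hours):.2f} USD")
```

Under these assumptions, a team running 80 hours in a month would pay the 119 USD base plus 30 USD of overage.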

A Pro subscription also unlocks the source-available features: dltHub agentic toolkits, MS SQL Change Tracking, the Iceberg destination, and transformations.

For the first two weeks after launch, upgrades go through us directly. So just reach out and we’ll get you on Pro.

See our pricing page for full details.

Migration support

If you’re moving existing pipelines to dltHub on a tight schedule, or don’t have the internal resources for the move, you can purchase migration services and forward-deployed engineering to accelerate it. Get in touch.

To go deeper, check out the dltHub docs and our Agentic Data Engineering course.

Claude-, Codex-, and Cursor-native data engineering, one command away.
