From prompt to production: a free course on agentic data engineering
Adrian Brudaru,
Co-Founder & CDO
In January 2025, the dlt community shipped 2,400 custom pipelines. In January 2026, 81,000. About 91% were written by agents: Claude, Cursor, Copilot, the tools every Python engineer already has open in a tab.
Generation stopped being the bottleneck around November 2025, when every frontier model got a major update. They can all write a dlt pipeline now. How well they do it, how clean the code is, whether credentials leak on the way, whether the next engineer can read the result: all of that depends on the instructions the model has. The dltHub AI Workbench is that instruction set.
Agentic Data Engineering with dltHub is live today: five lessons, about an hour, free. It starts where a senior engineer starts. Generate an ingestion pipeline correctly the first time. Explore the data. Confirm or fix the schema. Deploy. Transform. End with a result the person who asked for the data can actually use.
Enroll at dlthub.learnworlds.com/course/agentic-data-engineering.
What "it runs" doesn't cover
A generated pipeline that ran once is not a pipeline that runs. Production means someone sees it when it breaks and can fix it without re-reading the prompt. Four questions don't go away when the agent writes the first draft:
- Is the data correct?
- Is the schema stable when the upstream API changes?
- Can this run in production, with credentials handled, dependencies pinned, environment validated?
- Can anyone other than the original prompter understand what the agent built?
These are workflow problems, not code problems. You can prompt your way to a working extract in five minutes. What comes after has always been the senior bar.
We've measured the difference. In our eval runs, a base agent skips documentation, runs full loads before sampling, and reads secrets.toml directly. The same agent with the AI Workbench checks docs, samples first, and never touches the secrets file. Same model, same prompt, different workflow, different behavior. The launch post walks through the credential check specifically: the agent confirms a key works without ever reading it.
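In dlt terms, that check can be as small as the sketch below. This is a minimal, illustrative example, not the workbench's actual implementation: the secrets path and the GitHub endpoint are assumptions. The token is resolved by dlt's config system inside executed code, and only a pass/fail result comes back, so the value never enters the agent's context.

```python
import requests

import dlt


def credential_works() -> bool:
    # dlt resolves the token from env vars or secrets.toml at runtime;
    # the agent only ever sees the boolean, never the token itself.
    token = dlt.secrets["sources.github.access_token"]
    resp = requests.get(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    return resp.status_code == 200


print("credential ok" if credential_works() else "credential invalid")
```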
The workflow runs on metadata
Most AI-assisted data engineering today loses context at every step. One tool generates the pipeline. Another validates it. A third builds the dashboard. The agents don't share state. The schema the first tool produced isn't visible to the second; the contract the second wrote isn't readable by the third. Every step starts from scratch.

The AI Workbench changes that. Every step produces structured information the next step reads: schemas, contracts, runtime traces, load IDs. The exploration toolkit reads what the REST API toolkit emitted. The transformation toolkit reads what exploration confirmed. The agent doesn't re-examine what you have at every boundary, because the context is metadata, and metadata flows.
That's how a twelve-step lifecycle becomes one continuous session, and what lets agents handle the whole workflow instead of just the code.
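What that looks like in practice: the metadata is queryable from the pipeline's working directory, so any later step, agent or human, can read it without re-running anything. A minimal sketch, assuming a pipeline named github_issues that an earlier step already ran:

```python
import dlt

# Attach to a pipeline an earlier step already ran (name is illustrative).
pipeline = dlt.attach(pipeline_name="github_issues")

# The schema the ingestion step produced, readable by the next step.
print(pipeline.default_schema.to_pretty_yaml())

# The runtime trace of the last run: extract, normalize, load, timings.
trace = pipeline.last_trace
print(trace)

# Load IDs identify exactly the rows one run produced, so a
# transformation can target them instead of rescanning everything.
print(trace.last_load_info.loads_ids)
```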
The workflow, end to end
How a senior engineer builds with agents. Twelve steps, four phases.
Phase 1 - Define
Step 1. Define the outcome. Which decision will this data change? Which metric or chart? Who reads it? Which sources do you need? You start the AI Workbench session deliberating with Claude, before any pipeline code gets generated. Skip this and the agent will happily build the wrong pipeline, and nothing later in the workflow has anything to verify against. A pipeline is a promise about freshness, shape, and correctness. The part the agent can't infer is who the promise is to. That's where data gets meaning, or doesn't.
Phase 2 - Build the pipeline
Steps 2-6. Develop the ingestion pipeline locally on DuckDB. Validate with a sample load. Explore the data in a Marimo notebook. Deploy to production: the agent strips dev artifacts, pins dependencies, validates credentials without reading them. Verify incremental loading in prod.
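As a rough sketch of what steps 2-3 produce, here is the local-first shape: DuckDB as the dev destination, an incremental cursor on the endpoint, and a capped sample load before any full run. The API, endpoint, and cursor field are illustrative assumptions, not course material.

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Hypothetical API; base_url, endpoint, and cursor field are assumptions.
source = rest_api_source({
    "client": {"base_url": "https://api.example.com/v1/"},
    "resources": [{
        "name": "issues",
        "endpoint": {
            "path": "issues",
            "params": {
                "updated_since": {
                    "type": "incremental",
                    "cursor_path": "updated_at",
                    "initial_value": "2026-01-01T00:00:00Z",
                },
            },
        },
    }],
})

# Local dev destination: DuckDB on disk, not the warehouse.
pipeline = dlt.pipeline(
    pipeline_name="issues_dev",
    destination="duckdb",
    dataset_name="issues_raw",
)

# Sample first: cap the load so schema surprises cost rows, not runs.
print(pipeline.run(source.add_limit(10)))
```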

Phase 3 - Model the data
Steps 7-9. Develop the transformation pipeline. The agent builds an ontology of what your data means before writing the model. Validate on the transformed tables, deploy with one command.
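Validation on the transformed tables can stay in the same session. A minimal sketch, assuming a modeled table named dim_customer with a customer_id key (both names are illustrative):

```python
import dlt

# Attach to the pipeline that materialized the model (names illustrative).
pipeline = dlt.attach(pipeline_name="issues_dev")

# Read the transformed table back as a dataframe and assert its contract.
dim = pipeline.dataset()["dim_customer"].df()
assert dim["customer_id"].is_unique, "duplicate keys in dim_customer"
assert dim["customer_id"].notna().all(), "null keys in dim_customer"
print(f"dim_customer ok: {len(dim)} rows, keys unique and non-null")
```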
Phase 4 - Ship and monitor
Steps 10-12. Build the notebook the stakeholder opens. Deploy it. Monitor pipeline health and data quality over time.
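A health probe for step 12 can lean on the same trace metadata the rest of the workflow reads. A minimal sketch, assuming a production pipeline named issues_prod:

```python
import dlt

# Attach to the production pipeline (name is illustrative).
pipeline = dlt.attach(pipeline_name="issues_prod")

trace = pipeline.last_trace
if trace is None:
    raise SystemExit("pipeline has never run")

# Turn silently partial loads into loud failures.
trace.last_load_info.raise_on_failed_jobs()

# Row counts from the last normalize step: a cheap data-quality signal.
print(trace.last_normalize_info)
```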
Same code from local prototype to production. Same governance from raw data to dashboard. Same agents writing every step, under workbench rules they can't skip. The shape inside each step is propose-verify-enforce: the agent proposes, the workflow verifies, and the rules block anything that shouldn't get past review. The next agent, or the next person, can't quietly undo the check.
Where the lessons fit
Five free lessons cover the workflow end to end.
- 00 - Introduction. Foundation toolkit. dlt OSS. Get the AI Workbench installed in your editor.
- 01 - REST API pipeline toolkit. Steps 2-3. dlt OSS. From a single prompt to a production-grade pipeline: endpoint discovery, auth, pagination, schema contracts, incremental loading. Backed by thousands of pre-built REST API contexts so the agent surfaces ambiguity instead of guessing. Validate the ingested data before it goes further.
- 02 - dltHub deployment toolkit. Steps 5-6, 9. dltHub Pro, free 30-day trial during the course, no invite needed. Convert your dev workspace into a production profile. Strip development artifacts, pin dependencies, validate credentials, with the agent never touching sensitive data.
- 03 - Data exploration toolkit. Steps 4 and 8. dlt OSS. The agent generates validation reports and Marimo dashboards in-session. Feedback loop in minutes, not days.
- 04 - Data transformation toolkit. Step 7. Requires dltHub runtime access; you can start a free trial during the course. Ontology first, then code. The agent maps your sources to canonical entities, builds the entity graph, generates a Kimball CDM, and writes the @dlt.hub.transformation script that populates it (sketched below).
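For a feel of what step 7 emits, here is a hedged sketch of a transformation script. The @dlt.hub.transformation decorator is the one the course names; its exact signature, the dataset argument, and the table names are assumptions here, not the toolkit's documented API.

```python
import dlt
import dlt.hub  # requires dltHub runtime access

# ASSUMPTION: signature sketched by analogy with dlt resources; consult
# the transformation toolkit's own docs for the real interface.
@dlt.hub.transformation
def dim_customer(dataset):
    # Read the raw table the ingestion step loaded...
    customers = dataset["customers_raw"].df()
    # ...and emit the canonical entity the Kimball CDM defines.
    return customers.drop_duplicates(subset="customer_id")[
        ["customer_id", "name", "created_at"]
    ]
```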
Each toolkit is a guided sequence of skills, commands, rules, and MCP, with guardrails the agent can't skip. It's maintained by dltHub, which controls the infrastructure the pipelines run on. You're not learning a tutorial version. You're learning the version that ships.
For the conceptual underpinnings, the AI Workbench blog series covers the design choices behind the skills.
Who it's for
If your name is on a pipeline, this is for you. Data engineers and analytics engineers who use Claude or Cursor every day and want the pipeline to outlast the prompt. Platform engineers building self-service for the rest of the company. Senior and staff ICs at startups through mid-market who own the surface when something drifts.
Not for hobbyists vibe-coding a one-off. Not a beginner Python course. You should be comfortable enough to debug a stack trace and read someone else's library code.
What does it do for you?
The workflow is general-purpose. What you do with it depends on the seat you sit in and the shape of the job.
- Whole-stack rebuild. You inherited a warehouse built fast. Tables piled up, business meaning lives in tribal knowledge, and the semantic model is wrong in places about how the company actually runs. Point the workbench at it. The toolkit reverse-engineers the existing SQL into a draft ontology, generates a canonical model, and produces a clean T-layer. Same source data, AI-ready output, no three-engineer hiring cycle.
- One component. Each toolkit is independent. Use just ingestion. Just exploration. Just transformations on data you already have. The metadata flow makes the pieces compose. You don't have to commit to the whole stack to get something out of any one of them.
- Data engineer or analytics engineer. Pipeline in an afternoon instead of a sprint. Pull a new source, model it, ship the dashboard same day. "I have a question" to "I have a chart" collapses to one session.
- GTM engineer. Lead enrichment, attribution stitching, churn signals, lifecycle scoring. Build the workflow without queuing on a data team. The agent handles the plumbing, you own the logic.
- Team lead or platform engineer. The workbench is a senior architect's judgment encoded in software. Guardrails the agent can't skip. Conventions the next hire picks up by reading the project. The stack stops drifting; entropy stays low.
What you walk away with: an outcome, not a checklist. State the result you want the workflow to end with. Because metadata flows through every step, if transformation or visualization needs a field that wasn't ingested, the agent goes back and pulls it. You don't have to plan the whole pipeline upfront. Most of the course teaches you to understand the workflow, plan the outcome, and validate the result. The agent does the execution.
How to start
The course is hosted at dlthub.learnworlds.com/course/agentic-data-engineering and guides you through everything you need to reach your desired outcome. Free, self-paced, estimated time one hour.
The agent writes the code; you learn the principles of how to steer it.