From prompt to production: a free course on agentic data engineering
Adrian Brudaru,
Co-Founder & CDO
In January 2025, the dlt community shipped 2,400 custom pipelines. In January 2026, 81,000. About 91% were written by agents: Claude, Cursor, Copilot, the tools every Python engineer already has open in a tab.
Generation stopped being the bottleneck around November 2025, when every frontier model got a major update. They can all write a dlt pipeline now. How well they do it, how clean the code is, whether credentials leak on the way, whether the next engineer can read the result: all of that depends on the instructions the model has. The dltHub AI Workbench is that instruction set.
Agentic Data Engineering with dltHub is live today: five lessons, about an hour, free. It starts where a senior engineer starts. Generate an ingestion pipeline correctly the first time. Explore the data. Confirm or fix the schema. Deploy. Transform. End with a result the person who asked for the data can actually use.
Enroll at dlthub.learnworlds.com/course/agentic-data-engineering.
What "it runs" doesn't cover
A generated pipeline that ran once is not a pipeline that runs. Production means someone sees it when it breaks and can fix it without re-reading the prompt. Four questions don't go away when the agent writes the first draft:
- Is the data correct?
- Is the schema stable when the upstream API changes?
- Can this run in production, with credentials handled, dependencies pinned, environment validated?
- Can anyone other than the original prompter understand what the agent built?
These are workflow problems, not code problems. You can prompt your way to a working extract in five minutes. What comes after has always been the senior bar.
We've measured the difference. In our eval runs, a base agent skips documentation, runs full loads before sampling, and reads secrets.toml directly. The same agent with the AI Workbench checks docs, samples first, and never touches the secrets file. Same model, same prompt, different workflow, different behavior. The launch post walks through the credential check specifically: the agent confirms a key works without ever reading it.
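In dlt terms, that check can be as small as the sketch below. This is a minimal, illustrative example, not the workbench's actual implementation: the secrets path and the GitHub endpoint are assumptions. The token is resolved by dlt's config system inside executed code, and only a pass/fail result comes back, so the value never enters the agent's context.

```python
import requests

import dlt


def credential_works() -> bool:
    # dlt resolves the token from env vars or secrets.toml at runtime;
    # the agent only ever sees the boolean, never the token itself.
    token = dlt.secrets["sources.github.access_token"]
    resp = requests.get(
        "https://api.github.com/user",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    return resp.status_code == 200


print("credential ok" if credential_works() else "credential invalid")
```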
The workflow runs on metadata
Most AI-assisted data engineering today loses context at every step. One tool generates the pipeline. Another validates it. A third builds the dashboard. The agents don't share state. The schema the first tool produced isn't visible to the second; the contract the second wrote isn't readable by the third. Every step starts from scratch.

The AI Workbench changes that. Every step produces structured information the next step reads: schemas, contracts, runtime traces, load IDs. The exploration toolkit reads what the REST API toolkit emitted. The transformation toolkit reads what exploration confirmed. The agent doesn't re-examine what you have at every boundary, because the context is metadata, and metadata flows.
That's how a twelve-step lifecycle becomes one continuous session, and what lets agents handle the whole workflow instead of just the code.
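What that looks like in practice: the metadata is queryable from the pipeline's working directory, so any later step, agent or human, can read it without re-running anything. A minimal sketch, assuming a pipeline named github_issues that an earlier step already ran:

```python
import dlt

# Attach to a pipeline an earlier step already ran (name is illustrative).
pipeline = dlt.attach(pipeline_name="github_issues")

# The schema the ingestion step produced, readable by the next step.
print(pipeline.default_schema.to_pretty_yaml())

# The runtime trace of the last run: extract, normalize, load, timings.
trace = pipeline.last_trace
print(trace)

# Load IDs identify exactly the rows one run produced, so a
# transformation can target them instead of rescanning everything.
print(trace.last_load_info.loads_ids)
```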
The workflow, end to end
How a senior engineer builds with agents. Twelve steps, four phases.
Phase 1 - Define
Step 1. Define the outcome. Which decision will this data change? Which metric or chart? Who reads it? Which sources do you need? You start the AI Workbench session deliberating with Claude, before any pipeline code gets generated. Skip this and the agent will happily build the wrong pipeline, and nothing later in the workflow has anything to verify against. A pipeline is a promise about freshness, shape, and correctness. The part the agent can't infer is who the promise is to. That's where data gets meaning, or doesn't.
Phase 2 - Build the pipeline
Steps 2-6. Develop the ingestion pipeline locally on DuckDB. Validate with a sample load. Explore the data in a Marimo notebook. Deploy to production: the agent strips dev artifacts, pins dependencies, validates credentials without reading them. Verify incremental loading in prod.
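As a rough sketch of what steps 2-3 produce, here is the local-first shape: DuckDB as the dev destination, an incremental cursor on the endpoint, and a capped sample load before any full run. The API, endpoint, and cursor field are illustrative assumptions, not course material.

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Hypothetical API; base_url, endpoint, and cursor field are assumptions.
source = rest_api_source({
    "client": {"base_url": "https://api.example.com/v1/"},
    "resources": [{
        "name": "issues",
        "endpoint": {
            "path": "issues",
            "params": {
                "updated_since": {
                    "type": "incremental",
                    "cursor_path": "updated_at",
                    "initial_value": "2026-01-01T00:00:00Z",
                },
            },
        },
    }],
})

# Local dev destination: DuckDB on disk, not the warehouse.
pipeline = dlt.pipeline(
    pipeline_name="issues_dev",
    destination="duckdb",
    dataset_name="issues_raw",
)

# Sample first: cap the load so schema surprises cost rows, not runs.
print(pipeline.run(source.add_limit(10)))
```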

Phase 3 - Model the data
Steps 7-9. Develop the transformation pipeline. The agent builds an ontology of what your data means before writing the model. Validate on the transformed tables, deploy with one command.
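Validation on the transformed tables can stay in the same session. A minimal sketch, assuming a modeled table named dim_customer with a customer_id key (both names are illustrative):

```python
import dlt

# Attach to the pipeline that materialized the model (names illustrative).
pipeline = dlt.attach(pipeline_name="issues_dev")

# Read the transformed table back as a dataframe and assert its contract.
dim = pipeline.dataset()["dim_customer"].df()
assert dim["customer_id"].is_unique, "duplicate keys in dim_customer"
assert dim["customer_id"].notna().all(), "null keys in dim_customer"
print(f"dim_customer ok: {len(dim)} rows, keys unique and non-null")
```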
Phase 4 - Ship and monitor
Steps 10-12. Build the notebook the stakeholder opens. Deploy it. Monitor pipeline health and data quality over time.
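A health probe for step 12 can lean on the same trace metadata the rest of the workflow reads. A minimal sketch, assuming a production pipeline named issues_prod:

```python
import dlt

# Attach to the production pipeline (name is illustrative).
pipeline = dlt.attach(pipeline_name="issues_prod")

trace = pipeline.last_trace
if trace is None:
    raise SystemExit("pipeline has never run")

# Turn silently partial loads into loud failures.
trace.last_load_info.raise_on_failed_jobs()

# Row counts from the last normalize step: a cheap data-quality signal.
print(trace.last_normalize_info)
```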
Same code from local prototype to production. Same governance from raw data to dashboard. Same agents writing every step, under workbench rules they can't skip. The shape inside each step is propose-verify-enforce: the agent proposes, the workflow verifies, and the rules block anything that shouldn't get past review. The next agent, or the next person, can't quietly undo the check.
Where the lessons fit
Five free lessons cover the workflow end to end.
- 00 - Introduction. Foundation toolkit. dlt OSS. Get the AI Workbench installed in your editor.
- 01 - REST API pipeline toolkit. Steps 2-3. dlt OSS. From a single prompt to a production-grade pipeline: endpoint discovery, auth, pagination, schema contracts, incremental loading. Backed by thousands of pre-built REST API contexts so the agent surfaces ambiguity instead of guessing. Validate the ingested data before it goes further.
- 02 - dltHub deployment toolkit. Steps 5-6, 9. dltHub Pro, free 30-day trial during the course, no invite needed. Convert your dev workspace into a production profile. Strip development artifacts, pin dependencies, validate credentials, with the agent never touching sensitive data.
- 03 - Data exploration toolkit. Steps 4 and 8. dlt OSS. The agent generates validation reports and Marimo dashboards in-session. Feedback loop in minutes, not days.
- 04 - Data transformation toolkit. Step 7. Requires dltHub runtime access; you can start a free trial during the course. Ontology first, then code. The agent maps your sources to canonical entities, builds the entity graph, generates a Kimball CDM, and writes the @dlt.hub.transformation script that populates it (sketched below).
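For a feel of what step 7 emits, here is a hedged sketch of a transformation script. The @dlt.hub.transformation decorator is the one the course names; its exact signature, the dataset argument, and the table names are assumptions here, not the toolkit's documented API.

```python
import dlt
import dlt.hub  # requires dltHub runtime access

# ASSUMPTION: signature sketched by analogy with dlt resources; consult
# the transformation toolkit's own docs for the real interface.
@dlt.hub.transformation
def dim_customer(dataset):
    # Read the raw table the ingestion step loaded...
    customers = dataset["customers_raw"].df()
    # ...and emit the canonical entity the Kimball CDM defines.
    return customers.drop_duplicates(subset="customer_id")[
        ["customer_id", "name", "created_at"]
    ]
```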
Each toolkit is a guided sequence of skills, commands, rules, and MCP, with guardrails the agent can't skip. It's maintained by dltHub, which controls the infrastructure the pipelines run on. You're not learning a tutorial version. You're learning the version that ships.
For the conceptual underpinnings, the AI Workbench blog series covers the design choices behind the skills.
Who it's for
If your name is on a pipeline, this is for you. Data engineers and analytics engineers who use Claude or Cursor every day and want the pipeline to outlast the prompt. Platform engineers building self-service for the rest of the company. Senior and staff ICs at startups through mid-market who own the surface when something drifts.
Not for hobbyists vibe-coding a one-off. Not a beginner Python course. You should be comfortable enough to debug a stack trace and read someone else's library code.
What does it do for you?
The workflow is general-purpose. What you do with it depends on the seat you sit in and the shape of the job.
- Whole-stack rebuild. You inherited a warehouse built fast. Tables piled up, business meaning lives in tribal knowledge, and the semantic model is wrong in places about how the company actually runs. Point the workbench at it. The toolkit reverse-engineers the existing SQL into a draft ontology, generates a canonical model, and produces a clean T-layer. Same source data, AI-ready output, no three-engineer hiring cycle.
- One component. Each toolkit is independent. Use just ingestion. Just exploration. Just transformations on data you already have. The metadata flow makes the pieces compose. You don't have to commit to the whole stack to get something out of any one of them.
- Data engineer or analytics engineer. Pipeline in an afternoon instead of a sprint. Pull a new source, model it, ship the dashboard same day. "I have a question" to "I have a chart" collapses to one session.
- GTM engineer. Lead enrichment, attribution stitching, churn signals, lifecycle scoring. Build the workflow without queuing on a data team. The agent handles the plumbing, you own the logic.
- Team lead or platform engineer. The workbench is a senior architect's judgment encoded in software. Guardrails the agent can't skip. Conventions the next hire picks up by reading the project. The stack stops drifting; entropy stays low.
What you walk away with: an outcome, not a checklist. State the result you want the workflow to end with. Because metadata flows through every step, if transformation or visualization needs a field that wasn't ingested, the agent goes back and pulls it. You don't have to plan the whole pipeline upfront. Most of the course teaches you to understand the workflow, plan the outcome, and validate the result. The agent does the execution.
How to start
The course is hosted at dlthub.learnworlds.com/course/agentic-data-engineering and guides you through everything you need to reach your desired outcome. Free, self-paced, estimated time one hour.
The agent writes the code; you learn the principles of how to steer it.