Blog/May 11, 2026/

Tutorials,
Community

From “don’t let the agent near prod” to safe agentic data workflows with dlt and Bauplan

AI agents can write data pipelines. The part that isn't ready is everything around them — isolation, rollbacks, safe promotion to prod. This demo shows what a stack built for agents actually looks like.

Elvis Kahoro,
DevX & Ecosystem Lead

On this page

Setting up dlt for agents: dlt ai init
Building a pipeline with Claude Code
REST APIs: imperative and declarative
Running it
Validating with Marimo
From 2 weeks to 20 minutes
What happens next: Bauplan
Try it yourself

AI agents are already good at writing data pipelines. The part that isn’t ready is everything around them. Most data stacks were built for careful humans, not for agents that run 10x the queries and occasionally try to drop a table.

Isolation, auditability, and rollbacks should be primitives, yet they remain afterthoughts. And even once the code is written, getting it deployed is a separate hurdle — mismatched environments, broken dependencies, credentials that work locally but not in the cloud.

In a recent live demo, Elvis Kahoro (dlt) and Ciro Greco (Bauplan) walked through what a data stack built specifically for this era looks like — and ran the whole thing end-to-end with Claude Code.

**dlt is the leading open-source Python library for building data pipelines using code and agents.** dltHub is the agentic platform that deploys, monitors, and scales them. One command. No manual environment setup and no silent failures.

Bauplan is a Python-native lakehouse built around data branches. Think of it as Git for your data: agents can explore freely, run parallel hypotheses, and iterate without anyone losing sleep over what they might break.

Setting up dlt for agents: `dlt ai init`

The demo started with something new in dlt: a set of AI-oriented CLI commands designed specifically for working with agents.

dlt ai init --agent=claude

This initialises a set of skills, an MCP server, and rules for how Claude should work with dlt — all dropped directly into your .claude/ directory. Once set up, Claude knows how dlt works, what its primitives are, and how to handle common tasks without you explaining it every time.

After init, you can list and install toolkits. Each toolkit is a collection of skills for a specific workflow:

dlt ai toolkit list

Available toolkits:
  data-exploration    Quick insights from dlt pipeline data. Connect to a pipeline, profile tables, plan charts, and assemble marimo dashboards.
  dlt-runtime         Deploy dlt workspace and pipelines to the dltHub platform
  rest-api-pipeline   Build REST API pipelines with dlt: scope, debug and validate data
  transformations     Transform raw dlt pipeline data into a Canonical Data Model.

Each toolkit installs a set of targeted skills. The rest-api-pipeline toolkit, for example, installs 8 skills: find-source, create-rest-api-pipeline, debug-pipeline, adjust-endpoint, validate-data, and more. These give the agent a structured, fill-in-the-blank approach to building pipelines rather than letting it improvise from scratch.

Building a pipeline with Claude Code

With the toolkit installed, Elvis gave Claude a single prompt:

“Help me create a pipeline that pulls data from GitHub and loads it into DuckDB. I want to fetch the stars I have. My username is elviskahoro.”

Claude used the find-source skill to search dlt’s context server — a database of around 10,000 REST API source files — and came back with the right endpoint, pagination strategy, and auth recommendation for the GitHub Stars API. It then scaffolded a working pipeline using create-rest-api-pipeline.

This is what the smallest unit in dlt looks like:

import dlt

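# A resource is a function that yields rows; dlt infers the table schema from them.
@dlt.resource
def repos():
    yield [
        {"id": 1, "name": "dlt-hub/dlt", "language": "Python"},
        {"id": 2, "name": "duckdb/duckdb", "language": "C++"},
    ]

# The data to load is passed as the first positional argument of run().
dlt.pipeline(
    "github_demo",
    destination="duckdb",
).run(repos())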

One decorator, one function, a typed table in DuckDB. dlt infers the schema, types the columns, and writes the table. No configuration files, no migrations.
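To make "dlt infers the schema" a little more concrete, here is a deliberately simplified sketch in plain Python of what inferring column types from yielded dicts can look like. It is an illustration only, not dlt's implementation: the function name `infer_columns` and the type labels are assumptions made up for this example.

```python
from typing import Any

# Hypothetical mapping from Python types to column type labels (illustrative only).
TYPE_LABELS = {bool: "bool", int: "bigint", float: "double", str: "text"}


def infer_columns(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Return {column_name: type_label} from the first non-null value seen per column."""
    columns: dict[str, str] = {}
    for row in rows:
        for name, value in row.items():
            if value is None or name in columns:
                continue
            # Anything that is not a simple scalar falls back to a generic "json" label.
            columns[name] = TYPE_LABELS.get(type(value), "json")
    return columns


rows = [
    {"id": 1, "name": "dlt-hub/dlt", "language": "Python"},
    {"id": 2, "name": "duckdb/duckdb", "language": "C++"},
]
schema = infer_columns(rows)
print(schema)
```

A real loader also has to handle columns whose type changes between rows, which this sketch ignores.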

The decorator also exposes the settings the loader needs (write disposition, primary keys, PII anonymization, parallelization), so the agent has a structured interface to fill in rather than inventing its own patterns:

@dlt.resource(
    primary_key="user_id",
    write_disposition="merge",
    columns={"email": {"anonymize": True}},
    parallelized=True,
)
def stargazers():
    # fetch_stargazers() stands in for your own generator of stargazer rows.
    yield from fetch_stargazers()
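For readers new to the idea, a merge write disposition combined with a primary key behaves like an upsert: rows whose key already exists are replaced, and new keys are added. The plain-Python sketch below shows those semantics on in-memory lists. The helper `merge_rows` and the sample rows are hypothetical and only model the behaviour; they are not how dlt performs a merge in the destination.

```python
from typing import Any


def merge_rows(
    existing: list[dict[str, Any]],
    incoming: list[dict[str, Any]],
    primary_key: str,
) -> list[dict[str, Any]]:
    """Upsert: incoming rows replace existing rows with the same key, new keys are appended."""
    by_key = {row[primary_key]: row for row in existing}
    for row in incoming:
        by_key[row[primary_key]] = row
    return list(by_key.values())


# Made-up sample rows, keyed on user_id as in the resource above.
table = [{"user_id": 1, "login": "alice"}, {"user_id": 2, "login": "bob"}]
batch = [{"user_id": 2, "login": "bob-renamed"}, {"user_id": 3, "login": "carol"}]

merged = merge_rows(table, batch, "user_id")
print(merged)
```

Re-running the same batch leaves the table unchanged, which is what makes repeated agent runs safe to retry.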

Multiple resources can be grouped into a source — useful when you’re pulling several tables from the same API:

@dlt.source
def github_stars():
    return stargazers(), repos()

dlt.pipeline("github_stars", destination="duckdb").run(github_stars())

REST APIs: imperative and declarative

dlt supports two styles for REST API pipelines. The imperative style uses RESTClient directly — full control, good for complex auth or custom logic. The declarative style describes the API as a typed config dict — no loops, no client objects, easy for an agent to generate and version-control:

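from dlt.sources.rest_api import rest_api_resources

# Describe the API as data: a client, a list of resources, and shared defaults.
github = rest_api_resources({
    "client": {
        "base_url": "https://api.github.com",
    },
    "resources": [
        "/repos/dlt-hub/dlt/issues",
    ],
    # Applied to every resource listed above.
    "resource_defaults": {
        "endpoint": {
            "params": {"per_page": 100, "state": "all"}
        }
    }
})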
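One way to read a config like this: each bare string under `resources` is shorthand for a resource, and `resource_defaults` is merged into every resource. The plain-Python sketch below models that expansion so you can see the effective settings. It is an assumption-laden illustration, not dlt's internal code: `deep_merge`, `expand_resources`, and the exact shape of the expanded dict are invented for this example.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def expand_resources(config: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn string shorthands into dicts and apply resource_defaults to each one."""
    defaults = config.get("resource_defaults", {})
    expanded = []
    for resource in config["resources"]:
        if isinstance(resource, str):
            # Assumption: a bare string doubles as the resource name and the endpoint path.
            resource = {"name": resource, "endpoint": {"path": resource}}
        expanded.append(deep_merge(defaults, resource))
    return expanded


config = {
    "client": {"base_url": "https://api.github.com"},
    "resources": ["/repos/dlt-hub/dlt/issues"],
    "resource_defaults": {"endpoint": {"params": {"per_page": 100, "state": "all"}}},
}
resources = expand_resources(config)
print(resources)
```

Because the whole pipeline is a dict, an agent's change shows up as a small, reviewable diff rather than a rewritten loop.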

Running it

The pipeline ran and loaded a first batch of 100 starred repos into a local DuckDB file — two tables: starred (100 rows) and starred_topics (557 rows, a child table automatically created from the nested topics array). Elvis then removed the limit and ran a full load.

Full load result: 1,807 starred repos, 11,821 topic rows, 0 failed jobs — fetched via Link-header pagination that the agent wired in automatically.
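If Link-header pagination is new to you: the API returns a `Link` response header listing related page URLs, and the client keeps following the one marked `rel="next"` until it disappears. The sketch below shows only the header-parsing step in plain Python. The helper name `next_page_url` and the example URLs and page numbers are illustrative assumptions, and this is not the code the agent generated.

```python
import re
from typing import Optional

# Matches entries of the form: <https://example.com/page>; rel="next"
LINK_PATTERN = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in LINK_PATTERN.findall(link_header):
        if rel == "next":
            return url
    return None


# Illustrative header values, not captured from the demo.
first_page = (
    '<https://api.github.com/users/elviskahoro/starred?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/users/elviskahoro/starred?per_page=100&page=19>; rel="last"'
)
last_page = (
    '<https://api.github.com/users/elviskahoro/starred?per_page=100&page=1>; rel="first", '
    '<https://api.github.com/users/elviskahoro/starred?per_page=100&page=18>; rel="prev"'
)

print(next_page_url(first_page))
print(next_page_url(last_page))
```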

Here’s what dlt handled without any custom code:

Schema inference — Python dicts and REST API responses became typed tables automatically
Normalization — nested arrays (like topics) became a separate child table with proper foreign keys
Incremental loading — one decorator argument away
Secrets — dlt.secrets.value and dlt.config.value handle credentials cleanly across local and cloud environments
Dataset API — once loaded, data is reachable as pandas, Arrow, or ibis — same API across DuckDB, Snowflake, BigQuery, Iceberg, and Filesystem
Portability — switching the destination is one line of code
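The normalization step in the list above is the one that most often surprises people, so here is a rough sketch of the idea in plain Python: a nested array is lifted out of each parent row into its own child rows, each carrying a reference back to the parent. This is a simplified model rather than dlt's implementation. dlt generates its own linking columns, and the names `split_nested`, `parent_id`, `list_idx`, and `value` are invented for this example.

```python
from typing import Any


def split_nested(
    rows: list[dict[str, Any]], parent_key: str, nested_field: str
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split rows into parent rows (without the nested list) and one child row per list item."""
    parents, children = [], []
    for row in rows:
        parents.append({k: v for k, v in row.items() if k != nested_field})
        for index, value in enumerate(row.get(nested_field) or []):
            # Each child row points back at its parent and remembers its position in the list.
            children.append({"parent_id": row[parent_key], "list_idx": index, "value": value})
    return parents, children


# Made-up sample rows shaped like starred repositories with a topics array.
starred = [
    {"id": 1, "full_name": "dlt-hub/dlt", "topics": ["python", "etl"]},
    {"id": 2, "full_name": "duckdb/duckdb", "topics": ["sql"]},
]
parents, topics = split_nested(starred, "id", "topics")
print(parents)
print(topics)
```

This is why the child table can hold several times more rows than the parent: every repository contributes one row per topic.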

As Elvis put it: “No custom glue — no migrations — no ‘build a platform.’”

Validating with Marimo

Once the pipeline ran, Elvis attached a Marimo notebook directly to it:

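import dlt

# Attach to the pipeline that already ran instead of creating a new one.
pipeline = dlt.attach(
    pipeline_name="github_stars_etl",
    destination="filesystem",
    dataset_name="github_stars",
)

# Read two columns of the loaded table as an Arrow table.
pipeline.dataset().repos_with_stars.select("repo_name", "star_count").arrow()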

The result was a live, interactive table of all 548 repositories Elvis had starred, queryable and explorable locally before promoting anything to a production destination.

This is the local development loop dlt is designed for: build and validate against a local DuckDB, inspect with Marimo, then switch to the production destination (Bauplan, Snowflake, BigQuery) with a one-line change. No re-engineering the pipeline. No rewriting credentials. The dlt runtime handles the switch.

From 2 weeks to 20 minutes

One of dlt’s consulting partners, Tasman Analytics, used to spend two weeks scoping a project before seeing a single row of data. With dlt and Claude Code, they can stand up a semi-working prototype during a discovery call. They’ve since moved to fixed pricing because the business risk of prototyping is gone.

The new bottleneck, as Elvis noted, isn’t writing the pipeline anymore — it’s validating what the agent loaded and deciding when it’s ready to promote to production. Which is exactly where Bauplan picks up.

Want to build this yourself? The Agentic Data Engineering with dltHub course takes you from a single prompt to a production-grade pipeline you can trust.

What happens next: Bauplan

Once the GitHub stars data landed in S3 as Parquet, Ciro took over and showed what safe agentic iteration looks like on the other side of the pipeline.

Claude was given a single prompt: analyse which organisations are rising in terms of GitHub stars in March 2025, come up with two different hypotheses for what “rising” means, build pipelines for both, and recommend which to deploy.

Bauplan created an ingestion branch, imported the data safely, then spun up two parallel branches — one per hypothesis — running both analyses without ever touching main. After comparing the results side by side, Claude recommended one, and a single merge put it in production.

The whole workflow — safe ingestion, parallel hypothesis testing, comparison, merge — happened in isolated branches that agents could write to freely, with a human in the loop only at the merge step.
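The branch-then-merge flow described above can be modelled with a toy in-memory catalog. The sketch below is plain Python and is not Bauplan's API. The class `BranchingCatalog`, its methods, the branch names, and the sample rows are all invented to show the shape of the workflow: agents write only to branches, and main changes only at the merge.

```python
import copy
from typing import Any


class BranchingCatalog:
    """Toy catalog: each branch is an independent snapshot of named tables."""

    def __init__(self) -> None:
        self.branches: dict[str, dict[str, list[dict[str, Any]]]] = {"main": {}}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # Copy the source branch so later writes cannot leak back into it.
        self.branches[name] = copy.deepcopy(self.branches[from_branch])

    def write_table(self, branch: str, table: str, rows: list[dict[str, Any]]) -> None:
        if branch == "main":
            raise PermissionError("write to a branch, then merge into main")
        self.branches[branch][table] = list(rows)

    def merge(self, source: str, into: str = "main") -> None:
        self.branches[into].update(copy.deepcopy(self.branches[source]))


catalog = BranchingCatalog()
catalog.create_branch("ingest")
catalog.write_table("ingest", "stars", [{"org": "dlt-hub", "stars": 10}])

# One branch per hypothesis, both starting from the ingested data.
catalog.create_branch("hypothesis_a", from_branch="ingest")
catalog.create_branch("hypothesis_b", from_branch="ingest")
catalog.write_table("hypothesis_a", "rising_orgs", [{"org": "dlt-hub", "score": 1.0}])
catalog.write_table("hypothesis_b", "rising_orgs", [{"org": "dlt-hub", "score": 0.4}])

main_before_merge = copy.deepcopy(catalog.branches["main"])
catalog.merge("hypothesis_a")  # the human-approved step
print(main_before_merge, catalog.branches["main"])
```

A real lakehouse would not copy the data for every branch, but the contract is the same: nothing an agent does on a branch is visible on main until someone merges it.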

Try it yourself

dlt is open source and free to get started:

pip install dlt
dlt ai init --agent=claude
dlt ai toolkit rest-api-pipeline install

Want to bring this into your stack? Talk to the team.
