Blog/June 1, 2026/

Engineering,
Product

AI Workbench: Data quality toolkit preview

The agent wrote the pipeline, but assumptions slip silently: nulls in primary keys, duplicates from the wrong write disposition, drifted enum values. The dltHub AI Workbench data quality toolkit bootstraps checks from dlt's existing schema, samples columns before any rule ships, writes the checks into your pipeline as decorators, and routes failures back to the toolkit that owns the surface area: ingestion, transformations, or exploration.

Hiba Jamal,
Junior Data & AI Manager

You prompt, the agent writes the pipeline: endpoints, pagination, incremental loading, schema normalization, all of it just appears.

Over the next few weeks, your stakeholders start reporting issues. Someone notices the customers table has duplicates. Or amounts have gone negative. Or a column the business depends on has quietly been 30% null.

It all boils down to unvalidated assumptions, and perhaps a few small mistakes.

Data often has business logic enforced on it in the applications that produce it, but that logic isn't sent downstream with the data, so we make assumptions and write code based on them.

The initial assumptions may or may not be correct, but the code implements them either way. Over time those assumptions can also stop holding (a source changes some of its logic, for example), and data quality defects accumulate.

Data quality to the rescue, but not YOLO

Ask Claude or Cursor to "add data quality" to a pipeline you've already loaded. Watch what it produces.

We ran the experiment. The agent does what an unsupervised intern does: opens a Marimo notebook, writes a checks dictionary, runs a handful of ad-hoc SQL queries, reports back. It's a one-off audit. Nothing persists. Nothing re-runs. The next time someone asks, the agent starts over.

What you actually want, in a toolkit

One-offs have their role, but you usually want to validate assumptions on an ongoing basis, so you notice as soon as one of them stops holding.

Question	What the toolkit produces
"I want checks every time the pipeline loads."	Decorators on your resources. Runs as part of `pipeline.run()`.
"Data is already loaded. I just want to check what's there."	A standalone audit script using `dq.run_checks(...)`.

Bootstrapping from what dlt already knows

dlt's schema already tracks the assumptions made during ingestion: which columns are primary keys, which are non-nullable, which have uniqueness hints. Those aren't decoration — the loader actually uses them.

The toolkit reads them and proposes checks before asking you anything:

Text

orders:
  → is_unique("id")            [id is marked primary_key in schema]
  → is_not_null("customer_id") [customer_id is non-nullable]

customers:
  → is_unique("id")
  → is_not_null("email")

The schema is the floor, not the ceiling. Transformations and calculations also contain semantic rules you'd want to test for.
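As a rough illustration of that bootstrapping step, here is a minimal sketch that turns column hints into proposed checks. It does not call the toolkit or dlt: `SCHEMA` is a hand-written dictionary loosely modeled on the hints mentioned above (`primary_key`, `nullable`, `unique`), and `propose_checks` is a hypothetical helper, not the toolkit's implementation.

```python
# Hypothetical stand-in for a pipeline schema: table -> column -> hints.
# The real toolkit reads these from dlt's schema; this dict is an assumption.
SCHEMA = {
    "orders": {
        "id": {"primary_key": True, "nullable": False},
        "customer_id": {"nullable": False},
        "amount": {"nullable": True},
    },
    "customers": {
        "id": {"primary_key": True, "nullable": False},
        "email": {"nullable": False},
    },
}


def propose_checks(schema):
    """Map schema hints to check proposals, one list of strings per table."""
    proposals = {}
    for table, columns in schema.items():
        checks = []
        for column, hints in columns.items():
            if hints.get("primary_key") or hints.get("unique"):
                # Keys get a uniqueness proposal only, mirroring the listing above.
                checks.append(f'is_unique("{column}")')
            elif hints.get("nullable") is False:
                checks.append(f'is_not_null("{column}")')
        proposals[table] = checks
    return proposals


for table, checks in propose_checks(SCHEMA).items():
    print(f"{table}:")
    for check in checks:
        print(f"  → {check}")
```

Nullable columns without hints, such as `amount` here, produce no proposal, which is why the schema only gives you the floor.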

From schema validation to business meaning

The ceiling is the rules your business depends on but never wrote down: status must be one of three values, amount after discount must be non-negative, an email must look like an email. You state those in plain language, and the toolkit maps them to primitives such as is_in(), case(), and is_not_null().

Four primitives cover most of it: is_unique, is_not_null, is_in, and case. Alongside them, metrics such as the column-level null_rate and mean and the table-level row_count record values over time so you can spot drift.
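To make "spotting drift" concrete, here is a minimal sketch that compares the latest recorded value of a metric against its history. The `drifted` helper, the 0.10 tolerance, and the sample numbers are all assumptions for illustration; the toolkit only records the metrics, and how you alert on them is up to you.

```python
from statistics import mean


def drifted(history, latest, tolerance=0.10):
    """Return True when `latest` differs from the historical mean by more
    than `tolerance` (an absolute difference, chosen arbitrarily here)."""
    if not history:
        # Nothing to compare against yet, so nothing can be called drift.
        return False
    return abs(latest - mean(history)) > tolerance


# Hypothetical null_rate values for customer_id from four earlier loads.
null_rate_history = [0.01, 0.02, 0.01, 0.02]

print(drifted(null_rate_history, 0.02))  # a load in line with the history
print(drifted(null_rate_history, 0.30))  # a load where nulls jumped to 30%
```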

The next interesting step is sampling before any rule ships. You say status should be active or inactive. The toolkit samples the column and comes back:

Sampled status in orders — found values: active, inactive, pending, cancelled. Your stated set was ["active", "inactive"]. Should I include "pending" and "cancelled"?

The agent confirms the full set of checks with you before writing a single line of code. Every check is explicit, visible, and approved. That way you don't miss something important due to a typo or a value you didn't know existed.
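The sampling step can be pictured as a comparison between the set you stated and the values actually observed. This is a sketch under assumptions: `compare_with_sample` is a hypothetical helper, and the sample list stands in for whatever the toolkit reads from the destination.

```python
def compare_with_sample(stated, sampled_values):
    """Compare a stated value set with the distinct values found in a sample."""
    observed = sorted({v for v in sampled_values if v is not None})
    return {
        "observed": observed,
        # Values in the data that the stated rule would reject.
        "unexpected": [v for v in observed if v not in stated],
        # Stated values that never showed up in the sample.
        "unseen": [v for v in stated if v not in observed],
    }


# Hypothetical sample of the status column in orders.
sample = ["active", "pending", "inactive", "active", "cancelled", "pending"]
report = compare_with_sample(["active", "inactive"], sample)

if report["unexpected"]:
    print(f"Found values outside the stated set: {report['unexpected']}")
```

Anything in `unexpected` becomes a question back to you rather than a silently failing rule.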

What lands in your pipeline

Here's what the per-load mode actually writes:

Python

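import dlt
from dlt.hub import data_quality as dq

# Checks run on every load; each one encodes an approved assumption.
@dq.with_checks(
    dq.checks.is_unique("id"),
    dq.checks.is_not_null("customer_id"),
    dq.checks.case("amount >= 0"),
)
# Metrics are recorded per load so values can be compared over time.
@dq.with_metrics(
    dq.metrics.table.row_count(),
    dq.metrics.column.null_rate("customer_id"),
    dq.metrics.column.mean("amount"),
)
@dlt.resource
def orders():
    # fetch_orders() stands for your own extraction logic.
    yield from fetch_orders()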

One call — dq.enable_data_quality(pipeline) — flips a flag on the pipeline state. From then on, every pipeline.run() runs the checks and writes results into _dlt_checks and _dlt_dq_metrics in the destination. They're tables. You query them, dashboard them, or alert on them with whatever you already use.

What it catches

Two examples from experiments we ran:

is_not_null on customer_id in an orders table. The column was null 50% of the time and the pipeline loaded it without complaint because nothing told it not to. The canonical data model that the transformations toolkit builds expected customer_id to join orders to customers. Half the joins would have been silently wrong. The check caught it before the model was queried.
is_unique on what was meant to be a primary key. Duplicates everywhere. The data at the source was fine, but the write disposition on the resource was append instead of merge, so every load re-inserted the same rows. The check flagged a column. The fix lived in the ingestion code.

Most data quality failures are like this — a symptom of an incorrect assumption or a small mistake.
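The second example is easy to reproduce in miniature. The sketch below simulates the two write dispositions with plain Python lists instead of a real destination. `load_append`, `load_merge`, and `is_unique` are hypothetical stand-ins, not dlt or toolkit functions.

```python
def load_append(table, batch):
    """Append semantics: every load adds the batch again."""
    table.extend(batch)


def load_merge(table, batch, key):
    """Merge semantics: rows with an existing key are replaced, not re-added."""
    by_key = {row[key]: row for row in table}
    for row in batch:
        by_key[row[key]] = row
    table[:] = list(by_key.values())


def is_unique(table, column):
    """True when no value in `column` occurs more than once."""
    values = [row[column] for row in table]
    return len(values) == len(set(values))


# The source is fine: two distinct orders, loaded twice.
batch = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
appended, merged = [], []
for _ in range(2):
    load_append(appended, batch)
    load_merge(merged, batch, key="id")

print(len(appended), is_unique(appended, "id"))
print(len(merged), is_unique(merged, "id"))
```

The check fails only for the appended table, even though both received identical source data, which is why the fix belongs in the ingestion code.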

Most DQ tools are the lab result. This one is the medical system

Most data quality tools are the lab result: "your knee's broken." The dltHub toolkit is the medical system: detection, diagnosis, and fix in the same doctor visit.

Take the customer_id failure from earlier. With a tool like Great Expectations, the flow is:

1. Someone spends time figuring out what should be tested.
2. Someone writes an expect_column_values_to_not_be_null expectation in a separate suite.
3. A scheduled job runs it after the pipeline loads.
4. The job alerts: "50% of customer_id is null."
5. Someone reads the alert. Source problem? Ingestion config? Modeling assumption that was wrong from the start? They open the pipeline. They open the model. They check the source. They file a ticket.
6. Eventually, someone fixes it.

In our case, the agent's context stands in for human tribal knowledge and removes the hand-off bottlenecks between those steps. When a check fails, the LLM looks for the code assumption that conflicts with the data and routes to the toolkit that owns that surface area.

The referral depends on the failure pattern:

Failure pattern	Routed to
Ingestion is wrong (write disposition, schema)	`rest-api-pipeline`
Modeling is wrong (joins, canonical fields)	`transformations`
Anomaly worth a closer look	`data-exploration`
Everything passes	`dlthub-platform`, to schedule

In the earlier example with the null customer_id, the agent would kick off the transformations toolkit to rework where the customer entity sits in the canonical model.
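The referral table can be read as a simple lookup. In the sketch below, the toolkit names come from the table above, while the `FailurePattern` enum and the `route` function are hypothetical names introduced for illustration. In the toolkit itself the LLM makes this decision from context, not from a hard-coded map.

```python
from enum import Enum


class FailurePattern(Enum):
    """Hypothetical labels for the four rows of the referral table."""
    INGESTION = "ingestion"    # write disposition, schema
    MODELING = "modeling"      # joins, canonical fields
    ANOMALY = "anomaly"        # worth a closer look
    ALL_PASSED = "all_passed"  # nothing failed


ROUTES = {
    FailurePattern.INGESTION: "rest-api-pipeline",
    FailurePattern.MODELING: "transformations",
    FailurePattern.ANOMALY: "data-exploration",
    FailurePattern.ALL_PASSED: "dlthub-platform",
}


def route(pattern):
    """Return the name of the toolkit that owns the given failure pattern."""
    return ROUTES[pattern]


# The null customer_id case is a modeling problem in this scheme.
print(route(FailurePattern.MODELING))
```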

Try it

The data quality toolkit and the check framework are part of the dltHub offering (free trial, no card required, transparent pricing). To get started, install dlt with hub support and initialize the workbench:

Shell

uv pip install "dlt[hub]"
uv run dlt ai init
uv run dlt ai toolkit data-quality install

Or if you're already in a Claude Code session:

Shell

/plugin marketplace add dlt-hub/dlthub-ai-workbench
/plugin install data-quality@dlthub-ai-workbench --scope project

Then ask your assistant to set up data quality on a pipeline you've already loaded. The toolkit's four skills run in order from there, and the toolkit hands off to rest-api-pipeline, transformations, or data-exploration when the failures point elsewhere, or to dlthub-platform when everything passes and the pipeline is ready to schedule.

The full workbench also includes toolkits for REST API ingestion, ontology-driven transformations, data exploration, and production deployment, so you can go from a raw API to a validated, deployed pipeline without leaving your editor.
