AI Workbench: Data quality toolkit preview
The agent wrote the pipeline, but assumptions slip silently: nulls in primary keys, duplicates from the wrong write disposition, drifted enum values. The dltHub AI Workbench data quality toolkit bootstraps checks from dlt's existing schema, samples columns before any rule ships, writes the checks into your pipeline as decorators, and routes failures back to the toolkit that owns the surface area: ingestion, transformations, or exploration.
Hiba Jamal,
Junior Data & AI Manager
You prompt, the agent writes the pipeline: endpoints, pagination, incremental loading, schema normalization, all of it just appears.
Over the next weeks your stakeholders start reporting issues. Someone notices the customers table has duplicates. Or amounts went negative. Or a column the business depends on was quietly 30% null.
It all boils down to unvalidated assumptions, and perhaps a few small mistakes.
Data often has business logic enforced on it in the applications that produce it, but that logic isn't sent downstream with the data, so we make assumptions and write code based on them.
The initial assumptions may or may not be correct, and the code implements them either way. Over time those assumptions can also change (a source changes some of its logic, for example), and data quality defects accumulate.
Data quality to the rescue, but not YOLO
Ask Claude or Cursor to "add data quality" to a pipeline you've already loaded. Watch what it produces.
We ran the experiment. The agent does what an unsupervised intern does: opens a Marimo notebook, writes a checks dictionary, runs a handful of ad-hoc SQL queries, reports back. It's a one-off audit. Nothing persists. Nothing re-runs. The next time someone asks, the agent starts over.
What you actually want, in a toolkit
One-offs have their role, but you usually want to validate assumptions on an ongoing basis to catch them when they change.
| Question | What the toolkit produces |
|---|---|
| "I want checks every time the pipeline loads." | Decorators on your resources. Runs as part of pipeline.run(). |
| "Data is already loaded. I just want to check what's there." | A standalone audit script using dq.run_checks(...). |
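To make the second row concrete: a standalone audit runs each check once against data that already sits in the destination. The sketch below is not dq.run_checks(...), whose signature isn't covered in this post; it uses Python's built-in sqlite3 and a hypothetical orders table to show checks expressed as queries that count offending rows.

```python
import sqlite3

# Hypothetical already-loaded table; in practice this is your destination.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 25.0), (2, None, 12.5), (2, 11, 3.0)],
)

# Each check counts offending rows; zero offenders means the check passes.
CHECKS = {
    "is_unique(id)": (
        "SELECT COUNT(*) FROM "
        "(SELECT id FROM orders GROUP BY id HAVING COUNT(*) > 1)"
    ),
    "is_not_null(customer_id)": "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "case(amount >= 0)": "SELECT COUNT(*) FROM orders WHERE NOT (amount >= 0)",
}


def run_audit(connection):
    """Run every check once and return {check_name: passed}."""
    results = {}
    for name, sql in CHECKS.items():
        offenders = connection.execute(sql).fetchone()[0]
        results[name] = offenders == 0
    return results


print(run_audit(conn))
```

Nothing here persists or re-runs on its own, which is exactly the gap the per-load decorators close.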
Bootstrapping from what dlt already knows
dlt's schema already tracks the assumptions made during ingestion: which columns are primary keys, which are non-nullable, which have uniqueness hints. Those aren't decoration — the loader actually uses them.
The toolkit reads them and proposes checks before asking you anything:
```text
orders:
  → is_unique("id")              [id is marked primary_key in schema]
  → is_not_null("customer_id")   [customer_id is non-nullable]
customers:
  → is_unique("id")
  → is_not_null("email")
```

The schema is the floor, not the ceiling. Transformations and calculations also contain semantic rules you'd want to test for.
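The proposal step can be sketched as a walk over column hints. The dict below is shaped like a table entry in a dlt schema, with columns carrying primary_key, nullable, and unique hints; propose_checks is a hypothetical helper that illustrates the mapping, not the toolkit's own code.

```python
# A table entry shaped like dlt's schema: column name -> column hints.
ORDERS_SCHEMA = {
    "name": "orders",
    "columns": {
        "id": {"name": "id", "data_type": "bigint", "primary_key": True},
        "customer_id": {"name": "customer_id", "data_type": "bigint", "nullable": False},
        "status": {"name": "status", "data_type": "text"},
    },
}


def propose_checks(table_schema):
    """Turn schema hints into (check, reason) proposals; no hint means no check."""
    proposals = []
    for col, hints in table_schema["columns"].items():
        if hints.get("primary_key") or hints.get("unique"):
            reason = "primary_key" if hints.get("primary_key") else "unique"
            proposals.append((f'is_unique("{col}")', f"{col} is marked {reason} in schema"))
        # Assumption: columns without an explicit nullable=False are treated as nullable.
        if hints.get("nullable") is False:
            proposals.append((f'is_not_null("{col}")', f"{col} is non-nullable"))
    return proposals


for check, reason in propose_checks(ORDERS_SCHEMA):
    print(f"→ {check} [{reason}]")
```

The status column carries no hints, so nothing is proposed for it; rules like "status is one of three values" have to come from you.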
From schema validation to business meaning
The ceiling is the rules your business logic depends on but never wrote down: status must be one of three values, amount after discount must be non-negative, an email must look like an email. You state those in plain language. The toolkit maps them to is_in(), case(), is_not_null().
Four primitives cover most of it: is_unique, is_not_null, is_in, and case. Alongside them, metrics such as null_rate and mean (per column) and row_count (per table) record values over time so you can spot drift.
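As a mental model of what the four primitives and three metrics compute, here is a plain-Python sketch over rows held in memory. The function names mirror the toolkit's, but these are illustrative re-implementations, not dlt code; in particular, case() here takes a Python callable where the toolkit takes an expression string such as "amount >= 0".

```python
from statistics import mean

# Rows as plain dicts: a stand-in for a loaded table.
ROWS = [
    {"id": 1, "status": "active", "amount": 20.0, "customer_id": 7},
    {"id": 2, "status": "pending", "amount": -5.0, "customer_id": None},
    {"id": 3, "status": "inactive", "amount": 15.0, "customer_id": 8},
]


# --- checks: each returns True when the rule holds for every row ---
def is_unique(rows, col):
    values = [r[col] for r in rows]
    return len(values) == len(set(values))


def is_not_null(rows, col):
    return all(r[col] is not None for r in rows)


def is_in(rows, col, allowed):
    return all(r[col] in allowed for r in rows)


def case(rows, predicate):
    return all(predicate(r) for r in rows)


# --- metrics: numbers you record per load and compare over time ---
def null_rate(rows, col):
    return sum(r[col] is None for r in rows) / len(rows)


def column_mean(rows, col):
    return mean(r[col] for r in rows if r[col] is not None)


def row_count(rows):
    return len(rows)


print(is_unique(ROWS, "id"), is_in(ROWS, "status", {"active", "inactive"}))  # True False
```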
The next interesting step is sampling before any rule ships. You say status should be active or inactive. The toolkit samples the column and comes back:
Sampled status in orders: found values active, inactive, pending, cancelled. Your stated set was ["active", "inactive"]. Should I include "pending" and "cancelled"?
The agent confirms the full set of checks with you before writing a single line of code. Every check is explicit, visible, and approved. That way you don't miss something important due to a typo or a value you didn't know existed.
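The sampling step can be sketched as a set difference between observed and stated values. This version samples an in-memory list with the standard library, while the toolkit samples the loaded column itself; sample_distinct and unexpected_values are hypothetical helper names. A sample can miss very rare values, which is one more reason the confirmation step matters.

```python
import random


def sample_distinct(values, k=1000, seed=0):
    """Distinct values seen in a random sample of up to k entries."""
    rng = random.Random(seed)
    sample = values if len(values) <= k else rng.sample(values, k)
    return set(sample)


def unexpected_values(values, stated):
    """Sampled values that the stated allow-list does not cover."""
    return sorted(sample_distinct(values) - set(stated))


statuses = ["active"] * 50 + ["inactive"] * 30 + ["pending"] * 15 + ["cancelled"] * 5
extra = unexpected_values(statuses, ["active", "inactive"])
if extra:
    print(f"Found values outside the stated set: {extra}. Include them?")
```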

What lands in your pipeline
Here's what the per-load mode actually writes:
```python
import dlt
from dlt.hub import data_quality as dq


@dq.with_checks(
    dq.checks.is_unique("id"),
    dq.checks.is_not_null("customer_id"),
    dq.checks.case("amount >= 0"),
)
@dq.with_metrics(
    dq.metrics.table.row_count(),
    dq.metrics.column.null_rate("customer_id"),
    dq.metrics.column.mean("amount"),
)
@dlt.resource
def orders():
    # fetch_orders() is a placeholder for your own API client (not defined here)
    yield from fetch_orders()
```

One call, dq.enable_data_quality(pipeline), flips a flag in the pipeline state. From then on, every pipeline.run() runs the checks and writes results into _dlt_checks and _dlt_dq_metrics in the destination. They're ordinary tables: you query them, dashboard them, or alert on them with whatever you already use.
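Because results land in ordinary tables, alerting on drift is just a query. The column layout of _dlt_dq_metrics isn't described in this post, so the sketch below invents a minimal stand-in table (load_id, table_name, column_name, metric, value) in an in-memory SQLite database and flags a metric that moved more than a threshold between the two most recent loads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for _dlt_dq_metrics; the real columns may differ.
conn.execute(
    "CREATE TABLE dq_metrics (load_id TEXT, table_name TEXT, column_name TEXT,"
    " metric TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO dq_metrics VALUES (?, ?, ?, ?, ?)",
    [
        ("load_1", "orders", "customer_id", "null_rate", 0.01),
        ("load_2", "orders", "customer_id", "null_rate", 0.02),
        ("load_3", "orders", "customer_id", "null_rate", 0.50),
    ],
)


def drifted(connection, table, column, metric, max_delta):
    """True if the latest value moved more than max_delta from the previous one."""
    rows = connection.execute(
        "SELECT value FROM dq_metrics"
        " WHERE table_name = ? AND column_name = ? AND metric = ?"
        " ORDER BY load_id DESC LIMIT 2",  # load_id sorts chronologically in this toy data
        (table, column, metric),
    ).fetchall()
    if len(rows) < 2:
        return False  # not enough history to compare
    latest, previous = rows[0][0], rows[1][0]
    return abs(latest - previous) > max_delta


print(drifted(conn, "orders", "customer_id", "null_rate", max_delta=0.1))
```

A threshold on the change between loads catches a jump like 2% to 50% nulls even when no hard check was written for that column.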
What it catches
Two examples from experiments we ran:
- is_not_null on customer_id in an orders table. The column was null 50% of the time and the pipeline loaded it without complaint because nothing told it not to. The canonical data model that the transformations toolkit builds expected customer_id to join orders to customers. Half the joins would have been silently wrong. The check caught it before the model was queried.
- is_unique on what was meant to be a primary key. Duplicates everywhere. The data at the source was fine, but the write disposition on the resource was append instead of merge, so every load re-inserted the same rows. The check flagged a column. The fix lived in the ingestion code.
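The duplicate case is easy to reproduce without a destination. In dlt, the fix is declaring the resource with write_disposition="merge" and a primary_key, so reloaded rows replace existing rows with the same key. The sketch below only simulates the two dispositions in plain Python (it is not dlt's loader) and reruns the uniqueness check after two identical loads.

```python
def load(table, batch, disposition, key="id"):
    """Simulate one load: append adds rows, merge upserts by key."""
    if disposition == "append":
        return table + batch
    if disposition == "merge":
        merged = {row[key]: row for row in table}
        merged.update({row[key]: row for row in batch})
        return list(merged.values())
    raise ValueError(f"unknown disposition: {disposition}")


def is_unique(rows, col):
    values = [r[col] for r in rows]
    return len(values) == len(set(values))


batch = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
for disposition in ("append", "merge"):
    table = []
    for _ in range(2):  # the same source rows arrive on two consecutive loads
        table = load(table, batch, disposition)
    print(disposition, len(table), is_unique(table, "id"))
# append 4 False
# merge 2 True
```

The source rows never changed; only the disposition decided whether the uniqueness check passed.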
Most data quality failures are like this — a symptom of an incorrect assumption or a small mistake.
Most DQ tools are the lab result. This one is the medical system
Most data quality tools are the lab result: "your knee's broken." The dltHub toolkit is the medical system: detection, diagnosis, and fix in the same doctor visit.
Take the customer_id failure from earlier. With a tool like Great Expectations, the flow is:
- Someone spends time figuring out what should be tested.
- Someone writes an expect_column_values_to_not_be_null expectation in a separate suite.
- A scheduled job runs it after the pipeline loads.
- The job alerts: "50% of customer_id is null."
- Someone reads the alert. Source problem? Ingestion config? Modeling assumption that was wrong from the start? They open the pipeline. They open the model. They check the source. They file a ticket.
- Eventually, someone fixes it.
In our case, the agent's context replaces tribal knowledge and human hand-offs, so errors are identified and resolved in one pass. When a check fails, the LLM finds the code assumption it conflicts with and routes the fix to the toolkit that owns that surface area.
The referral depends on the failure pattern:
| Failure pattern | Routed to |
|---|---|
| Ingestion is wrong (write disposition, schema) | rest-api-pipeline |
| Modeling is wrong (joins, canonical fields) | transformations |
| Anomaly worth a closer look | data-exploration |
| Everything passes | dlthub-platform, to schedule |
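The table reads like a lookup. In the toolkit, the LLM makes the diagnosis from the failing check and the surrounding code; the sketch below skips that part and assumes each failure already carries a hypothetical surface label, just to show the shape of the hand-off.

```python
from dataclasses import dataclass

# Hypothetical diagnosis labels mapped to the toolkit names from the table.
ROUTES = {
    "ingestion": "rest-api-pipeline",  # write disposition, schema
    "modeling": "transformations",     # joins, canonical fields
    "anomaly": "data-exploration",     # worth a closer look
}


@dataclass
class Failure:
    check: str    # e.g. 'is_unique("id")'
    surface: str  # one of the ROUTES keys, assumed to come from the diagnosis


def route(failures):
    """Toolkits to hand off to; a run with no failures goes to scheduling."""
    if not failures:
        return ["dlthub-platform"]
    return sorted({ROUTES[f.surface] for f in failures})


print(route([
    Failure('is_unique("id")', "ingestion"),
    Failure('is_not_null("customer_id")', "modeling"),
]))
```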
In the earlier example with the null customer_id, the agent would hand off to the transformations toolkit to rework how the customer entity fits into the canonical model.
Try it
The data quality toolkit and the check framework are part of the dltHub Pro offering (free trial, no card required, transparent pricing). To get started, install dlt with hub support and initialize the workbench:
```shell
uv pip install "dlt[hub]"
uv run dlt ai init
uv run dlt ai toolkit data-quality install
```

Or, if you're already in a Claude Code session:

```
/plugin marketplace add dlt-hub/dlthub-ai-workbench
/plugin install data-quality@dlthub-ai-workbench --scope project
```

Then ask your assistant to set up data quality on a pipeline you've already loaded. The four skills run in order from there, and the toolkit hands off to rest-api-pipeline, transformations, data-exploration, or dlthub-platform when the failures point somewhere upstream.
The full workbench includes toolkits for REST API ingestion, ontology-driven transformations, data exploration, and production deployment too — so you can go from raw API to validated, deployed pipeline without leaving your editor.