
With 91% of dlt pipelines AI-written, learn Agentic Data Engineering in this free 1-hour course.

Adrian Brudaru

Agents don't hallucinate. They navigate without a map. Ontology engineering is how you build one, and why every team pulling humans out of the loop needs it now.

Adrian Brudaru

The dltHub AI Workbench gives Claude Code a structured workflow for building data pipelines. We put it to the test with a real geopolitical question.

Roshni Melwani

dlt handles schema evolution efficiently but silently. Here's how to read dlt's metadata and stay informed about what's shifting in your pipeline.

Aman Gupta
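
The post above is about reading dlt's schema metadata to catch silent drift. As a dependency-free sketch of the underlying idea (the table, column names, and snapshot format here are hypothetical, not dlt's actual storage layout), you can diff two schema snapshots to surface new or retyped columns:

```python
# Compare two schema snapshots, as you might capture them across loads,
# and report column-level drift in human-readable form.

def diff_schemas(old: dict, new: dict) -> list[str]:
    """Return notes about columns that appeared or changed type."""
    notes = []
    for table, cols in new.items():
        old_cols = old.get(table, {})
        for col, dtype in cols.items():
            if col not in old_cols:
                notes.append(f"{table}.{col}: new column ({dtype})")
            elif old_cols[col] != dtype:
                notes.append(f"{table}.{col}: type changed {old_cols[col]} -> {dtype}")
    return notes

# Hypothetical snapshots: "phone" appeared, "age" was widened to double.
v1 = {"users": {"id": "bigint", "age": "bigint"}}
v2 = {"users": {"id": "bigint", "age": "double", "phone": "text"}}

for note in diff_schemas(v1, v2):
    print(note)
```

Wiring this kind of diff into an alert is how a "silent" evolution becomes a visible one.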

A "Success" exit code only tells you the pipeline ran. Use `load_id` to join `_dlt_loads` with your source table and check if the data is actually fresh.

Aman Gupta
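
To make the freshness check above concrete: dlt records every load in a `_dlt_loads` table, and each row it writes carries a `_dlt_load_id`. A minimal sketch of the join, using an in-memory SQLite database with made-up data (the column names follow dlt's conventions, but the schema and status values here are simplified assumptions):

```python
import sqlite3

# Simulate dlt's bookkeeping: `_dlt_loads` records each load, and every
# row in a source table carries the `_dlt_load_id` that produced it.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE _dlt_loads (load_id TEXT, status INTEGER, inserted_at TEXT);
    CREATE TABLE orders (order_id INTEGER, _dlt_load_id TEXT);

    INSERT INTO _dlt_loads VALUES ('1700000001', 0, '2024-01-01T00:00:00');
    INSERT INTO _dlt_loads VALUES ('1700000002', 0, '2024-01-02T00:00:00');
    INSERT INTO orders VALUES (1, '1700000001');
    -- The second load "succeeded" but wrote no orders rows.
""")

# Join the source table to `_dlt_loads` to find when data last arrived,
# rather than trusting the pipeline's exit code.
row = con.execute("""
    SELECT MAX(l.inserted_at)
    FROM _dlt_loads AS l
    JOIN orders AS o ON o._dlt_load_id = l.load_id
    WHERE l.status = 0
""").fetchone()
print("orders last received data at:", row[0])
```

The join reports the first load's timestamp, not the second's: the pipeline ran twice, but fresh `orders` data only arrived once.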

We're in an LLM-coding junior bubble. "It runs" isn't the senior bar. Lifecycle rigor and dependency management are.

Adrian Brudaru

The dlt AI Workbench transforms AI-generated "vibe coding" from an unmanaged process full of hidden risks into a mature engineering workflow that prioritizes security, current documentation, and persistent state by default.

Adrian Brudaru

Part of the [dltHub AI Workbench series](https://dlthub.com/blog/ai-workbench)

Adrian Brudaru

TL;DR: Cortex Code helps you work with data already in Snowflake. dltHub Pro gets data into Snowflake from any source, especially the ones no ETL tool covers. They operate at different layers of the stack and they are designed to hand off to each other.

Adrian Brudaru

Call it the MVC problem: minimum viable context. Too little and it hallucinates your domain. Too much and it drifts from your actual goal. The process has to be controlled.

Hiba Jamal

How are LLMs supposed to know the business logic of how you use HubSpot, Luma and Slack together? How are they supposed to know what a customer means to you?

Hiba Jamal

Today we are introducing the dltHub AI Workbench: an infrastructure layer for dltHub that makes AI-generated dlt pipelines trustworthy enough to run and deploy in production.

Matthaus Krzykowski

Stop PII leaks before they hit your warehouse. By using dlt and Pydantic to enforce data contracts, you can sanitize or quarantine sensitive fields the moment they’re ingested.

Aman Gupta
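
The post above uses dlt and Pydantic for the actual contract enforcement; as a dependency-free sketch of just the sanitize-or-quarantine step (the field names and masking rule here are hypothetical):

```python
import re

EMAIL_RE = re.compile(r"^[^@]+@[^@]+$")

def sanitize(record: dict) -> dict:
    """Mask a sensitive field at ingestion time; quarantine values
    that cannot be safely masked instead of passing them through."""
    out = dict(record)
    email = out.get("email")
    if email and EMAIL_RE.match(email):
        user, domain = email.split("@", 1)
        out["email"] = user[0] + "***@" + domain   # keep only the first character
    elif email:
        out["email"] = None                         # quarantine malformed value
        out["_quarantined"] = ["email"]
    return out

print(sanitize({"id": 7, "email": "jane.doe@example.com"}))
```

The point of doing this at ingestion rather than in the warehouse is that the raw PII never lands anywhere downstream.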

In this blog post, I will describe the hard, real-world barriers that make your LLM setup collapse, and propose principles for making your systems work.

Adrian Brudaru

Add data quality gates to Microsoft Fabric with dlt. Validate schemas, catch bad records, and mask PII before data reaches your lakehouse and downstream analytics.

Rakesh Gupta

Production traces are scattered across databases, log aggregators, and storage buckets, and most of them aren't clean (input, output) pairs you can hand to a training job. This walkthrough shows how to build a dlt pipeline that extracts traces from any source, transforms them into structured conversation formats, and lands them as versioned Parquet on Hugging Face, ready for Distil Labs to generate synthetic training data and deliver a specialist model that beats the LLM you're running today.

Alena Astrakhantseva +1

From raw data to production ML: load, transform, embed, and publish curated datasets with declarative pipelines powered by dltHub.

Elvis Kahoro +2

Single-gate validation fails to decouple row-level syntax from batch-level semantics. Evolve from WAP to the AWAP protocol with this simple dlt tutorial to stop pipeline corruption at the source.

Roshni Melwani
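
The idea in the teaser above is splitting one validation gate into two: a row-level gate for syntax and a batch-level gate for semantics, before anything is published. A minimal sketch under assumed rules (the field names and thresholds are hypothetical, not the tutorial's actual checks):

```python
# Two-gate validation sketch: audit rows first (syntax), then audit the
# batch as a whole (semantics), and only publish if both gates pass.

def audit_rows(rows):
    """Gate 1: row-level syntax. Quarantine structurally invalid rows."""
    good, bad = [], []
    for r in rows:
        (good if isinstance(r.get("amount"), (int, float)) else bad).append(r)
    return good, bad

def audit_batch(rows, expected_min=1):
    """Gate 2: batch-level semantics. Reject the whole batch when it is
    implausible even though every surviving row passed gate 1."""
    if len(rows) < expected_min:
        return False, "batch unexpectedly empty"
    if sum(r["amount"] for r in rows) < 0:
        return False, "negative batch total"
    return True, "ok"

rows = [{"amount": 10}, {"amount": "oops"}, {"amount": 5}]
good, bad = audit_rows(rows)
ok, reason = audit_batch(good)
print(f"publish={ok} ({reason}); quarantined {len(bad)} rows")
```

A single gate has to pick one of these two granularities; running both means a batch of individually valid rows can still be held back when it makes no sense in aggregate.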

Trying to force an LLM to reconstruct the 'world' using only a semantic layer is like trying to turn cheese back into milk. The information required to understand the original system was stripped away during the modeling process.

Adrian Brudaru

For the more classic data engineering crowd, here’s an explainer of how unstructured AI memory works, through the lens of what we know from working with structured data.

Adrian Brudaru

By upgrading only the generative model, we achieved a 3x accuracy boost but hit a hard ceiling, showing that a better LLM alone is not enough for good retrieval.

Aashish Nair


Remus Molnar

I didn't vibe-build a product. I wrote a messy scaffold that runs a pipeline, grabs the schema, and forces an agent to build a star schema. It works shockingly well.

Adrian Brudaru