
For years, the thing that held a data pipeline together end to end wasn't a tool. It was you. You were the context layer.

Adrian Brudaru

Agents now write the pipelines, models, and dashboards. What they can't write is what your data means. Meet the data role that's emerging: the semantic engineer.

Adrian Brudaru

Text-to-SQL doesn’t break because models can’t write SQL — it breaks because they don’t know what your data means. Write the meaning down first as a canonical knowledge layer, and use that one spec to both build the model and answer questions over it.

Adrian Brudaru

Schema alone scored 3/10. An ontology scored 10/10. A benchmark across two datasets showing exactly where the gap is, including cases where the model gets the right answer for the wrong reason.

Roshni Melwani

Schema evolution is a decision every data pipeline makes — most tools make it silently. This post discusses the five common failure modes every data pipeline sees, how dlt handles them, and how you can decide runtime policies for schema evolution with data contracts.

Aman Gupta

At Snowflake Summit 2026, dltHub was named Snowflake’s 2026 Startup Program Product Partner of the Year for helping more than 1,000 organizations bring hard-to-reach data into the Snowflake AI Data Cloud with Python-native, AI-driven pipelines.

Matthaus Krzykowski

You pay for compute hours; what you actually want is data moved. This post measures the exchange rate across the four bottlenecks that dominate real pipelines: SQL copy, REST APIs, JSON files, and Parquet.

Aman Gupta

The agent wrote the pipeline, but assumptions slip silently: nulls in primary keys, duplicates from the wrong write disposition, drifted enum values. The dltHub AI Workbench data quality toolkit bootstraps checks from dlt's existing schema, samples columns before any rule ships, writes the checks into your pipeline as decorators, and routes failures back to the toolkit that owns the surface area: ingestion, transformations, or exploration.

Hiba Jamal

In public preview today as part of dltHub Pro. dltHub Transformations turns raw data into the clean tables your business and your agents actually use. Built for a moment when agents now write 9 out of 10 data pipelines.

Matthaus Krzykowski

Today's stacks split ingestion, transformation, and orchestration, and the context that agents need gets lost at every boundary. dltHub Transformations runs ingestion, transformation, lineage, and verification inside the same execution context, so an LLM can reason about your business with the context a senior analyst would have.

Adrian Brudaru

One working student, Claude Code, one stakeholder call, and 2 weeks. The migration worked, but the workflow we used is the actual point of this post. AI alone wouldn’t have gotten us there.

Nikolas Jack Altran

Generally available today. 91% of new dlt pipelines are now built by agents. dltHub Pro makes building and running them production-grade for any Python developer.

Matthaus Krzykowski

Write your access policy as a plain-English ontology. Schema evolves; the LLM reads the rules and decides.

Aman Gupta

With 91% of dlt pipelines AI-written, learn Agentic Data Engineering in this free 1-hour course.

Adrian Brudaru

AI agents can write data pipelines. The part that isn't ready is everything around them — isolation, rollbacks, safe promotion to prod. This demo shows what a stack built for agents actually looks like.

Elvis Kahoro

Agents don't hallucinate. They navigate without a map. Ontology engineering is how you build one, and why every team pulling humans out of the loop needs it now.

Adrian Brudaru

The dltHub AI Workbench gives Claude Code a structured workflow for building data pipelines. We put it to the test with a real geopolitical question.

Roshni Melwani

dlt handles schema evolution efficiently but silently. Here's how to read dlt's metadata and be informed of what's shifting in your pipeline.

Aman Gupta

A "Success" exit code only tells you the pipeline ran. Use `load_id` to join `_dlt_loads` with your source table and check if the data is actually fresh.

Aman Gupta

We're in an LLM-coding junior bubble. "It runs" isn't the senior bar. Lifecycle rigor and dependency management are.

Adrian Brudaru

The dlt AI Workbench transforms AI-generated "vibe coding" from an unmanaged process full of hidden risks into a mature engineering workflow that prioritizes security, current documentation, and persistent state by default.

Adrian Brudaru

Part of the [dltHub AI Workbench series](https://dlthub.com/blog/ai-workbench)

Adrian Brudaru

TL;DR: Cortex Code helps you work with data already in Snowflake. dltHub Pro gets data into Snowflake from any source, especially the ones no ETL tool covers. They operate at different layers of the stack and they are designed to hand off to each other.

Adrian Brudaru

Call it the MVC problem: minimum viable context. Too little and it hallucinates your domain. Too much and it drifts from your actual goal. The process has to be controlled.

Hiba Jamal

How are LLMs supposed to know the business logic of how you use HubSpot, Luma, and Slack together? How are they supposed to know what a customer means to you?

Hiba Jamal

Today we are introducing the dltHub AI Workbench: an infrastructure layer for dltHub that makes AI-generated dlt pipelines trustworthy enough to run and deploy in production.

Matthaus Krzykowski

Stop PII leaks before they hit your warehouse. By using dlt and Pydantic to enforce data contracts, you can sanitize or quarantine sensitive fields the moment they’re ingested.

Aman Gupta

In this blog post, I describe the hard, real-world barriers that make your LLM setup collapse, and propose principles for making your systems work.

Adrian Brudaru

Add data quality gates to Microsoft Fabric with dlt. Validate schemas, catch bad records, and mask PII before data reaches your lakehouse and downstream analytics.

Rakesh Gupta

Production traces are scattered across databases, log aggregators, and storage buckets, and most of them aren't clean (input, output) pairs you can hand to a training job. This walkthrough shows how to build a dlt pipeline that extracts traces from any source, transforms them into structured conversation formats, and lands them as versioned Parquet on Hugging Face, ready for Distil Labs to generate synthetic training data and deliver a specialist model that beats the LLM you're running today.

Alena Astrakhantseva +1

From raw data to production ML: load, transform, embed, and publish curated datasets with declarative pipelines powered by dltHub.

Elvis Kahoro +2

Single-gate validation fails to decouple row-level syntax from batch-level semantics. Evolve from WAP to the AWAP protocol with this simple dlt tutorial to stop pipeline corruption at the source.

Roshni Melwani

Trying to force an LLM to reconstruct the 'world' using only a semantic layer is like trying to turn cheese back into milk. The information required to understand the original system was stripped away during the modeling process.

Adrian Brudaru

For the more classic data engineering crowd, here’s an explainer of how unstructured AI memory works, through the lens of what we know from working with structured data.

Adrian Brudaru

By upgrading only the generative model, we achieved a 3x accuracy boost but hit a hard ceiling, proving that good retrieval needs more than a better LLM.

Aashish Nair


Remus Molnar

I didn't vibe-build a product. I wrote a messy scaffold that runs a pipeline, grabs the schema, and forces an agent to build a star schema. It works shockingly well.

Adrian Brudaru

Analyzing UFC greatness by building a full stack (dlt, dbt, Metabase) to transform raw fight stats into a data-driven search for the true GOAT.

Reshef Sharvit

Moved 5M rows from DuckDB to MySQL 3.7x faster, reducing time from 344s to 92s by switching from SQLAlchemy’s row-by-row path to Arrow + ADBC’s columnar pipeline.

Aman Gupta

We were told that democratization meant 'safety,' but all we got were expensive cages. The era of the SaaS hostage is ending; the era of the sovereign Builder has begun.

Adrian Brudaru

The “data is oil” era is over. With LLMs, data is plutonium: powerful, toxic. Shift left and secure the reactor with 5 quality pillars.

Adrian Brudaru

Our docs RAG was failing quietly. We stopped guessing and built a real-user evaluation: the first baseline we could actually measure and improve.

Aashish Nair


Adrian Brudaru

11 practical, copy-paste data quality recipes for dlt. From schema freezes to alerts, learn how to keep pipelines clean, safe, and production-ready.

Aman Gupta

Start local with DuckLake, validate your data, then deploy to MotherDuck in minutes. Same pipeline, same code, just switch the destination.

Aman Gupta

Data contracts keep systems predictable by pairing clear rules with checks that catch bad data before it flows downstream.

Adrian Brudaru

Most LLM runs don’t fail. They converge fast, and the secret isn’t smarter models but better scaffolds that guide the work instead of forcing it.

Adrian Brudaru

Openflow and dltHub represent two distinct but valuable visions for the future of data ingestion.

Adrian Brudaru

This is, we’re told, the great democratization of data engineering. The tedious work is gone. The barrier to entry is gone. Everyone can now be a data engineer.

Adrian Brudaru

MotherDuck lands in Europe with serverless DuckDB warehousing. dlt adds DuckLake support, giving EU teams a fast, modern data stack.

Adrian Brudaru

SAP data is hard to extract. Dominik’s new Python connector replaces pyRFC, enabling faster, chunked ingestion into modern pipelines.

Mateusz Paździor

LLM leaders agree: the true win is "scaled mediocrity." We're empowering teams with good enough tools for massive, real-world impact.

Adrian Brudaru

For quick tasks, df.to_sql() is perfect. But for production pipelines, it quickly shows its limits when data volume, frequency, and schema change.

Adrian Brudaru

Learn how dlt automates SCD2 for nested JSON data without complex SQL headaches. Real BigQuery benchmarks show incremental loading cuts costs by 25-35%.

Aman Gupta

Emmanuel built a slim framework on top of dlt that levels up the vanilla Kafka source into a production-ready setup. Check it out 🚀

Aman Gupta

You want connectors, and you want them to be plentiful, high quality, and customisable? A man can dream. Here’s our roadmap to making those dreams a reality, and how you can help us today.

Adrian Brudaru

We compared dlt and Sling for data ingestion performance, cost, and flexibility. See how they stack up and which might suit your data needs best.

Adrian Brudaru +2

Ajay Moorjani turned a deceptively simple JSON to Snowflake task into a rock solid pipeline using dlt, dbt, and Airflow, built in less than a coffee break.

Aman Gupta

Leveraging AI to build a dlt extract-and-load pipeline for Coldplay data from Spotify and visualize it in Visivo.

Jared Jesionek