Blog

May 12, 2026·Product,Tutorials,Engineering

From prompt to production: a free course on agentic data engineering

With 91% of dlt pipelines AI-written, learn Agentic Data Engineering in this free 1-hour course.

Adrian Brudaru

Read

May 5, 2026·Product,Engineering,Industry

Ontology engineering: what it is, why it's back, and why agents need it

Agents don't hallucinate. They navigate without a map. Ontology engineering is how you build one, and why every team pulling humans out of the loop needs it now.

Adrian Brudaru

Read

April 24, 2026·Product,Community,Tutorials

I tracked the Iran-USA conflict, oil prices, and Bitcoin — without a data team

The dltHub AI Workbench gives Claude Code a structured workflow for building data pipelines. We put it to the test with a real geopolitical question.

Roshni Melwani

Read

April 24, 2026·Tutorials

Operational Health: Schema update detection with dlt

dlt handles schema evolution efficiently but silently. Here's how to read dlt's metadata and be informed of what's shifting in your pipeline.

Aman Gupta

Read

April 23, 2026·Tutorials,Product

Operational Health: Auditing data freshness with dlt metadata

A "Success" exit code only tells you the pipeline ran. Use `load_id` to join `_dlt_loads` with your source table and check if the data is actually fresh.

Aman Gupta

Read

April 21, 2026·Community,Product

Who maintains the skill layer?

We're in an LLM-coding junior bubble. "It runs" isn't the senior bar. Lifecycle rigor and dependency management are.

Adrian Brudaru

Read

April 14, 2026·Product

Agentic toolkit eval: dltHub REST API toolkit

The dlt AI Workbench transforms AI-generated "vibe coding" from an unmanaged process full of hidden risks into a mature engineering workflow that prioritizes security, current documentation, and persistent state by default.

Adrian Brudaru

Read

April 14, 2026·Product,Community

Agentic guardrails: The next layer in agentic engineering

Part of the [dltHub AI Workbench series](https://dlthub.com/blog/ai-workbench)

Adrian Brudaru

Read

April 14, 2026·Tutorials

dltHub Pro + Cortex Code: Two agents, better together

TL;DR: Cortex Code helps you work with data already in Snowflake. dltHub Pro gets data into Snowflake from any source, especially the ones no ETL tool covers. They operate at different layers of the stack and they are designed to hand off to each other.

Adrian Brudaru

Read

April 8, 2026·Product

What's the Minimum Viable Context for Building a Canonical Data Model with an LLM?

Call it the MVC problem: minimum viable context. Too little and it hallucinates your domain. Too much and it drifts from your actual goal. The process has to be controlled.

Hiba Jamal

Read

March 24, 2026·Product

dltHub AI Workbench: Ontology driven data modelling toolkit preview

How are LLMs supposed to know the business logic of how you use Hubspot, Luma and Slack together? How are they supposed to know what a customer means to you?

Hiba Jamal

Read

March 23, 2026·Product

Agents can write dlt pipelines. Now they can run & deploy them.

Today we are introducing the dltHub AI Workbench: an infrastructure layer for dltHub that makes AI-generated dlt pipelines trustworthy enough to run and deploy in production.

Matthaus Krzykowski

Read

March 17, 2026·Tutorials

How to protect PII with dlt and Pydantic

Stop PII leaks before they hit your warehouse. By using dlt and Pydantic to enforce data contracts, you can sanitize or quarantine sensitive fields the moment they’re ingested.

Aman Gupta

Read

March 10, 2026·Product

So you vibe coded a data stack, now what?

In this blog post, I will describe the actual, hard real world barriers that make your LLM setup collapse, and propose principles for making your systems work.

Adrian Brudaru

Read

March 10, 2026·Product,Tutorials,Community

Building production-ready data pipelines in Microsoft Fabric: A complete data quality framework with dlthub

Add data quality gates to Microsoft Fabric with dlt. Validate schemas, catch bad records, and mask PII before data reaches your lakehouse and downstream analytics.

Rakesh Gupta

Read

March 9, 2026·Tutorials,Engineering

Your Traces Aren't Training Data Yet. Here's the Pipeline That Makes Them.

Production traces are scattered across databases, log aggregators, and storage buckets, and most of them aren't clean (input, output) pairs you can hand to a training job. This walkthrough shows how to build a dlt pipeline that extracts traces from any source, transforms them into structured conversation formats, and lands them as versioned Parquet on Hugging Face, ready for Distil Labs to generate synthetic training data and deliver a specialist model that beats the LLM you're running today.

Alena Astrakhantseva +1

Read

March 9, 2026·Product

Hugging Face x dltHub: The missing data layer for ML practitioners

From raw data to production ML: load, transform, embed, and publish curated datasets with declarative pipelines powered by dltHub.

Elvis Kahoro +2

Read

March 3, 2026·Engineering

Testing Before Loading: WAP and AWAP

Single-gate validation fails to decouple row-level syntax from batch-level semantics. Evolve from WAP to the AWAP protocol with this simple dlt tutorial to stop pipeline corruption at the source.

Roshni Melwani

Read

February 25, 2026·Product

Ontology driven Dimensional Modeling

Trying to force an LLM to reconstruct the 'world' using only a semantic layer is like trying to turn cheese back into milk. The information required to understand the original system was stripped away during the modeling process.

Adrian Brudaru

Read

February 19, 2026·Community

Memory for AI Agents: Understanding Modeling for Unstructured Data

For the more classic data engineering crowd, here’s an explainer of how unstructured AI memory works, though the lens of what we know from working with structured data.

Adrian Brudaru

Read

February 17, 2026·Product

Debugging Our Docs RAG, Part 2: Testing New Generation Models

By upgrading only the generative model, we achieved a 3x accuracy boost but hit a hard ceiling, proving that not only LLMs are needed for good retrieval.

Aashish Nair

Read

February 13, 2026·Tutorials

LLM-native EL workshop on Data Talks Club

Remus Molnar

Read

February 10, 2026·Product,Engineering

The Last Mile is Solved by Slop

I didn't vibe-build a product. I wrote a messy scaffold that runs a pipeline, grabs the schema, and forces an agent to build a star schema. It works shockingly well.

Adrian Brudaru

Read

February 4, 2026·Community

Finding the UFC GOAT: A Full Stack Pipeline with dlt, dbt and Metabase

Analyzing UFC greatness by building a full stack (dlt, dbt, Metabase) to transform raw fight stats into a data-driven search for the true GOAT.

Reshef Sharvit

Read

February 3, 2026·Product,Engineering,Updates

3.7x Faster EL Pipelines: Arrow + ADBC vs. SQLAlchemy

Moved 5M rows from DuckDB to MySQL 3.7x faster, reducing time from 344s to 92s by switching from SQLAlchemy’s row-by-row path to Arrow + ADBC’s columnar pipeline.

Aman Gupta

Read

January 28, 2026·Community,Industry,Product

The Builder: Outliving the Modern Data Stack

We were told that democratization meant 'safety,' but all we got were expensive cages. The era of the SaaS hostage is ending; the era of the sovereign Builder has begun.

Adrian Brudaru

Read

January 21, 2026·Industry,Product

The Plutonium Protocol: Engineering Safety for the LLM Intern Era

The “data is oil” era is over. With LLMs, data is plutonium: powerful, toxic. Shift left and secure the reactor with 5 quality pillars.

Adrian Brudaru

Read

January 13, 2026·Tutorials,Engineering

Debugging Our Docs RAG, Part 1: Evaluating a Production RAG System

Our docs RAG was failing quietly. We stopped guessing and built a real-user evaluation: the first baseline we could actually measure and improve.

Aashish Nair

Read

January 7, 2026·Industry,Product

Autofilling the Boring Semantic Layer: From Sakila to Chat-BI with dltHub

Adrian Brudaru

Read

December 22, 2025·Tutorials,Industry

11 Pythonic Data Quality Recipes for every day

11 practical, copy-paste data quality recipes for dlt. From schema freezes to alerts, learn how to keep pipelines clean, safe, and production-ready

Aman Gupta

Read

December 16, 2025·Community,Product

DuckLake to MotherDuck: Validate locally, deploy to cloud in minutes

Start local with DuckLake, validate your data, then deploy to MotherDuck in minutes. Same pipeline, same code, just switch the destination.

Aman Gupta

Read

December 8, 2025·Industry,Engineering

Data contract agreement vs enforcement

Data contracts keep systems predictable by pairing clear rules with checks that catch bad data before it flows downstream.

Adrian Brudaru

Read

November 20, 2025·Industry,Engineering

Convergence: The Anti-Entropy Engine

Most LLM runs don’t fail. They converge fast, and the secret isn’t smarter models but better scaffolds that guide the work instead of forcing it.

Adrian Brudaru

Read

October 20, 2025·Industry,Engineering

Openflow vs. dlt for Snowflake users

Openflow and dltHub represent two distinct but valuable visions for the future of data ingestion.

Adrian Brudaru

Read

October 14, 2025·Product

Surviving the AI code Deluge: Data quality in the Spotlight

This is, we’re told, the great democratization of data engineering. The tedious work is gone. The barrier to entry is gone. Everyone can now be a data engineer.

Adrian Brudaru

Read

September 24, 2025·Engineering,Industry

Motherduck Europe & dlt DuckLake support

MotherDuck lands in Europe with serverless DuckDB warehousing. dlt adds DuckLake support, giving EU teams a fast, modern data stack.

Adrian Brudaru

Read

September 22, 2025·Engineering

SAP Data Ingestion with Python: A Technical Breakdown of Using the SAP RFC Protocol

SAP data is hard to extract. Dominik’s new Python connector replaces pyRFC, enabling faster, chunked ingestion into modern pipelines.

Mateusz Paździor

Read

September 19, 2025·Engineering

"Scaled Mediocrity": The counterintuitive AI Strategy that's delivering ROI

LLM leaders agree: the true win is "scaled mediocrity." We're empowering teams with good enough tools for massive, real-world impact.

Adrian Brudaru

Read

September 10, 2025·Updates

Supercharge your data loading: Go beyond `pandas.to_sql()`

For quick tasks, df.to_sql() is perfect. But for production pipelines, it quickly shows its limits when data volume, frequency, and schema change.

Adrian Brudaru

Read

September 9, 2025·Product,Tutorials

SCD2 Deep Dive with dlt: How nested data affects queries and costs

Learn how dlt automates SCD2 for nested JSON data without complex SQL headaches. Real BigQuery benchmarks show incremental loading cuts costs by 25-35%.

Aman Gupta

Read

August 20, 2025·Community,Engineering

Emmanuel's production-ready Kafka framework: extending dlt the right way

Emmanuel built a slim framework on top of dlt that levels up the vanilla Kafka source into a production-ready setup. Check it out 🚀

Aman Gupta

Read

August 6, 2025·Product

Build → Deploy → Share: A roadmap for sharing on dltHub

You want connectors, and you want them to be many, high quality and customisable? A man can dream? here’s our roadmap to making those dreams a reality, and how you can help us today.

Adrian Brudaru

Read

August 5, 2025·Industry,Product

Sling vs dlt SQL connector Benchmark: Spend 3x Less, Load Faster with dlt

We compared dlt and Sling for data ingestion performance, cost, and flexibility. See how they stack up and which might suit your data needs best.

Adrian Brudaru +2

Read

July 31, 2025·Community,Engineering

Why a simple task speaks volumes

Ajay Moorjani turned a deceptively simple JSON to Snowflake task into a rock solid pipeline using dlt, dbt, and Airflow, built in less than a coffee break.

Aman Gupta

Read

July 28, 2025·Community,Industry

Leveraging Claude Code to Build a dlt & Visivo Project

Leveraging AI to build a dlt extract and load of coldplay data from spotify and visualize it in Visivo.

Jared Jesionek

Read

July 24, 2025·Tutorials,Community

Michael, dlt, and the art of unbreakable API pipelines

Built another pipeline just to keep a dashboard alive? Then it broke again? Michael Shoemaker shows how dlt makes API pipelines fix themselves, no drama.

Adrian Brudaru

Read

July 16, 2025·Updates

We’re building dltHub to make data engineering accessible for all Python developers

We’re excited to announce that we’re building dltHub, an LLM-native data engineering platform that enables any Python developer to build, run dlt pipelines, and deliver valuable end-user-ready reports.

Matthaus Krzykowski

Read

July 15, 2025·Product

A Practitioner’s Guide to LLM-native Pipeline Building with dltHub Workspace

LLM-native scaffolds for 1000+ APIs. The IKEA moment in data engineering is here. Build pipelines with LLMs, faster and cleaner.

Adrian Brudaru

Read

July 7, 2025·Tutorials

Turn your Documentation into a Queryable Knowledge Graph for High retrieval accuracy and low hallucinations

Using dlt + Cognee, we take API docs from Slack, PayPal, and TicketMaster and built a knowledge graph.

Hiba Jamal

Read

June 29, 2025·Tutorials,Community

How I went from “I’ll never build a pipeline” to doing it in an hour with Cursor

Dev takes Alena’s dlt course, then uses AI to build a WHOOP sleep-data pipeline, saving the data to Parquet, demonstrating that beginners can master pipelines quickly.

Roshni Melwani

Read

June 25, 2025·Product,Industry

We've been using LanceDB to make AI development smoother

We've been using LanceDB for months at dltHub to build AI systems more quickly. The same setup works locally and in the cloud. Handles structured and vector data in one place.

Adrian Brudaru

Read

June 18, 2025·Industry,Product

Building Engine-Agnostic Data Stacks

Mixing Spark, DuckDB, and Snowflake? Iceberg decouples data, Ibis decouples logic, run your analytics anywhere, without rewrites or vendor lock-in.

Adrian Brudaru

Read

June 16, 2025·Industry,Product

Iceberg-First Ingestion: How Taktile cut 70% of costs

Taktile cut 70% of data loading costs by shifting ingestion to Iceberg via Lambda + dlt, keeping Snowflake for analytics. Smart layers, big savings.

Adrian Brudaru

Read

June 3, 2025·Industry,Product

From Singer to simplicity: Why Data Teams choose dlt.

Singer was Stitch's incomplete competitive response to Fivetran. Meltano completed what Stitch never intended to fully open source. dlt learned from both and built the fitting abstraction for pythonic data teams.

Adrian Brudaru

Read

May 28, 2025·Industry,Product

Fivetran vs dlt: Quickstart vs Endgame

A side-by-side look at Fivetran and dlt, covering cost models, customization, and how each approach affects team workflows as your data needs evolve.

Adrian Brudaru

Read

May 27, 2025·Tutorials

The REST API Integration costs: How AI + dlt is finally making it bearable

REST API integrations come with hidden costs, pagination, schema drift, rate limits. With dlt + Cursor, you skip the boilerplate and build pipelines in minutes, not days. Less code. Less chaos. More time to build.

Aman Gupta

Read

May 14, 2025·Community

Materializing Multi-Asset REST API Sources with dlt, Dagster, and DuckDB

A hands-on guide to combining dlt and Dagster for orchestrating multi-endpoint API ingestion pipelines, with assets materialized into DuckDB. Three patterns. One powerful workflow. Plus, a peek at the new CLI and DuckDB UI.

Jairus Martinez

Read

May 13, 2025·Product

Breaking free from SQL: A Normie's guide to portable data pipelines

Data engineering shouldn't require rewriting the same logic multiple times for different environments. dlt's dataset interface gives you one consistent way to work with your data, regardless of where it lives.

Adrian Brudaru

Read

May 5, 2025·Updates

What’s new in dlt for Databricks: built-in staging, zero-config notebooks, no headaches

Ingesting to Databricks should be simple. With dlt, it finally is. No config files, no staging, just Python and go.

Aman Gupta

Read

April 29, 2025·Product

Vibe Coding: Why Building Data Pipelines with LLMs Actually Works

Vibe coding so clean, it will make your old code look bad.

Adrian Brudaru

Read

April 28, 2025·Community

Julian Alves and dlt: when expertise meets simplicity

Julian Alves builds reliable, simple data infrastructure. He partners with dlt to help companies create systems that deliver value, not burden.

Adrian Brudaru

Read

April 22, 2025·Updates

Celebrating our 3,000th OSS dlt customer as dlt’s momentum accelerates

dlt has grown from 1,000 to over 3,000 open-source users in just six months, with monthly downloads surpassing 1.4 million. This momentum reflects a growing demand for Python-native, modular, and AI-ready data tools — and dlt is building exactly that.

Matthaus Krzykowski

Read

April 22, 2025·Updates

What’s next for dlt in 2025: a simpler solution for solving complex problems

dlt started as a tool for handling JSON documents. It was meant for the average Python user that does not want to deal with creating and evolving schemas, SQL models, backends and data engineers that control them.

Marcin Rudolf

Read

April 15, 2025·Engineering

The future's Re-Composable: Converting Connectors Between Solutions with LLMs

Let's stop reinventing connectors in isolation. Use LLMs to transform scattered integrations into shared, reusable solutions.

Adrian Brudaru

Read

April 14, 2025·Community

Fabric + dlt, Course and Explorations

As Rakesh was exploring Fabric, dlt kept showing up in Rakesh's stack. Not by design, but because it just worked. Different projects, same ingestion layer, quietly doing its job.

Adrian Brudaru

Read

April 3, 2025·Tutorials

AI built the Pipeline, I plugged the leaks

I tried Vibe-coding a Singer tap (Pipedrive) into dlt and it worked, but it needed some user intervention.

Adrian Brudaru

Read

April 2, 2025·Community

How to run dlt with Airflow (Or any other Python thing)

Explore four ways to run dlt with Apache Airflow, from PythonOperators to KubernetesPods, and learn which setup scales best for clean, reliable pipelines.

Francesco Mucio

Read

March 31, 2025·Tutorials

Towards a Benchmark for AI-Generated Data Pipelines

Building pipelines with AI isn’t one task, it's many. In this post, we explore how to split and test them individually, so failures are easier to diagnose and fix.

Adrian Brudaru

Read

March 29, 2025·Engineering

Are you moving the right data? Write. Audit. Publish. (WAP)

The Write. Audit. Publish. (WAP) framework brings discipline from software engineering: write in isolation, audit for correctness, quality, and compliance, publish with confidence. But can data engineering really follow suit? Let's discuss.

Aman Gupta

Read

March 26, 2025·Tutorials

Erase tech debt from data loading with dlt + Cursor + LLMs

Modernisation at its finest, from trash to cutting edge in seconds. It works amazing, just give it a try, stop paying for tech debt

Adrian Brudaru

Read

March 25, 2025·Tutorials

From Airbyte YAML to Scalable Python Pipelines with dlt (dltHub), Cursor and LLMs

In this microblog + video we explore generating python pipelines (dlt REST API) from Airbyte low code yaml spec. tl;dr: it works well.

Adrian Brudaru

Read

Blog

Exploring schema evolution with ontology-driven propagation

From prompt to production: a free course on agentic data engineering

Ontology engineering: what it is, why it's back, and why agents need it

I tracked the Iran-USA conflict, oil prices, and Bitcoin — without a data team

Operational Health: Schema update detection with dlt

Operational Health: Auditing data freshness with dlt metadata

Who maintains the skill layer?

Agentic toolkit eval: dltHub REST API toolkit

Agentic guardrails: The next layer in agentic engineering

dltHub Pro + Cortex Code: Two agents, better together

What's the Minimum Viable Context for Building a Canonical Data Model with an LLM?

dltHub AI Workbench: Ontology driven data modelling toolkit preview

Agents can write dlt pipelines. Now they can run & deploy them.

How to protect PII with dlt and Pydantic

So you vibe coded a data stack, now what?

Building production-ready data pipelines in Microsoft Fabric: A complete data quality framework with dlthub

Your Traces Aren't Training Data Yet. Here's the Pipeline That Makes Them.

Hugging Face x dltHub: The missing data layer for ML practitioners

Testing Before Loading: WAP and AWAP

Ontology driven Dimensional Modeling

Memory for AI Agents: Understanding Modeling for Unstructured Data

Debugging Our Docs RAG, Part 2: Testing New Generation Models

LLM-native EL workshop on Data Talks Club

The Last Mile is Solved by Slop

Finding the UFC GOAT: A Full Stack Pipeline with dlt, dbt and Metabase

3.7x Faster EL Pipelines: Arrow + ADBC vs. SQLAlchemy

The Builder: Outliving the Modern Data Stack

The Plutonium Protocol: Engineering Safety for the LLM Intern Era

Debugging Our Docs RAG, Part 1: Evaluating a Production RAG System

Autofilling the Boring Semantic Layer: From Sakila to Chat-BI with dltHub

11 Pythonic Data Quality Recipes for every day

DuckLake to MotherDuck: Validate locally, deploy to cloud in minutes

Data contract agreement vs enforcement

Convergence: The Anti-Entropy Engine

Openflow vs. dlt for Snowflake users

Surviving the AI code Deluge: Data quality in the Spotlight

Motherduck Europe & dlt DuckLake support

SAP Data Ingestion with Python: A Technical Breakdown of Using the SAP RFC Protocol

"Scaled Mediocrity": The counterintuitive AI Strategy that's delivering ROI

Supercharge your data loading: Go beyond `pandas.to_sql()`

SCD2 Deep Dive with dlt: How nested data affects queries and costs

Emmanuel's production-ready Kafka framework: extending dlt the right way

Build → Deploy → Share: A roadmap for sharing on dltHub

Sling vs dlt SQL connector Benchmark: Spend 3x Less, Load Faster with dlt

Why a simple task speaks volumes

Leveraging Claude Code to Build a dlt & Visivo Project

Michael, dlt, and the art of unbreakable API pipelines

We’re building dltHub to make data engineering accessible for all Python developers

A Practitioner’s Guide to LLM-native Pipeline Building with dltHub Workspace

Turn your Documentation into a Queryable Knowledge Graph for High retrieval accuracy and low hallucinations

How I went from “I’ll never build a pipeline” to doing it in an hour with Cursor

We've been using LanceDB to make AI development smoother

Building Engine-Agnostic Data Stacks

Iceberg-First Ingestion: How Taktile cut 70% of costs

From Singer to simplicity: Why Data Teams choose dlt.

Fivetran vs dlt: Quickstart vs Endgame

The REST API Integration costs: How AI + dlt is finally making it bearable

Materializing Multi-Asset REST API Sources with dlt, Dagster, and DuckDB

Breaking free from SQL: A Normie's guide to portable data pipelines

What’s new in dlt for Databricks: built-in staging, zero-config notebooks, no headaches

Vibe Coding: Why Building Data Pipelines with LLMs Actually Works

Julian Alves and dlt: when expertise meets simplicity

Celebrating our 3,000th OSS dlt customer as dlt’s momentum accelerates

What’s next for dlt in 2025: a simpler solution for solving complex problems

The future's Re-Composable: Converting Connectors Between Solutions with LLMs

Fabric + dlt, Course and Explorations

AI built the Pipeline, I plugged the leaks

How to run dlt with Airflow (Or any other Python thing)

Towards a Benchmark for AI-Generated Data Pipelines

Are you moving the right data? Write. Audit. Publish. (WAP)

Erase tech debt from data loading with dlt + Cursor + LLMs

From Airbyte YAML to Scalable Python Pipelines with dlt (dltHub), Cursor and LLMs