Introduction

info

Looking for the open-source dlt library documentation? See the dlt docs.

note

Use of the dltHub platform and toolkits is subject to a commercial dltHub License.

What is dltHub?

dltHub is an agent-native data engineering platform for building, running, and operating production-grade data pipelines. The toolchain is designed to be driven from coding agents — Claude Code, Codex, and Cursor — through scaffolding commands and per-source context files. A developer or analyst comfortable with Python and a coding agent can build and operate ingestion, transformations, quality checks, and data apps end-to-end without managing infrastructure.

Context — source schemas, annotations, transformation logic, and run metadata — propagates from the data source through transformations to the serving layer. Downstream tools, dashboards, and agents can reason about upstream intent without re-discovering it.

dltHub is built around the open-source library dlt. It reuses the same core concepts (sources, destinations, pipelines) and extends the extract-and-load focus of dlt with:

Agent-native developer experience
Transformations
Data quality
Managed infrastructure for pipelines and data apps
Observability for pipelines and data apps

dltHub supports both local and managed cloud development. From a dltHub Workspace, with isolated profiles for dev, prod, and access environments, a single developer can deploy and operate pipelines, transformations, and notebooks with a single command. The platform, workspace dashboard, and validation tools provide monitoring, troubleshooting, and reliability across the full data workflow.

On dltHub, users can:

Build and customize data pipelines quickly, optionally delegating boilerplate to a coding agent
Maintain data quality through declarative checks, tests, and alerts
Deliver up-to-date dashboards, reports, and data apps
Scale data workflows without manually managing infrastructure, schema drift, or silent failures

tip

For an end-to-end walkthrough, watch the dltHub demo, take the dltHub agentic data engineering course, or sign in to the dltHub platform to deploy a workspace.

To get started quickly, follow the installation instructions.

Design principles

dltHub is designed around three principles:

Transparent and context-aware. Pipelines, sources, and transformations are plain Python you can inspect, customize, and extend, with no black-box abstractions. Schemas, annotations, run metadata, and traces propagate from the data source through transformations to the serving layer, so both developers and agents can reason about upstream intent and downstream impact without re-deriving it from prompts.
Modular and composable. Sources, destinations, transformations, and platform components are independent building blocks. Adopt only the parts you need and integrate the rest with the surrounding ecosystem (dbt, Ibis, marimo, Streamlit, your own destinations).
Agent guardrails with humans in the loop. Agent-driven workflows include explicit checkpoints (sample runs, generated-code inspection, redacted-secrets commands) so AI-assisted development stays observable and reviewable. Deterministic tooling is used wherever probabilistic behavior is not reliable enough (for example, secrets handling).

Capabilities

dltHub covers the end-to-end data workflow. Features marked "in public preview" are broadly available, have mature documentation, and are intended for real workloads, but they are not yet fully hardened, so expect occasional minor breaking changes. For upcoming features, see the dltHub roadmap.

Ingestion pipeline development

Build extract-and-load pipelines from REST APIs, SQL databases, cloud storage, and Python data structures, with schema inference, normalization, and incremental loading provided by the underlying dlt library.

Workspace scaffolding — initialize a project structure that fits how dlt pipelines are developed and deployed
AI workbench (agent-native workflow) — generate REST API, SQL database, and filesystem pipelines from prompts using ingestion development toolkits
Premium destinations — load to Iceberg lakehouses, Delta Lake, Snowflake Plus, or MS SQL with change tracking
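
To make the terms "schema inference" and "incremental loading" concrete, here is a minimal, library-free sketch in plain Python. It is not dlt's API or its implementation: `infer_schema`, `load_incrementally`, the `state` dictionary, and the `updated_at` cursor field are hypothetical names chosen for illustration, and the sample rows are made up. In a real pipeline the dlt library performs these steps for you.

```python
from typing import Any, Iterable

Row = dict[str, Any]


def infer_schema(rows: Iterable[Row]) -> dict[str, str]:
    """Derive a column -> type-name mapping from sample rows (illustrative only)."""
    schema: dict[str, str] = {}
    for row in rows:
        for column, value in row.items():
            if value is not None:
                # The first non-null value seen for a column decides its type.
                schema.setdefault(column, type(value).__name__)
    return schema


def load_incrementally(
    rows: Iterable[Row], state: dict[str, Any], cursor: str = "updated_at"
) -> list[Row]:
    """Return only rows newer than the stored cursor value, then advance it."""
    last_seen = state.get("last_value")
    new_rows = [r for r in rows if last_seen is None or r[cursor] > last_seen]
    if new_rows:
        state["last_value"] = max(r[cursor] for r in new_rows)
    return new_rows


# Hypothetical source data; ISO-formatted dates compare correctly as strings.
batch_1 = [
    {"id": 1, "name": "alice", "updated_at": "2024-01-01"},
    {"id": 2, "name": "bob", "updated_at": "2024-01-02"},
]
batch_2 = batch_1 + [{"id": 3, "name": "carol", "updated_at": "2024-01-03"}]

state: dict[str, Any] = {}
first_load = load_incrementally(batch_1, state)   # both rows are new
second_load = load_incrementally(batch_2, state)  # only the row added later
schema = infer_schema(batch_2)
```

The second call skips rows already covered by the stored cursor value, which is the core idea behind loading only new or changed data on each run.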

Transformation pipeline development

Write transformations alongside your ingestion pipelines so they share datasets, schemas, and deployment. Source context — annotations, types, and lineage — carries into transformations and on to the serving layer.

@dlt.hub.transformation (in public preview) — Python-decorated transformations that run as part of your pipeline graph
AI workbench transformation toolkit (in public preview) — generate and refactor Python and SQL transformations from prompts driven by business ontologies
dbt integration — run dbt projects with a local cache, schema enforcement, and integrated debugging

Pipeline operations

Deploy, schedule, and monitor pipelines, transformations, and notebooks without standing up infrastructure.

dltHub platform — one-command deploy of an entire workspace, with cron and event-driven triggers, follow-up chains, freshness checks, and refresh cascades. Sign in at app.dlthub.com
Profiles and regions — isolate dev, prod, and access configurations and credentials, and choose where your data plane runs
Workspace dashboard & monitoring — observe runs, schemas, and lineage from a single UI; stream logs and diagnose failures from the CLI or Web UI

Data quality & governance

Catch data issues before they reach consumers and keep schemas controlled as sources change.

Data quality checks (in public preview) — declarative correctness rules with actionable failure messages
Advanced quality features (in public preview) — author and run tests against your datasets as part of a pipeline
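
As a rough illustration of what "declarative correctness rules with actionable failure messages" can mean, the sketch below declares checks as data and evaluates them separately. It is plain Python, not the dltHub data quality API: the `Check` class, `run_checks`, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Check:
    """A named rule plus the message to show when a row violates it."""

    name: str
    predicate: Callable[[Row], bool]
    message: str


def run_checks(rows: list[Row], checks: list[Check]) -> list[str]:
    """Evaluate every check against every row and collect failure messages."""
    failures = []
    for index, row in enumerate(rows):
        for check in checks:
            if not check.predicate(row):
                failures.append(f"row {index}: {check.name}: {check.message}")
    return failures


# Rules are declared once, independently of the code that evaluates them.
checks = [
    Check("id_not_null", lambda r: r.get("id") is not None, "id must be set"),
    Check("amount_non_negative", lambda r: r.get("amount", 0) >= 0, "amount must be >= 0"),
]

# Hypothetical rows: the second one violates both rules.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": -5.0},
]

failures = run_checks(rows, checks)
```

Each failure message names the row, the rule, and the expectation that was violated, so whoever reads the report knows what to fix without rerunning the check.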

Data discovery & serving

Make loaded data accessible to stakeholders through notebooks, dashboards, and shareable links. Source schemas and transformation context are available here so agents and consumers see the same upstream metadata that drove ingestion.

Datasets — typed Python and SQL access to loaded data
Marimo notebooks — build lightweight, shareable data apps
Public links for interactive jobs — share notebooks and dashboards externally without granting platform access

Platform capabilities

Foundations that the rest of the platform builds on.

GitHub OAuth, Google OAuth, email signup, and API key authentication, with organization and workspace roles
Managed, multi-tenant runtime with upgrades and patching handled for you
Secure secrets management per profile

Pricing and licensing

For current plan details and pricing, see the dltHub pricing page. Use of the dltHub platform and toolkits is governed by the dltHub License.
