Introduction
Looking for the open-source dlt library documentation? See the dlt docs.
Use of the dltHub platform and toolkits is subject to a commercial dltHub License.
What is dltHub?
dltHub is an agent-native data engineering platform for building, running, and operating production-grade data pipelines. The toolchain is designed to be driven from coding agents — Claude Code, Codex, and Cursor — through scaffolding commands and per-source context files. A developer or analyst comfortable with Python and a coding agent can build and operate ingestion, transformations, quality checks, and data apps end-to-end without managing infrastructure.
Context — source schemas, annotations, transformation logic, and run metadata — propagates from the data source through transformations to the serving layer. Downstream tools, dashboards, and agents can reason about upstream intent without re-discovering it.
dltHub is built around the open-source library dlt. It reuses the same core concepts (sources, destinations, pipelines) and extends the extract-and-load focus of dlt with:
- Agent-native developer experience
- Transformations
- Data quality
- Managed infrastructure for pipelines and data apps
- Observability for pipelines and data apps
dltHub supports both local and managed cloud development. From a dltHub Workspace, with isolated profiles for dev, prod, and access environments, a single developer can deploy and operate pipelines, transformations, and notebooks with a single command. The platform, workspace dashboard, and validation tools provide monitoring, troubleshooting, and reliability across the full data workflow.
On dltHub, users can:
- Build and customize data pipelines quickly, optionally delegating boilerplate to a coding agent
- Maintain data quality through declarative checks, tests, and alerts
- Deliver up-to-date dashboards, reports, and data apps
- Scale data workflows without manually managing infrastructure, schema drift, or silent failures
For an end-to-end walkthrough, watch the dltHub demo, take the dltHub agentic data engineering course, or sign in to the dltHub platform to deploy a workspace.
To get started quickly, follow the installation instructions.
Design principles
dltHub is designed around three principles:
- Transparent and context-aware. Pipelines, sources, and transformations are plain Python you can inspect, customize, and extend — no black-box abstractions. Schemas, annotations, run metadata, and traces propagate from the data source through transformations to the serving layer, so both developers and agents can reason about upstream intent and downstream impact without re-deriving it from prompts.
- Modular and composable. Sources, destinations, transformations, and platform components are independent building blocks. Adopt only the parts you need and integrate the rest with the surrounding ecosystem (dbt, Ibis, marimo, Streamlit, your own destinations).
- Agent guardrails with humans in the loop. Agent-driven workflows include explicit checkpoints — sample runs, generated-code inspection, redacted-secrets commands — so AI-assisted development stays observable and reviewable. Deterministic tooling is used wherever probabilistic behavior is not reliable enough (for example, secrets handling).
Capabilities
dltHub covers the end-to-end data workflow. Features marked in public preview are broadly available with mature documentation and intended for real workloads, but are not yet fully hardened — expect occasional minor breaking changes. For upcoming features, see the dltHub roadmap.
Ingestion pipeline development
Build extract-and-load pipelines from REST APIs, SQL databases, cloud storage, and Python data structures, with schema inference, normalization, and incremental loading provided by the underlying dlt library.
- Workspace scaffolding — initialize a project structure that fits how dlt pipelines are developed and deployed
- AI workbench (agent-native workflow) — generate REST API, SQL database, and filesystem pipelines from prompts using ingestion development toolkits
- Premium destinations — load to Iceberg lakehouses, Delta Lake, Snowflake Plus, or MS SQL with change tracking
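To make the terms schema inference and incremental loading concrete, here is a minimal, standard-library-only sketch of the two ideas. It is a conceptual illustration under stated assumptions, not the dlt or dltHub API: the function names (`infer_schema`, `load_incrementally`), the `state` dictionary, and the `updated_at` cursor field are all hypothetical.

```python
from __future__ import annotations

from typing import Any


def infer_schema(rows: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each column name to the sorted Python type names observed in it."""
    observed: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            observed.setdefault(column, set()).add(type(value).__name__)
    return {column: sorted(types) for column, types in observed.items()}


def load_incrementally(
    rows: list[dict[str, Any]],
    state: dict[str, Any],
    cursor_field: str = "updated_at",  # hypothetical cursor column
) -> list[dict[str, Any]]:
    """Return only rows newer than the stored cursor, then advance the cursor."""
    last_value = state.get("last_value")
    new_rows = [
        row for row in rows
        if last_value is None or row[cursor_field] > last_value
    ]
    if new_rows:
        state["last_value"] = max(row[cursor_field] for row in new_rows)
    return new_rows


# Two simulated extractions from the same source; ISO dates compare as strings.
batch_1 = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-02"},
]
batch_2 = batch_1 + [{"id": 3, "updated_at": "2024-01-03"}]

state: dict[str, Any] = {}
first_load = load_incrementally(batch_1, state)   # both rows are new
second_load = load_incrementally(batch_2, state)  # only the row with id 3
schema = infer_schema(batch_1)
```

The point of the sketch is the division of labor: the cursor state persists between runs so that a repeated extraction only loads unseen rows, while the schema is derived from the data rather than declared up front.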
Transformation pipeline development
Write transformations alongside your ingestion pipelines so they share datasets, schemas, and deployment. Source context — annotations, types, and lineage — carries into transformations and on to the serving layer.
- @dlt.hub.transformation (in public preview) — Python-decorated transformations that run as part of your pipeline graph
- AI workbench transformation toolkit (in public preview) — generate and refactor Python and SQL transformations from prompts driven by business ontologies
- dbt integration — run dbt projects with a local cache, schema enforcement, and integrated debugging
Pipeline operations
Deploy, schedule, and monitor pipelines, transformations, and notebooks without standing up infrastructure.
- dltHub platform — one-command deploy of an entire workspace, with cron and event-driven triggers, follow-up chains, freshness checks, and refresh cascades. Sign in at app.dlthub.com
- Profiles and regions — isolate dev, prod, and access configurations and credentials, and choose where your data plane runs
- Workspace dashboard & monitoring — observe runs, schemas, and lineage from a single UI; stream logs and diagnose failures from the CLI or Web UI
Data quality & governance
Catch data issues before they reach consumers and keep schemas controlled as sources change.
- Data quality checks (in public preview) — declarative correctness rules with actionable failure messages
- Advanced quality features (in public preview) — author and run tests against your datasets as part of a pipeline
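As a rough illustration of what "declarative correctness rules with actionable failure messages" can mean in practice, the sketch below defines checks as data and evaluates them against rows. It uses only the Python standard library and is not the dltHub data quality API: the `Check` class, `run_checks` function, and the example rules are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Check:
    """A declarative rule: which column to test, how, and what to do on failure."""
    name: str
    column: str
    predicate: Callable[[Any], bool]
    hint: str  # the actionable part of the failure message


def run_checks(rows: list[dict[str, Any]], checks: list[Check]) -> list[str]:
    """Evaluate every check against every row and return one message per failed check."""
    failures: list[str] = []
    for check in checks:
        bad = [
            index for index, row in enumerate(rows)
            if not check.predicate(row.get(check.column))
        ]
        if bad:
            failures.append(
                f"{check.name}: {len(bad)} row(s) failed on '{check.column}' "
                f"(first at index {bad[0]}). {check.hint}"
            )
    return failures


# Hypothetical rules for a hypothetical payments table.
CHECKS = [
    Check("not_null", "email", lambda v: v is not None,
          "Verify the source sends an email for every user."),
    Check("non_negative", "amount",
          lambda v: isinstance(v, (int, float)) and v >= 0,
          "Inspect refunds or sign errors upstream."),
]

rows = [
    {"email": "a@example.com", "amount": 5},
    {"email": None, "amount": -1},
]
failures = run_checks(rows, CHECKS)
```

Keeping the rules as plain data separates what must hold from how it is evaluated, which is what lets the same rules run as part of a pipeline or be reviewed on their own.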
Data discovery & serving
Make loaded data accessible to stakeholders through notebooks, dashboards, and shareable links. Source schemas and transformation context are available here so agents and consumers see the same upstream metadata that drove ingestion.
- Datasets — typed Python and SQL access to loaded data
- Marimo notebooks — build lightweight, shareable data apps
- Public links for interactive jobs — share notebooks and dashboards externally without granting platform access
Platform capabilities
Foundations that the rest of the platform builds on.
- GitHub OAuth, Google OAuth, email signup, and API key authentication, with organization and workspace roles
- Managed, multi-tenant runtime with upgrades and patching handled for you
- Secure secrets management per profile
Pricing and licensing
For current plan details and pricing, see the dltHub pricing page. Use of the dltHub platform and toolkits is governed by the dltHub License.