Introduction

info

Looking for the open-source dlt library documentation? See the dlt docs.

note

Use of the dltHub platform and toolkits is subject to a commercial dltHub License.

What is dltHub?

dltHub is an agent-native data engineering platform for building, running, and operating production-grade data pipelines. The toolchain is designed to be driven from coding agents — Claude Code, Codex, and Cursor — through scaffolding commands and per-source context files. A developer or analyst comfortable with Python and a coding agent can build and operate ingestion, transformations, quality checks, and data apps end-to-end without managing infrastructure.

Context — source schemas, annotations, transformation logic, and run metadata — propagates from the data source through transformations to the serving layer. Downstream tools, dashboards, and agents can reason about upstream intent without re-discovering it.

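dltHub is built around the open-source library dlt. It reuses the same core concepts (sources, destinations, pipelines) and extends the extract-and-load focus of dlt with transformations, pipeline operations, data quality and governance, and data discovery and serving, each described under Capabilities below.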

dltHub supports both local and managed cloud development. From a dltHub Workspace, with isolated profiles for dev, prod, and access environments, a single developer can deploy and operate pipelines, transformations, and notebooks with a single command. The platform, workspace dashboard, and validation tools provide monitoring, troubleshooting, and reliability across the full data workflow.

On dltHub, users can:

  • Build and customize data pipelines quickly, optionally delegating boilerplate to a coding agent
  • Maintain data quality through declarative checks, tests, and alerts
  • Deliver up-to-date dashboards, reports, and data apps
  • Scale data workflows without manually managing infrastructure, schema drift, or silent failures
tip

For an end-to-end walkthrough, watch the dltHub demo, take the dltHub agentic data engineering course, or sign in to the dltHub platform to deploy a workspace.

To get started quickly, follow the installation instructions.

Design principles

dltHub is designed around three principles:

  • Transparent and context-aware. Pipelines, sources, and transformations are plain Python you can inspect, customize, and extend — no black-box abstractions. Schemas, annotations, run metadata, and traces propagate from the data source through transformations to the serving layer, so both developers and agents can reason about upstream intent and downstream impact without re-deriving it from prompts
  • Modular and composable. Sources, destinations, transformations, and platform components are independent building blocks. Adopt only the parts you need and integrate the rest with the surrounding ecosystem (dbt, Ibis, marimo, Streamlit, your own destinations)
  • Agent guardrails with humans in the loop. Agent-driven workflows include explicit checkpoints — sample runs, generated-code inspection, redacted-secrets commands — so AI-assisted development stays observable and reviewable. Deterministic tooling is used wherever probabilistic behavior is not reliable enough (for example, secrets handling)

Capabilities

dltHub covers the end-to-end data workflow. Features marked in public preview are broadly available with mature documentation and intended for real workloads, but are not yet fully hardened — expect occasional minor breaking changes. For upcoming features see the dltHub roadmap.

Ingestion pipeline development

Build extract-and-load pipelines from REST APIs, SQL databases, cloud storage, and Python data structures, with schema inference, normalization, and incremental loading provided by the underlying dlt library.
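To make the idea of incremental loading concrete, here is a minimal plain-Python sketch. It does not use the dlt API and is not how dlt implements the feature; the function names, the `state` dictionary, and the `updated_at` cursor field are all hypothetical, chosen only to illustrate the concept of loading rows that are newer than the last value seen.

```python
from typing import Any, Iterable, Iterator


def load_incrementally(
    rows: Iterable[dict[str, Any]],
    cursor_field: str,
    last_value: Any = None,
) -> Iterator[dict[str, Any]]:
    """Yield only rows whose cursor value is greater than last_value."""
    for row in rows:
        if last_value is None or row[cursor_field] > last_value:
            yield row


def run_once(
    source_rows: Iterable[dict[str, Any]],
    state: dict[str, Any],
    cursor_field: str = "updated_at",  # hypothetical cursor column
) -> list[dict[str, Any]]:
    """Load rows not seen before and advance the cursor kept in state."""
    new_rows = list(
        load_incrementally(source_rows, cursor_field, state.get("last_value"))
    )
    if new_rows:
        # Remember the highest cursor value so the next run skips these rows.
        state["last_value"] = max(row[cursor_field] for row in new_rows)
    return new_rows
```

Calling `run_once` repeatedly with the same `state` dictionary returns only rows added since the previous call. In a real pipeline the cursor state would be persisted between runs rather than held in memory.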

Transformation pipeline development

Write transformations alongside your ingestion pipelines so they share datasets, schemas, and deployment. Source context — annotations, types, and lineage — carries into transformations and on to the serving layer.

  • @dlt.hub.transformation (in public preview) — Python-decorated transformations that run as part of your pipeline graph
  • AI workbench transformation toolkit (in public preview) — generate and refactor Python and SQL transformations from prompts driven by business ontologies
  • dbt integration — run dbt projects with a local cache, schema enforcement, and integrated debugging
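As a rough illustration of source context carrying into a transformation, the sketch below models a table with column annotations and a lineage list, and a transformation that keeps both. This is a conceptual example under stated assumptions: the `Table` class and `select_columns` function are hypothetical and are not part of dlt or dltHub.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical container: rows plus the context that travels with them."""
    name: str
    rows: list[dict]
    annotations: dict[str, str] = field(default_factory=dict)  # column -> note
    lineage: list[str] = field(default_factory=list)  # names of upstream tables


def select_columns(table: Table, columns: list[str], new_name: str) -> Table:
    """Project a subset of columns and carry annotations and lineage forward."""
    return Table(
        name=new_name,
        rows=[{c: row[c] for c in columns} for row in table.rows],
        # Keep only the annotations that still apply to the selected columns.
        annotations={c: a for c, a in table.annotations.items() if c in columns},
        # Record the input table so downstream consumers can trace the origin.
        lineage=table.lineage + [table.name],
    )
```

A downstream consumer of the projected table can still see, for example, that an `email` column was annotated as sensitive at the source, without inspecting the source again.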

Pipeline operations

Deploy, schedule, and monitor pipelines, transformations, and notebooks without standing up infrastructure.

Data quality & governance

Catch data issues before they reach consumers and keep schemas controlled as sources change.

  • Data quality checks (in public preview) — declarative correctness rules with actionable failure messages
  • Advanced quality features (in public preview) — author and run tests against your datasets as part of a pipeline
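The idea of declarative checks with actionable failure messages can be sketched in plain Python as follows. This is not the dltHub checks API; the `Check` class, `run_checks` function, and the example rules are hypothetical and only show the pattern of declaring rules as data and reporting which rule failed, where, and what to do about it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Check:
    """Hypothetical declarative rule applied to one column."""
    name: str
    column: str
    predicate: Callable[[Any], bool]
    hint: str  # what the reader should do when the check fails


def run_checks(rows: list[dict], checks: list[Check]) -> list[str]:
    """Return one message per failed check; an empty list means all passed."""
    failures = []
    for check in checks:
        bad = [
            index
            for index, row in enumerate(rows)
            if not check.predicate(row.get(check.column))
        ]
        if bad:
            failures.append(
                f"{check.name}: {len(bad)} row(s) failed on column "
                f"'{check.column}' (first at index {bad[0]}). {check.hint}"
            )
    return failures


# Example rules, declared as data rather than embedded in pipeline code.
CHECKS = [
    Check("not_null_id", "id", lambda v: v is not None,
          "Inspect the source for records without a key."),
    Check("non_negative_amount", "amount", lambda v: v is not None and v >= 0,
          "Verify sign handling for refunds upstream."),
]
```

Because the rules are plain data, they can be reviewed, versioned, and reused across datasets independently of the code that runs them.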

Data discovery & serving

Make loaded data accessible to stakeholders through notebooks, dashboards, and shareable links. Source schemas and transformation context are available here so agents and consumers see the same upstream metadata that drove ingestion.

  • Datasets — typed Python and SQL access to loaded data
  • Marimo notebooks — build lightweight, shareable data apps
  • Public links for interactive jobs — share notebooks and dashboards externally without granting platform access

Platform capabilities

Foundations that the rest of the platform builds on.

Pricing and licensing

For current plan details and pricing, see the dltHub pricing page. Use of the dltHub platform and toolkits is governed by the dltHub License.
