Skip to main content
Version: 1.27.2 (latest) View Markdown

Transformations with dltHub

dltHub extends dlt with @dlt.hub.transformation — transformations defined in Python that run eagerly on your compute (e.g. local DuckDB, Pandas, Polars, or Arrow) or lazily on the warehouse — the same code, executed where it makes the most sense. A transformation is a resource that takes a dlt.Dataset and yields an Ibis expression, a Relation, or a plain SQL query; dlt compiles it to the destination SQL dialect and materializes the result into a table. When source and target share the same physical location, no data is transferred to the machine running the pipeline.

The example below incrementally appends new rows from the orders table to recent_orders — each run picks up after the last processed created_at:

import dlt
import pendulum

@dlt.hub.transformation(
write_disposition="merge",
primary_key="id",
incremental=dlt.sources.incremental(
"created_at",
initial_value=pendulum.datetime(2000, 1, 1, tz="UTC"),
),
)
def recent_orders(dataset: dlt.Dataset) -> Any:
yield dataset.table("orders")

pipeline.run(recent_orders(pipeline.dataset()))

A few highlights:

  • Eager or lazy execution — run transformations eagerly on your own compute (pull data as DataFrames/Arrow tables and process them in Python) or lazily as SQL pushed down to the warehouse, where nothing leaves the destination. You pick per transformation; the decorator stays the same.
  • Incremental out of the box — stateful cursors (as above) or scheduler-set [start, end) windows for idempotent retries and partition backfills on the dltHub platform.
  • Schema evolution and lineage — result tables evolve automatically and column-level hints are forwarded from source to transformed tables.
  • Write in what you know — Ibis expressions, raw SQL, or Pandas/Polars DataFrames and Arrow tables.
  • Full resource semanticswrite_disposition, primary_key, merge loading, and grouping transformations into sources all work as with regular resources.

See the dltHub transformations docs for the full guide. Use of dltHub is subject to a commercial dltHub License.

This demo works on codespaces. Codespaces is a development environment available for free to anyone with a Github account. You'll be asked to fork the demo repository and from there the README guides you with further steps.
The demo uses the Continue VSCode extension.

Off to codespaces!

DHelp

Ask a question

Welcome to "Codex Central", your next-gen help center, driven by OpenAI's GPT-4 model. It's more than just a forum or a FAQ hub – it's a dynamic knowledge base where coders can find AI-assisted solutions to their pressing problems. With GPT-4's powerful comprehension and predictive abilities, Codex Central provides instantaneous issue resolution, insightful debugging, and personalized guidance. Get your code running smoothly with the unparalleled support at Codex Central - coding help reimagined with AI prowess.