Version: devel

Setup

dlt+

This page is for dlt+, which requires a license. Join our early access program for a trial license.

dlt+ provides a powerful mechanism for executing transformations on your data using a locally spun-up cache. It automatically creates and manages the cache before execution and cleans it up afterward.

A transformation consists of functions that modify data stored in a cache. These transformations can be implemented using Python functions (the arrow engine) or dbt models (the dbt engine).

By combining a cache and transformations, you can efficiently process data loaded via dlt and move it to a new destination.

caution

Local transformations are currently limited to specific use cases and are only compatible with data stored in filesystem-based destinations: Iceberg, Delta, or cloud storage and filesystem.

Make sure to specify a dataset located in a filesystem-based destination when defining a cache.

To use this feature, follow these steps:

  1. Configure the dlt.yml file: define a cache and specify transformations.
  2. Generate scaffolding: automatically create transformation templates.
  3. Modify transformations: update the generated Python functions or dbt models.
  4. Run transformations: execute them on your data.

Configure dlt.yml file

Before setting up the transformations in the dlt.yml file, you need to make sure you have defined the cache.

Defining the cache

You can find detailed instructions on how to define a cache in the cache core concept. Here's an example:

caches:
  github_events_cache:
    inputs:
      - dataset: github_events_dataset
        tables:
          items: items
    outputs:
      - dataset: github_events_dataset
        tables:
          items: items
          items_aggregated: items_aggregated

caution

Please make sure that the input dataset for the cache is located in a filesystem-based destination (Iceberg, Delta, or Cloud storage and filesystem).

Defining transformations

Specify transformations in dlt.yml with the following parameters:

  • transformation name (the YAML key) – unique identifier for the transformation.
  • engine – choose between:
    • arrow for Python-based transformations
    • dbt for dbt-based transformations
  • cache – the cache that the transformation will run on.

For example:

transformations:
  github_events_transformations:
    engine: dbt
    cache: github_events_cache
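
As a quick sanity check on these parameters, the following sketch validates a dlt.yml that has already been parsed into a plain Python dict. The helper name check_transformations is hypothetical and not part of dlt+; it only encodes the two rules stated above, namely that engine is either arrow or dbt and that cache names a cache defined in the same file.

```python
ALLOWED_ENGINES = {"arrow", "dbt"}  # the two engines listed above


def check_transformations(config: dict) -> list[str]:
    """Return human-readable problems found in the 'transformations' section.

    Hypothetical helper for illustration; dlt+ performs its own validation.
    """
    problems = []
    caches = config.get("caches", {})
    for name, spec in config.get("transformations", {}).items():
        engine = spec.get("engine")
        if engine not in ALLOWED_ENGINES:
            problems.append(f"{name}: unknown engine {engine!r}")
        cache = spec.get("cache")
        if cache not in caches:
            problems.append(f"{name}: cache {cache!r} is not defined")
    return problems


# The example configuration from this page, written as a dict.
config = {
    "caches": {"github_events_cache": {}},
    "transformations": {
        "github_events_transformations": {
            "engine": "dbt",
            "cache": "github_events_cache",
        }
    },
}

print(check_transformations(config))
```

An empty list means both rules hold. Loading the YAML file into a dict is left out on purpose, since it depends on which YAML library is available in your environment.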

Generate scaffolding

To create transformation scaffolding based on your dlt pipeline:

  1. Run the dlt pipeline at least once; this ensures dlt has the dataset schemas.
  2. Execute the following CLI command:
     dlt transformation <transformation_name> render-t-layer

This will generate transformation files inside the ./transformations folder. Depending on the engine:

  • For Python transformations: a Python script with transformation functions
  • For dbt transformations: dbt models

Each generated transformation includes models for managing incremental loading states via dlt_load_id.
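
The generated models themselves are not reproduced on this page, so the following is only a minimal sketch of the general idea behind load-id-based incremental processing. It assumes rows carry a _dlt_load_id column, which is the column name dlt writes to loaded tables; the function name select_new_rows and the sample values are hypothetical.

```python
def select_new_rows(rows, processed_load_ids):
    """Keep only rows whose load id has not been transformed yet."""
    return [r for r in rows if r["_dlt_load_id"] not in processed_load_ids]


# Sample rows from two pipeline loads (values are made up for illustration).
rows = [
    {"id": 1, "_dlt_load_id": "1700000000.1"},
    {"id": 2, "_dlt_load_id": "1700000000.1"},
    {"id": 3, "_dlt_load_id": "1700000500.2"},
]

# State kept between runs: load ids that were already transformed.
processed = {"1700000000.1"}

new_rows = select_new_rows(rows, processed)

# After a successful run, remember the load ids that were just handled.
processed |= {r["_dlt_load_id"] for r in new_rows}
```

Running the selection a second time with the updated state returns nothing, which is the property that makes repeated runs cheap.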

Modify transformations

Now you can update the generated transformations and create new ones to reflect the desired behavior. We recommend keeping the incremental approach used in the generated models.

Run transformations

dlt+ offers comprehensive CLI support for executing transformations. You can find the full list of available commands in the command line interface reference.

To run the defined transformation, use the following command:

dlt transformation <transformation_name> run

This command populates the local cache, applies the defined transformations, and then flushes the transformed tables to the specified destination.
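
If you want to trigger the run from a script or scheduler, one option is to shell out to the same CLI. The sketch below only wraps the two commands shown on this page (render-t-layer and run); the helper names transformation_command and run_transformation are hypothetical, and the sketch assumes the dlt executable is on your PATH with a valid dlt+ license.

```python
import subprocess

# The only two sub-commands documented on this page.
KNOWN_ACTIONS = {"run", "render-t-layer"}


def transformation_command(name: str, action: str = "run") -> list[str]:
    """Build the argument list for `dlt transformation <name> <action>`."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["dlt", "transformation", name, action]


def run_transformation(name: str) -> subprocess.CompletedProcess:
    """Invoke the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(transformation_command(name), check=True)


# Show the command that would be executed, without running it.
print(" ".join(transformation_command("github_events_transformations")))
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if a transformation name ever contains unusual characters.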
