Version: devel

Setup

dlt+

This page is for dlt+, which requires a license. Join our early access program for a trial license.

dlt+ provides a powerful mechanism for executing transformations on your data using a locally spun-up cache. It automatically creates and manages the cache before execution and cleans it up afterward.

A transformation consists of functions that modify data stored in a cache. These transformations can be implemented using Python functions (the arrow engine) or dbt models (the dbt engine).

By combining a cache and transformations, you can efficiently process data loaded via dlt and move it to a new destination.

caution

Local transformations are currently limited to specific use cases and are only compatible with data stored in filesystem-based destinations: Iceberg, Delta, or cloud storage and filesystem.

Make sure to specify a dataset located in a filesystem-based destination when defining a cache.

To use this feature, follow these steps:

1. Configure the dlt.yml file: define a cache and specify transformations.
2. Generate scaffolding: automatically create transformation templates.
3. Modify transformations: update the generated Python functions or dbt models.
4. Run transformations: execute them on your data.

Configure `dlt.yml` file

Before setting up transformations in the dlt.yml file, define the cache they will run on.

Defining the cache

You can find detailed instructions on how to define a cache in the cache core concept. Here's an example:

caches:
  github_events_cache:
    inputs:
      - dataset: github_events_dataset
        tables:
          items: items
    outputs:
      - dataset: github_events_dataset
        tables:
          items: items
          items_aggregated: items_aggregated

caution

Make sure that the input dataset for the cache is located in a filesystem-based destination (Iceberg, Delta, or cloud storage and filesystem).

Defining transformations

Specify transformations in dlt.yml with the following parameters:

<transformation_name> – unique identifier for the transformation (the key under transformations).
engine – choose between:
- arrow for Python-based transformations
- dbt for dbt-based transformations
cache – the cache that the transformation will run on.

For example:

transformations:
  github_events_transformations:
    engine: dbt
    cache: github_events_cache
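
The relationship between the two sections can be checked with a short script. The following is a minimal sketch, not part of dlt+: it mirrors the YAML fragments above as a plain Python dictionary and verifies that each transformation names one of the two engines listed above and refers to a cache that is actually defined. The function name `check_transformations` is hypothetical and chosen only for illustration.

```python
# Illustrative only: a plain-dict mirror of the dlt.yml fragments above.
# Nothing here is an API provided by dlt or dlt+.
ALLOWED_ENGINES = {"arrow", "dbt"}

config = {
    "caches": {
        "github_events_cache": {
            "inputs": [
                {"dataset": "github_events_dataset", "tables": {"items": "items"}}
            ],
            "outputs": [
                {
                    "dataset": "github_events_dataset",
                    "tables": {
                        "items": "items",
                        "items_aggregated": "items_aggregated",
                    },
                }
            ],
        }
    },
    "transformations": {
        "github_events_transformations": {
            "engine": "dbt",
            "cache": "github_events_cache",
        }
    },
}


def check_transformations(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    caches = cfg.get("caches", {})
    for name, spec in cfg.get("transformations", {}).items():
        engine = spec.get("engine")
        if engine not in ALLOWED_ENGINES:
            problems.append(f"{name}: unknown engine {engine!r}")
        cache = spec.get("cache")
        if cache not in caches:
            problems.append(f"{name}: cache {cache!r} is not defined")
    return problems


problems = check_transformations(config)
print(problems)
```

A check like this catches a misspelled cache name or engine before any CLI command is run; dlt+ may well perform its own validation, which this sketch does not replace.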

Generate scaffolding

To create transformation scaffolding based on your dlt pipeline:

1. Run the dlt pipeline at least once; this ensures dlt has the dataset schemas.
2. Execute the following CLI command:

dlt transformation <transformation_name> render-t-layer

This will generate transformation files inside the ./transformations folder. Depending on the engine:

- For Python transformations: a Python script with transformation functions
- For dbt transformations: dbt models

Each generated transformation includes models for managing incremental loading states via dlt_load_id.
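
The idea behind load-id-based incremental processing can be shown without any dlt+ code. The sketch below is an assumption-laden illustration, not the generated scaffolding: it uses plain Python dictionaries as rows, takes the column name `dlt_load_id` as written above (adjust it to match your tables), and invents a `type` column and the helper names `select_new_rows` and `count_by_type` purely for the example. Only rows from loads that have not been processed yet are transformed, and the set of processed load ids is updated afterward.

```python
# Illustrative only: plain-Python rows stand in for cached tables.
# Column name taken from the text above; adjust it to match your tables.
LOAD_ID_COLUMN = "dlt_load_id"


def select_new_rows(rows, processed_load_ids):
    """Keep only rows whose load id has not been processed yet."""
    return [row for row in rows if row[LOAD_ID_COLUMN] not in processed_load_ids]


def count_by_type(rows):
    """Hypothetical aggregation: count rows per value of the 'type' column."""
    counts = {}
    for row in rows:
        counts[row["type"]] = counts.get(row["type"], 0) + 1
    return counts


# Hypothetical sample data: one load already processed, one new load.
items = [
    {"type": "PushEvent", LOAD_ID_COLUMN: "load_1"},
    {"type": "PushEvent", LOAD_ID_COLUMN: "load_2"},
    {"type": "WatchEvent", LOAD_ID_COLUMN: "load_2"},
]
processed = {"load_1"}

new_rows = select_new_rows(items, processed)
new_counts = count_by_type(new_rows)

# Record the loads that were just handled so the next run skips them.
processed |= {row[LOAD_ID_COLUMN] for row in new_rows}

print(new_counts)
```

Running the selection a second time with the updated `processed` set yields no rows, which is the property that makes repeated runs cheap.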

Modify transformations

Now you can update the generated transformations and create new ones to reflect the desired behavior. We recommend keeping the incremental approach as in the generated models.

Run transformations

dlt+ offers comprehensive CLI support for executing transformations. You can find the full list of available commands in the command line interface reference.

To run the defined transformation, use the following command:

dlt transformation <transformation_name> run

This command populates the local cache, applies the defined transformations, and then flushes the transformed tables to the specified destination.
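
When the two CLI steps need to be scripted, for example in a scheduled job, they can be wrapped in a small helper. This is a minimal sketch under stated assumptions: it uses only the two subcommands shown on this page (`render-t-layer` and `run`), assumes the `dlt` executable with dlt+ is on the PATH, and the helper names `transformation_command` and `run_transformation` are hypothetical. By default it only returns the command line so it can be inspected without executing anything.

```python
import shlex
import subprocess

# Only the two actions documented on this page are accepted.
KNOWN_ACTIONS = ("render-t-layer", "run")


def transformation_command(name, action):
    """Build the argument list for `dlt transformation <name> <action>`."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["dlt", "transformation", name, action]


def run_transformation(name, action, dry_run=True):
    """Return the command line; execute it only when dry_run is False."""
    cmd = transformation_command(name, action)
    if not dry_run:
        # Raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


# Dry run: prints the command without invoking the CLI.
print(run_transformation("github_events_transformations", "run"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a transformation name ever contains unusual characters.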
