Setup
This page is for dlt+, which requires a license. Join our early access program for a trial license.
dlt+ provides a powerful mechanism for executing transformations on your data using a locally spun-up cache. It automatically creates and manages the cache before execution and cleans it up afterward.
A transformation consists of functions that modify data stored in a cache. These transformations can be implemented using either Python (Arrow-based) functions or dbt models.
By combining a cache and transformations, you can efficiently process data loaded via dlt and move it to a new destination.
Local transformations are currently limited to specific use cases and are only compatible with data stored in filesystem-based destinations such as Iceberg, Delta, cloud storage, and the local filesystem.
Make sure to specify a dataset located in a filesystem-based destination when defining a cache.
To use this feature, follow these steps:
- Configure the `dlt.yml` file: define a cache and specify transformations.
- Generate scaffolding: automatically create transformation templates.
- Modify transformations: update the generated Python functions or dbt models.
- Run transformations: execute them on your data.
Configure the `dlt.yml` file
Before setting up the transformations in the `dlt.yml` file, you need to make sure you have defined the cache.
Defining the cache
You can find detailed instructions on how to define a cache in the cache core concept. Here's an example:
```yaml
caches:
  github_events_cache:
    inputs:
      - dataset: github_events_dataset
        tables:
          items: items
    outputs:
      - dataset: github_events_dataset
        tables:
          items: items
          items_aggregated: items_aggregated
```
Please make sure that the input dataset for the cache is located in a filesystem-based destination (Iceberg, Delta, or Cloud storage and filesystem).
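As a reference point, a filesystem-based destination backing the input dataset could be declared in `dlt.yml` roughly as shown below. This is only a sketch: the destination name and `bucket_url` are placeholders, and the exact keys depend on your storage setup, so check the destination documentation for the supported options.

```yaml
destinations:
  # Hypothetical filesystem destination backed by cloud storage.
  # Replace bucket_url (and add credentials) to match your environment.
  github_events_filesystem:
    type: filesystem
    bucket_url: "s3://my-bucket/github_events_data"
```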
Defining transformations
Specify transformations in `dlt.yml` with the following parameters:
- `name`: a unique identifier for the transformation.
- `engine`: the transformation engine, either:
  - `arrow` for Python-based transformations
  - `dbt` for dbt-based transformations
- `cache`: the cache that the transformation will run on.
For example:

```yaml
transformations:
  github_events_transformations:
    engine: dbt
    cache: github_events_cache
```
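To write the transformations in Python instead, switch the engine to `arrow`; the rest of the configuration stays the same:

```yaml
transformations:
  github_events_transformations:
    engine: arrow
    cache: github_events_cache
```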
Generate scaffolding
To create transformation scaffolding based on your dlt pipeline:
- Run the dlt pipeline at least once; this ensures dlt has the dataset schemas.
- Execute the following CLI command:
```sh
dlt transformation <transformation-name> render-t-layer
```
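For the transformation defined in the example above, this becomes:

```sh
dlt transformation github_events_transformations render-t-layer
```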
This will generate transformation files inside the `./transformations` folder. Depending on the engine:
- For Python transformations: a Python script with transformation functions (learn more)
- For dbt transformations: dbt models (learn more)
Each generated transformation includes models for managing incremental loading states via `dlt_load_id`.
Modify transformations
Now you can update the generated transformations and create new ones to reflect the desired behavior. We recommend keeping the incremental approach used in the generated models.
Run transformations
dlt+ offers comprehensive CLI support for executing transformations. You can find the full list of available commands in the command line interface.
To run the defined transformation, use the following command:
```sh
dlt transformation <transformation_name> run
```
This command populates the local cache, applies the defined transformations, and then flushes the transformed tables to the specified destination.
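For the example transformation defined earlier, the command would be:

```sh
dlt transformation github_events_transformations run
```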