Delta table format
dlt supports writing Delta tables when using the filesystem destination.
How it works
dlt uses the deltalake library to write Delta tables. One or more Parquet files are prepared during the extract and normalize steps. In the load step, these Parquet files are exposed as an Arrow data structure and passed to deltalake, which writes them to the Delta table.
Delta dependencies
You need the deltalake package to use this format:
pip install "dlt[deltalake]"
You also need pyarrow>=17.0.0:
pip install 'pyarrow>=17.0.0'
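To confirm that both requirements are met in the current environment, a small check along these lines may help. It is a sketch using only the standard library; the function names are hypothetical, and the version parsing is deliberately simple (it ignores pre-release suffixes such as rc1).

```python
import re
from importlib import metadata

MIN_PYARROW = (17, 0, 0)  # minimum pyarrow version stated above


def parse_version(version):
    """Return the leading integers of the first three dot-separated parts."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pyarrow_is_recent_enough(installed):
    """Compare an installed version string against the required minimum."""
    return parse_version(installed) >= MIN_PYARROW


def check_delta_dependencies():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for package in ("deltalake", "pyarrow"):
        try:
            version = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if package == "pyarrow" and not pyarrow_is_recent_enough(version):
            problems.append(f"pyarrow {version} is older than 17.0.0")
    return problems


if __name__ == "__main__":
    print(check_delta_dependencies() or "Delta dependencies look fine")
```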
Set table format
Set the table_format argument to delta when defining your resource:
import dlt

@dlt.resource(table_format="delta")
def my_delta_resource():
    ...
or when calling run on your pipeline:
pipeline.run(my_resource, table_format="delta")
dlt always uses Parquet as the loader_file_format when the table format is delta. Any other loader_file_format setting is ignored.
Table format partitioning
Delta tables can be partitioned by specifying one or more partition column hints. This example partitions a Delta table by the foo column:
@dlt.resource(
    table_format="delta",
    columns={"foo": {"partition": True}},
)
def my_delta_resource():
    ...
Delta uses Hive-style partitioning.
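In a Hive-style layout, each partition value gets its own column=value directory under the table root. The sketch below only illustrates that naming scheme with plain Python; the sample rows and the helper names partition_dir and group_rows_by_partition are hypothetical, and the real files are written by deltalake, not by code like this.

```python
from collections import defaultdict


def partition_dir(partition_columns, row):
    """Build the Hive-style directory for a row, e.g. 'foo=a'."""
    return "/".join(f"{col}={row[col]}" for col in partition_columns)


def group_rows_by_partition(rows, partition_columns):
    """Group rows by the partition directory they would be stored under."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_dir(partition_columns, row)].append(row)
    return dict(groups)


# Hypothetical sample data, partitioned by the foo column as in the example above.
rows = [
    {"id": 1, "foo": "a"},
    {"id": 2, "foo": "b"},
    {"id": 3, "foo": "a"},
]
layout = group_rows_by_partition(rows, ["foo"])
for directory, members in sorted(layout.items()):
    print(directory, [r["id"] for r in members])
# prints:
# foo=a [1, 3]
# foo=b [2]
```

With more than one partition column, the directories nest in the order the columns are listed, for example foo=a/bar=1.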
Partition evolution (changing partition columns after a table has been created) is not supported.