dlt.destinations.impl.ducklake.ducklake
DuckDB object hierarchy
Here are short definitions of DuckDB objects and the relationships between them. This should help disambiguate names used in DuckDB, DuckLake, and dlt.
TL;DR:
- scalar < column < table < schema (dataset) < database = catalog
- Typically, in duckdb, you have one catalog = one database = one file
- When using ATTACH, you're adding a Catalog to your Database
  - Though if you do SHOW ALL TABLES, the result column "database" should be "catalog" to be precise
Hierarchy:
- A Table can have many Columns
- A Schema can have many Tables
- A Database can have many Schemas (corresponds to a dataset in dlt)
- A Database is a single physical file (e.g., db.duckdb)
- A Database has a single Catalog
- A Catalog is the internal metadata structure of everything found in the database
- Using ATTACH adds a Catalog to the Database
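A minimal sketch of this in the duckdb Python client (the file names are illustrative):

```python
import duckdb

# One file = one Database with a single Catalog
con = duckdb.connect("db.duckdb")

# ATTACH adds another Catalog to the current Database session
con.execute("ATTACH 'other.duckdb' AS other")

# The "database" column here really lists the catalog each table belongs to
print(con.sql("SHOW ALL TABLES"))
```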
In dlt:
- dlt creates a duckdb Database per pipeline when using dlt.pipeline(..., destination="duckdb")
- dlt stores the data inside a Schema that matches the name of the dlt.Dataset
- when setting the pipeline destination to a specific duckdb Database, you can store multiple dlt.Datasets inside the same instance (each with its own duckdb Schema).
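A minimal sketch, assuming a local duckdb file; the pipeline, dataset, and table names are illustrative:

```python
import dlt

# One duckdb Database (one file, one Catalog) shared by two pipelines
db = dlt.destinations.duckdb("pipelines.duckdb")

# Each dlt.Dataset becomes its own duckdb Schema inside that Database
orders = dlt.pipeline("orders_pipeline", destination=db, dataset_name="orders")
users = dlt.pipeline("users_pipeline", destination=db, dataset_name="users")

orders.run([{"id": 1}], table_name="orders")
users.run([{"id": 1}], table_name="users")
```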
DuckLake object hierarchy
TL;DR:
- scalar < column < table < schema < snapshot < database = catalog
Hierarchy:
- A Catalog is an SQL database to store metadata
  - In duckdb terms, it's a duckdb Database that implements the duckdb Catalog for the DuckLake
- A Catalog has many Schemas (namespaces, if you compare it to Iceberg) that correspond to dlt.Datasets
- A Storage is a file system or object store that can store parquet files
- A Snapshot references the Catalog at a particular point in time
  - This places Snapshot at the top of the hierarchy because it scopes the other constructs
Using the ducklake extension, the following command in duckdb
ATTACH 'ducklake:{catalog_database}' (DATA_PATH '{data_storage}');
adds the ducklake Catalog to your duckdb database
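A hedged sketch of the same attach issued from the duckdb Python client; the catalog file, alias, and data path are placeholders:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# Metadata goes to the catalog database, table data is written
# as parquet files under DATA_PATH
con.execute("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'lake_data/')")
con.execute("USE lake")
```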
DuckLakeCopyJob Objects
class DuckLakeCopyJob(DuckDbCopyJob)
metrics
def metrics() -> Optional[LoadJobMetrics]
Generate remote URL metrics that point to the table in storage
DuckLakeClient Objects
class DuckLakeClient(DuckDbClient)
Destination client to interact with a DuckLake
A DuckLake has 3 components:
- ducklake client: this is a duckdb instance with the ducklake extension
- catalog: this is an SQL database storing metadata. It can be a duckdb instance (typically the ducklake client) or a remote database (sqlite, postgres, mysql)
- storage: this is a filesystem where data is stored in files
The dlt DuckLake destination gives access to the "ducklake client". You never have to manage the catalog and storage directly; this is done through the ducklake client.
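A minimal usage sketch, assuming the destination can be referenced by the short name "ducklake" and that the catalog database and storage location are supplied through dlt's regular destination configuration; the pipeline, dataset, and table names are illustrative:

```python
import dlt

# dlt talks to the ducklake client only; the catalog and storage behind it
# are resolved from the destination configuration
pipeline = dlt.pipeline(
    pipeline_name="lake_pipeline",
    destination="ducklake",
    dataset_name="analytics",
)

info = pipeline.run([{"id": 1, "name": "duck"}], table_name="ducks")
print(info)
```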