# Snowflake Plus
This page is for dlt+, which requires a license. Join our early access program for a trial license.
Snowflake Plus is an experimental extension of the Snowflake destination that adds Apache Iceberg table creation and related features. This destination is available starting from dlt+ version 0.9.0. It fully supports all the functionality of the standard Snowflake destination, plus:
- The ability to create Iceberg tables in Snowflake by configuring `iceberg_mode` in your `config.toml` file or `dlt.yml` file.
- Additional configuration for Iceberg tables in Snowflake via:
  - `external_volume`: The external volume name where Iceberg data is stored.
  - `catalog`: The catalog name in which Iceberg tables are created. Defaults to `"SNOWFLAKE"`.
  - `base_location`: A template string for the base path that Snowflake uses for storing the table data in external storage, supporting placeholders.
  - `extra_placeholders`: Additional values that can be used in the `base_location` template.
  - `catalog_sync`: The name of a catalog integration configured for Snowflake Open Catalog. If specified, Snowflake syncs Snowflake-managed Iceberg tables in the database with an external catalog in your Snowflake Open Catalog account.
## Installation

Install the `dlt-plus` package with the `snowflake` extra:

```sh
pip install "dlt-plus[snowflake]"
```
Once the `snowflake` extra is installed, you can configure a pipeline to use `snowflake_plus` exactly the same way you would use the `snowflake` destination.
## Setup

1. Configure your Snowflake credentials.
2. Set up a database user and permissions.
3. Configure an external volume in Snowflake.
4. Grant usage on the external volume to the role you are using to load data:

   ```sql
   GRANT USAGE ON EXTERNAL VOLUME <external_volume_name> TO ROLE <role_name>;
   ```

5. Configure the `snowflake_plus` destination. For a dlt+ project, configure it in `dlt.yml`; for a Python script, use `config.toml`:

**dlt.yml**
If you don't have a dlt+ project yet, initialize one in the current working directory. Replace `sql_database` with the source of your choice:

```sh
dlt project init sql_database snowflake_plus
```

This will create a Snowflake Plus destination in your `dlt.yml` file:

```yaml
destinations:
  snowflake:
    type: snowflake_plus
```
To enable Iceberg table creation, set the `iceberg_mode` option and set `external_volume` to the name of the external volume you created in step 3:

```yaml
destinations:
  snowflake:
    type: snowflake_plus
    external_volume: "<external_volume_name>"
    iceberg_mode: "all"
```
**config.toml**

Add the configuration to your `config.toml` file:

```toml
[destination.snowflake]
external_volume = "<external_volume_name>"
iceberg_mode = "all"
```
Use the `snowflake_plus` destination in your pipeline:

```python
import dlt

pipeline = dlt.pipeline(
    pipeline_name="my_snowflake_plus_pipeline",
    destination="snowflake_plus",
    dataset_name="my_dataset",
)

@dlt.resource
def my_iceberg_table():
    ...
```
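To load data, run the pipeline with the resource as usual. A minimal sketch, assuming `my_iceberg_table` yields plain dicts (the rows here are illustrative):

```python
# Illustrative resource body: yield a couple of rows.
@dlt.resource
def my_iceberg_table():
    yield {"id": 1, "name": "alice"}
    yield {"id": 2, "name": "bob"}

# With iceberg_mode = "all", the resulting table is created
# as an Iceberg table in Snowflake.
info = pipeline.run(my_iceberg_table())
print(info)
```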
## Configuration

The `snowflake_plus` destination extends the standard Snowflake configuration with additional options:
### `iceberg_mode`

Controls which tables are created as Iceberg tables.

- Possible values:
  - `"all"`: All tables, including dlt system tables, are created as Iceberg tables
  - `"data_tables"`: Only data tables (non-dlt system tables) are created as Iceberg tables
  - `"none"`: No tables are created as Iceberg tables
- Required: No
- Default: `"none"`
### `external_volume`

The external volume to store Iceberg metadata.

- Required: Yes
- Default: None
### `catalog`

The catalog to use for Iceberg tables.

- Required: No
- Default: `"SNOWFLAKE"`. This will use Snowflake as the catalog for the Iceberg tables.
### `base_location`

Template string for the base location where Iceberg data is stored in the external volume. Supports placeholders like `{dataset_name}` and `{table_name}`.

- Required: No
- Default: `"{dataset_name}/{table_name}"`
### `extra_placeholders`

Dictionary of additional values that can be used in the `base_location` template. The values can be static strings or functions that accept the dataset name and table name as arguments and return a string.

- Required: No
- Default: None
### `catalog_sync`

The name of a catalog integration for syncing Iceberg tables to an external catalog in Snowflake Open Catalog.

- Required: No
- Default: None
Configure these options in your `config.toml` file under the `[destination.snowflake]` section or in your `dlt.yml` file under the `destinations.snowflake_plus` section.
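Putting the options together, a fuller `config.toml` example might look like this (all values are illustrative):

```toml
[destination.snowflake]
external_volume = "my_external_volume"
iceberg_mode = "data_tables"
catalog = "SNOWFLAKE"
base_location = "{dataset_name}/{table_name}"
catalog_sync = "my_open_catalog_int"
```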
## Base location templating

The `base_location` parameter controls where Snowflake stores your Iceberg table data and metadata in the external volume. It's a template string that supports the following built-in placeholders:

- `{dataset_name}`: The name of your dataset
- `{table_name}`: The name of the table

For more flexibility, you can also define custom placeholders using the `extra_placeholders` option.
### Examples

1. The default pattern `{dataset_name}/{table_name}` creates paths like `my_dataset/customers` in your external volume.

2. A custom static path:

   ```yaml
   base_location: "custom/static/path"
   ```

   This creates all tables in the same directory, `custom/static/path`.

3. Using custom placeholders:

   ```yaml
   base_location: "{env}/{dataset_name}/{table_name}"
   extra_placeholders:
     env: "prod"
   ```

   This creates paths like `prod/my_dataset/customers`.
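Because `extra_placeholders` values can also be functions that receive the dataset name and table name, path segments can be computed at runtime. Static `config.toml`/`dlt.yml` files cannot hold callables, so this only works when configuring the destination in Python; the function name and logic below are illustrative:

```python
# A callable placeholder: per the docs, it accepts the dataset name
# and table name and returns the string substituted into base_location.
def env_prefix(dataset_name: str, table_name: str) -> str:
    # Illustrative logic: route dlt system tables to a separate prefix.
    return "system" if table_name.startswith("_dlt") else "prod"

# Referenced as {env} in the template:
base_location = "{env}/{dataset_name}/{table_name}"
extra_placeholders = {"env": env_prefix}
```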
### How Snowflake uses the base location

When you provide a `base_location`, Snowflake uses it to create the paths where data and metadata are stored in your external cloud storage. The actual directory structure Snowflake creates follows this pattern:

```text
STORAGE_BASE_URL/BASE_LOCATION.<randomId>/[data | metadata]/
```

where `<randomId>` is a random Snowflake-generated 8-character string appended to create a unique directory.
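For example, with the default `base_location` of `{dataset_name}/{table_name}` and an external volume on S3, the files for a `customers` table in `my_dataset` might land under paths like these (bucket name and random suffix are illustrative):

```text
s3://my-bucket/my_dataset/customers.abc12345/data/
s3://my-bucket/my_dataset/customers.abc12345/metadata/
```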
For more details on how Snowflake organizes Iceberg table files in external storage, see the Snowflake documentation on data and metadata directories.
## Write dispositions

All standard write dispositions (`append`, `replace`, and `merge`) are supported for both regular Snowflake tables and Iceberg tables.
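For example, a resource can declare a `merge` disposition exactly as it would for the standard Snowflake destination; a minimal sketch using dlt's standard resource arguments:

```python
import dlt

# Standard dlt resource options; nothing Iceberg-specific is needed here.
@dlt.resource(write_disposition="merge", primary_key="id")
def customers():
    yield {"id": 1, "name": "alice", "status": "active"}
```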
## Data types

The Snowflake Plus destination supports all standard Snowflake destination data types, with additional type mappings for Iceberg tables:

| dlt Type | Iceberg Type |
|---|---|
| `text` | `string` |
| `bigint` | `long`, `int` |
| `double` | `double` |
| `bool` | `boolean` |
| `timestamp` | `timestamp` |
| `date` | `date` |
| `time` | `time` |
| `decimal` | `decimal` |
| `binary` | `binary` |
| `json` | `string` |
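If you want to pin the dlt type for a column (and thereby the Iceberg type it maps to, per the table above), you can use dlt's standard column hints; a sketch with an illustrative resource:

```python
import dlt

# "data_type" values are dlt types from the left column of the table above;
# here "price" is forced to decimal, which maps to Iceberg decimal.
@dlt.resource(columns={"price": {"data_type": "decimal"}})
def products():
    yield {"sku": "A-1", "price": "19.99"}
```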
## Syncing Snowflake-managed Iceberg tables to Snowflake Open Catalog

To enable querying of Snowflake-managed Iceberg tables by third-party engines (e.g., Apache Spark) via an external catalog (Snowflake Open Catalog), use the `catalog_sync` configuration option. This setting specifies a catalog integration that syncs Iceberg tables to the external catalog.
### Setup

1. Create an external catalog in Snowflake Open Catalog.

2. Create a catalog integration in Snowflake. For example:

   ```sql
   CREATE OR REPLACE CATALOG INTEGRATION my_open_catalog_int
     CATALOG_SOURCE = POLARIS
     TABLE_FORMAT = ICEBERG
     REST_CONFIG = (
       CATALOG_URI = 'https://<orgname>-<my-snowflake-open-catalog-account-name>.snowflakecomputing.com/polaris/api/catalog'
       CATALOG_NAME = 'myOpenCatalogExternalCatalogName'
     )
     REST_AUTHENTICATION = (
       TYPE = OAUTH
       OAUTH_CLIENT_ID = 'myClientId'
       OAUTH_CLIENT_SECRET = 'myClientSecret'
       OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
     )
     ENABLED = TRUE;
   ```

   Refer to the Snowflake documentation for detailed setup instructions.

3. Configure the `catalog_sync` option:
**dlt.yml**

```yaml
destinations:
  snowflake:
    type: snowflake_plus
    # ... other configuration
    catalog_sync: "my_open_catalog_int"
```

**config.toml**

```toml
[destination.snowflake]
# ... other configuration
catalog_sync = "my_open_catalog_int"
```
## Additional Resources

For more information on basic Snowflake destination functionality, please refer to the Snowflake destination documentation.