Snowflake Plus

This page is for dlt+, which requires a license. Join our early access program for a trial license.

Snowflake Plus is an experimental extension of the Snowflake destination that adds Apache Iceberg table creation and related features. It is available starting from dlt+ version 0.9.0 and fully supports all the functionality of the standard Snowflake destination, plus:

  1. The ability to create Iceberg tables in Snowflake by configuring iceberg_mode in your config.toml file or dlt.yml file.
  2. Additional configuration for Iceberg tables in Snowflake via:
    • external_volume: The external volume name where Iceberg data is stored.
    • catalog: The catalog name in which Iceberg tables are created. Defaults to "SNOWFLAKE".
    • base_location: A template string for the base path that Snowflake uses for storing the table data in external storage, supporting placeholders.
    • extra_placeholders: Additional values that can be used in the base_location template.
    • catalog_sync: The name of a catalog integration configured for Snowflake Open Catalog. If specified, Snowflake syncs Snowflake-managed Iceberg tables in the database with an external catalog in your Snowflake Open Catalog account.

Installation

Install the dlt-plus package with the snowflake extra:

pip install "dlt-plus[snowflake]"

Once the snowflake extra is installed, you can configure a pipeline to use snowflake_plus exactly the same way you would use the snowflake destination.
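
For example, a minimal pipeline script might look like this (a sketch; the pipeline, dataset, and table names and the inline data are illustrative):

import dlt

# Any source works here; a list of dicts is the simplest illustration.
pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination="snowflake_plus",
    dataset_name="my_dataset",
)
info = pipeline.run([{"id": 1, "name": "alice"}], table_name="customers")
print(info)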

Setup

  1. Configure your Snowflake credentials
  2. Set up a database user and permissions
  3. Configure an external volume in Snowflake
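An external volume on S3, for example, can be created with SQL along these lines (a sketch; the volume name, bucket, path, and role ARN are placeholders you must replace):
CREATE EXTERNAL VOLUME <external_volume_name>
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://<bucket>/<path>/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<role>'
    )
  );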
  4. Grant usage on the external volume to the role you are using to load data:
GRANT USAGE ON EXTERNAL VOLUME <external_volume_name> TO ROLE <role_name>;
  5. Configure the snowflake_plus destination. For a dlt+ project (in dlt.yml) or for a Python script (in config.toml):

If you don't have a dlt+ project yet, initialize one in the current working directory. Replace sql_database with the source of your choice:

dlt project init sql_database snowflake_plus

This will create a Snowflake Plus destination in your dlt.yml file:

destinations:
  snowflake:
    type: snowflake_plus

To enable Iceberg table creation, set the iceberg_mode option and set external_volume to the name of the external volume you created in step 3.

destinations:
  snowflake:
    type: snowflake_plus
    external_volume: "<external_volume_name>"
    iceberg_mode: "all"
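
For a Python script, the equivalent settings go into config.toml (using the [destination.snowflake] section also referenced later on this page):

[destination.snowflake]
external_volume = "<external_volume_name>"
iceberg_mode = "all"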

Configuration

The snowflake_plus destination extends the standard Snowflake configuration with additional options:

iceberg_mode

Controls which tables are created as Iceberg tables.

  • Possible values:
    • "all": All tables including dlt system tables are created as Iceberg tables
    • "data_tables": Only data tables (non-dlt system tables) are created as Iceberg tables
    • "none": No tables are created as Iceberg tables
  • Required: No
  • Default: "none"

external_volume

The external volume where Iceberg data and metadata are stored.

  • Required: Yes
  • Default: None

catalog

The catalog to use for Iceberg tables.

  • Required: No
  • Default: "SNOWFLAKE"

base_location

Template string for the base location where Iceberg data is stored in the external volume. Supports placeholders like {dataset_name} and {table_name}.

  • Required: No
  • Default: "{dataset_name}/{table_name}"

extra_placeholders

Dictionary of additional values that can be used in the base_location template. The values can be static strings or functions that accept the dataset name and table name as arguments and return a string.

  • Required: No
  • Default: None
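
Because functions cannot be written in TOML or YAML, callable placeholders have to be supplied from Python when the destination is configured in code. Grounded only in the description above, such a function has this shape (the name and routing logic are illustrative; how it is registered depends on how you construct the destination in Python):

def env_prefix(dataset_name: str, table_name: str) -> str:
    # Receives the dataset and table names; the returned string fills
    # the matching custom placeholder in the base_location template.
    return "dev" if dataset_name.startswith("dev_") else "prod"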

catalog_sync

The name of a catalog integration for syncing Iceberg tables to an external catalog in Snowflake Open Catalog.

  • Required: No
  • Default: None

Configure these options in your config.toml file under the [destination.snowflake] section, or in your dlt.yml file under the corresponding destination entry (destinations.snowflake in the examples above).

Base location templating

The base_location parameter controls where Snowflake stores your Iceberg table data and metadata in the external volume. It's a template string that supports the following built-in placeholders:

  • {dataset_name}: The name of your dataset
  • {table_name}: The name of the table

For more flexibility, you can also define custom placeholders using the extra_placeholders option.

Examples

  1. The default pattern {dataset_name}/{table_name} creates paths like my_dataset/customers in your external volume.

  2. Custom static path:

    base_location: "custom/static/path"

    This stores all tables under the same base path, custom/static/path.

  3. Using custom placeholders:

    base_location: "{env}/{dataset_name}/{table_name}"
    extra_placeholders:
      env: "prod"

    This creates paths like prod/my_dataset/customers.

How Snowflake uses the base location

When you provide a base_location, Snowflake uses it to create the paths where data and metadata are stored in your external cloud storage. The actual directory structure Snowflake creates follows this pattern:

STORAGE_BASE_URL/BASE_LOCATION.<randomId>/[data | metadata]/

Where <randomId> is a random, Snowflake-generated 8-character string appended to create a unique directory. For example, with STORAGE_BASE_URL s3://my-bucket/ and the base location my_dataset/customers, table data lands under s3://my-bucket/my_dataset/customers.<randomId>/data/.

For more details on how Snowflake organizes Iceberg table files in external storage, see the Snowflake documentation on data and metadata directories.

Write dispositions

All standard write dispositions (append, replace, and merge) are supported for both regular Snowflake tables and Iceberg tables.
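
For example, merge works the same whether the target is a regular or an Iceberg table; a minimal sketch (the resource name, key, and data are illustrative):

import dlt

@dlt.resource(write_disposition="merge", primary_key="id")
def customers():
    # Rows whose id was already loaded are updated; new ids are inserted.
    yield from [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

pipeline = dlt.pipeline(
    pipeline_name="crm_pipeline",
    destination="snowflake_plus",
    dataset_name="crm",
)
pipeline.run(customers())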

Data types

The Snowflake Plus destination supports all standard Snowflake destination data types, with additional type mappings for Iceberg tables:

dlt type     Iceberg type
text         string
bigint       long, int
double       double
bool         boolean
timestamp    timestamp
date         date
time         time
decimal      decimal
binary       binary
json         string
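
Column types can be pinned with dlt's columns hint so the mappings above apply predictably once iceberg_mode is enabled; a sketch (the resource and column names are illustrative):

import dlt

@dlt.resource(columns={
    "amount": {"data_type": "decimal"},  # maps to Iceberg decimal
    "payload": {"data_type": "json"},    # maps to Iceberg string
})
def orders():
    yield {"id": 1, "amount": 99.90, "payload": {"items": [1, 2]}}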

Syncing Snowflake-managed Iceberg tables to Snowflake Open Catalog

To enable querying of Snowflake-managed Iceberg tables by third-party engines (e.g., Apache Spark) via an external catalog (Snowflake Open Catalog), use the catalog_sync configuration option. This setting specifies a catalog integration that syncs Iceberg tables to the external catalog.

Setup

  1. Create an external catalog in Snowflake Open Catalog.

  2. Create a catalog integration in Snowflake. Example:

  CREATE OR REPLACE CATALOG INTEGRATION my_open_catalog_int
    CATALOG_SOURCE = POLARIS
    TABLE_FORMAT = ICEBERG
    REST_CONFIG = (
      CATALOG_URI = 'https://<orgname>-<my-snowflake-open-catalog-account-name>.snowflakecomputing.com/polaris/api/catalog'
      CATALOG_NAME = 'myOpenCatalogExternalCatalogName'
    )
    REST_AUTHENTICATION = (
      TYPE = OAUTH
      OAUTH_CLIENT_ID = 'myClientId'
      OAUTH_CLIENT_SECRET = 'myClientSecret'
      OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
    )
    ENABLED = TRUE;

Refer to the Snowflake documentation for detailed setup instructions.

  3. Configure the catalog_sync option:
[destination.snowflake]
# ... other configuration
catalog_sync = "my_open_catalog_int"
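
In a dlt+ project, the same option goes into dlt.yml under the destination entry shown earlier:

destinations:
  snowflake:
    type: snowflake_plus
    external_volume: "<external_volume_name>"
    iceberg_mode: "all"
    catalog_sync: "my_open_catalog_int"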

Additional Resources

For more information on basic Snowflake destination functionality, please refer to the Snowflake destination documentation.
