dlt.destinations.impl.databricks.databricks_adapter
databricks_adapter
def databricks_adapter(
data: Any,
cluster: Union[TColumnNames, Literal["AUTO"]] = None,
partition: TColumnNames = None,
table_format: Literal["DELTA", "ICEBERG"] = "DELTA",
table_comment: Optional[str] = None,
table_tags: Optional[List[Union[str, Dict[str, str]]]] = None,
table_properties: Optional[Dict[str, Union[str, int, bool, float]]] = None,
column_hints: Optional[TDatabricksTableSchemaColumns] = None
) -> DltResource
Prepares data for loading into Databricks.
This function takes data, which can be raw or already wrapped in a DltResource object, and prepares it for Databricks by optionally specifying clustering, partitioning, table format, a table comment, table tags, table properties, and column hints.
Arguments:
- data (Any) - The data to be transformed. This can be raw data or an instance of DltResource. If raw data is provided, the function will wrap it in a DltResource object.
- cluster (Union[TColumnNames, Literal["AUTO"]], optional) - A column name, list of column names, or "AUTO" to cluster the Databricks table by. Use "AUTO" to let Databricks automatically determine the best clustering.
- partition (TColumnNames, optional) - A column name or list of column names to partition the Databricks table by. Partitioning divides the table into separate files based on the partition column values.
- table_format (Literal["DELTA", "ICEBERG"], optional) - The table format to use. Defaults to "DELTA". Use "ICEBERG" to create Apache Iceberg tables for better schema evolution and time travel capabilities.
- table_comment (str, optional) - A description for the Databricks table.
- table_tags (List[Union[str, Dict[str, str]]], optional) - A list of tags for the Databricks table. Can contain a mix of strings and key-value pairs as dictionaries. Example: ["production", {"environment": "prod"}, "employees"]
- table_properties (Dict[str, Union[str, int, bool, float]], optional) - A dictionary of table properties to be added to the Databricks table using TBLPROPERTIES. These are key-value pairs for metadata and Delta Lake optimization settings. Example: {"delta.appendOnly": True, "delta.logRetentionDuration": "30 days"}
- column_hints (TDatabricksTableSchemaColumns, optional) - A dictionary of column hints. Each key is a column name, and the value is a dictionary of hints. The supported hints are:
  - column_comment - adds a comment to the column. Supports basic Markdown syntax.
  - column_tags - adds tags to the column. Supports a list of strings and/or key-value pairs.
Returns:
A DltResource object that is ready to be loaded into Databricks.
Raises:
ValueError - If any hint is invalid or none are specified.
Examples:
data = [{"name": "Marcel", "description": "Raccoon Engineer", "date_hired": 1700784000}]
databricks_adapter(
    data,
    cluster="date_hired",
    table_comment="Employee Data",
    table_tags=["production", {"environment": "prod"}, "employees"],
)
# Use AUTO clustering
databricks_adapter(data, cluster="AUTO", table_comment="Auto-clustered table")
# Use partitioning
databricks_adapter(data, partition=["year", "month"], cluster="customer_id")
# Create Iceberg table
databricks_adapter(data, table_format="ICEBERG", cluster="customer_id")
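The column_hints parameter is not shown in the examples above; a minimal sketch of its shape follows. The hint names column_comment and column_tags come from this reference, while the column names and values are illustrative, and the adapter call itself (commented out) assumes dlt is installed with the Databricks destination:

```python
# Each key is a column name; each value maps supported hint names to values.
column_hints = {
    "name": {
        # column_comment supports basic Markdown syntax
        "column_comment": "Employee **full name**",
        # column_tags accepts a mix of strings and key-value dictionaries
        "column_tags": ["pii", {"sensitivity": "high"}],
    },
    "date_hired": {
        "column_comment": "Unix timestamp of the hire date",
    },
}

# databricks_adapter(data, column_hints=column_hints)  # requires dlt[databricks]
```

Passing a hint name outside the supported set raises ValueError, as noted in the Raises section.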