dlt.destinations.impl.databricks.databricks_adapter
databricks_adapter
def databricks_adapter(
data: Any,
cluster: Union[TColumnNames, Literal["AUTO"]] = None,
partition: TColumnNames = None,
table_format: Literal["DELTA", "ICEBERG"] = "DELTA",
table_comment: Optional[str] = None,
table_tags: Optional[List[Union[str, Dict[str, str]]]] = None,
table_properties: Optional[Dict[str, Union[str, int, bool, float]]] = None,
column_hints: Optional[TDatabricksTableSchemaColumns] = None
) -> DltResource
Prepares data for loading into Databricks.
This function takes data, which can be raw or already wrapped in a DltResource object, and prepares it for Databricks by optionally specifying clustering, partitioning, table format, a table comment, table tags, table properties, and column hints.
Arguments:
- data (Any) - The data to be transformed. This can be raw data or an instance of DltResource. If raw data is provided, the function will wrap it in a DltResource object.
- cluster (Union[TColumnNames, Literal["AUTO"]], optional) - A column name, list of column names, or "AUTO" to cluster the Databricks table by. Use "AUTO" to let Databricks automatically determine the best clustering.
- partition (TColumnNames, optional) - A column name or list of column names to partition the Databricks table by. Partitioning divides the table into separate files based on the partition column values.
- table_format (Literal["DELTA", "ICEBERG"], optional) - The table format to use. Defaults to "DELTA". Use "ICEBERG" to create Apache Iceberg tables for better schema evolution and time travel capabilities.
- table_comment (str, optional) - A description for the Databricks table.
- table_tags (List[Union[str, Dict[str, str]]], optional) - A list of tags for the Databricks table. Can contain a mix of strings and key-value pairs as dictionaries. Example: ["production", {"environment": "prod"}, "employees"]
- table_properties (Dict[str, Union[str, int, bool, float]], optional) - A dictionary of table properties to be added to the Databricks table using TBLPROPERTIES. These are key-value pairs for metadata and Delta Lake optimization settings. Example: {"delta.appendOnly": True, "delta.logRetentionDuration": "30 days"}
- column_hints (TDatabricksTableSchemaColumns, optional) - A dictionary of column hints. Each key is a column name, and the value is a dictionary of hints. The supported hints are:
  - column_comment - adds a comment to the column. Supports basic Markdown syntax.
  - column_tags - adds tags to the column. Supports a list of strings and/or key-value pairs.
Returns:
A DltResource object that is ready to be loaded into Databricks.
Raises:
ValueError - If any hint is invalid or none are specified.
Examples:
data = [{"name": "Marcel", "description": "Raccoon Engineer", "date_hired": 1700784000}]
databricks_adapter(
    data,
    cluster="date_hired",
    table_comment="Employee Data",
    table_tags=["production", {"environment": "prod"}, "employees"],
)
# Use AUTO clustering
databricks_adapter(data, cluster="AUTO", table_comment="Auto-clustered table")
# Use partitioning
databricks_adapter(data, partition=["year", "month"], cluster="customer_id")
# Create Iceberg table
databricks_adapter(data, table_format="ICEBERG", cluster="customer_id")
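The column_hints parameter is not shown in the examples above; a minimal sketch of its shape follows. The hint names column_comment and column_tags come from this reference, while the column names and values are illustrative, and the adapter call itself (commented out) assumes dlt is installed with the Databricks destination:

```python
# Each key is a column name; each value maps supported hint names to values.
column_hints = {
    "name": {
        # column_comment supports basic Markdown syntax
        "column_comment": "Employee **full name**",
        # column_tags accepts a mix of strings and key-value dictionaries
        "column_tags": ["pii", {"sensitivity": "high"}],
    },
    "date_hired": {
        "column_comment": "Unix timestamp of the hire date",
    },
}

# databricks_adapter(data, column_hints=column_hints)  # requires dlt[databricks]
```

Passing a hint name outside the supported set raises ValueError, as noted in the Raises section.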