dlt.dataset.lineage
create_sqlglot_schema
def create_sqlglot_schema(schema_map: Dict[str, Sequence[dlt.Schema]],
dialect: TSqlGlotDialect) -> SQLGlotSchema
Create an SQLGlot schema from multiple dlt schemas grouped by dataset name.
Each key in schema_map becomes a top-level qualifier (SQL schema /
catalog) that scopes all tables underneath it. Tables from multiple dlt
schemas that share a dataset name are merged via Schema.unify_schemas;
the first schema in each sequence is treated as the default and wins on
column-level collisions.
Arguments:
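- schema_map - Mapping of dataset name to a sequence of dlt schemas. The dataset name is used as the qualifying namespace in the generated SQLGlot schema.
- dialect - SQLGlot dialect for the target destination.
Returns:
An SQLGlotSchema containing the tables of all given dlt schemas, qualified by dataset name.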
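The grouping and collision rule described above can be illustrated with a toy model in plain Python. This is not dlt's implementation: each "schema" here is a hypothetical plain dict of table -> column -> type rather than a dlt.Schema, the helper name toy_sqlglot_mapping is made up for this sketch, and only the first-schema-wins column rule is modeled (not everything Schema.unify_schemas does). The nested {dataset: {table: {column: type}}} shape it produces is one of the forms SQLGlot's MappingSchema accepts.

```python
from typing import Dict, Mapping, Sequence

# Hypothetical stand-in for a dlt schema: table name -> {column name -> data type}.
ToySchema = Mapping[str, Mapping[str, str]]


def toy_sqlglot_mapping(
    schema_map: Mapping[str, Sequence[ToySchema]],
) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Build a nested {dataset: {table: {column: type}}} mapping.

    Models only the collision rule: within one dataset, the first schema in
    the sequence wins when several schemas define the same column.
    """
    result: Dict[str, Dict[str, Dict[str, str]]] = {}
    for dataset_name, schemas in schema_map.items():
        tables: Dict[str, Dict[str, str]] = {}
        for schema in schemas:  # earlier schemas take precedence
            for table_name, columns in schema.items():
                merged = tables.setdefault(table_name, {})
                for column_name, data_type in columns.items():
                    # setdefault keeps a type already set by an earlier schema
                    merged.setdefault(column_name, data_type)
        # The dataset name is the top-level qualifier scoping its tables
        result[dataset_name] = tables
    return result


default = {"users": {"id": "bigint", "name": "text"}}
extra = {"users": {"name": "varchar", "email": "text"}, "orders": {"id": "bigint"}}
mapping = toy_sqlglot_mapping({"shop": [default, extra], "audit": [extra]})
print(mapping["shop"]["users"])  # {'id': 'bigint', 'name': 'text', 'email': 'text'}
```

Because each dataset gets its own table dict, shop.users and audit.users stay separate: the "varchar" type from extra survives under audit but loses to the default schema under shop.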
compute_columns_schema
def compute_columns_schema(
expression: sge.Expression,
sqlglot_schema: SQLGlotSchema,
dialect: TSqlGlotDialect,
infer_sqlglot_schema: bool = True,
allow_anonymous_columns: bool = True,
allow_partial: bool = True
) -> Tuple[TTableSchemaColumns, Optional[sge.Query]]
Compute the expected dlt columns schema for the output of an SQL SELECT query. No case-folding or quoting is performed on the query.
Arguments:
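- expression (sge.Expression) - Parsed SQL SELECT query whose output columns are computed.
- sqlglot_schema (SQLGlotSchema) - SQLGlot schema used to resolve the tables and columns referenced in the query, for example one built with create_sqlglot_schema.
- dialect (TSqlGlotDialect) - SQLGlot dialect for the target destination.
- infer_sqlglot_schema (bool) - If False, all tables and columns referenced in the query must be resolvable from the SQLGlot schema. If True, tables and columns not found in the SQLGlot schema are allowed.
- allow_anonymous_columns (bool) - If False, every column in the final selection must have an explicit name or alias. If True, names for unnamed columns in the final selection may be generated by the dialect.
- allow_partial (bool) - If False, raise an exception if the returned schema is incomplete. If True, the function always returns a columns dictionary, even on SQL parsing errors, missing table references, an unresolved SELECT *, etc.
Returns:
A tuple of the dlt columns schema and the qualified sql_query (an sge.Query, or None).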
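How the three flags interact can be illustrated with a toy model. This is not dlt's implementation and performs no SQL parsing: the query is given as a list of hypothetical pre-parsed Projection items, the schema is a plain table -> column -> type dict, and the policy of collecting "problems" (an unknown reference when infer_schema is False, an unnamed column when allow_anonymous is False, an unresolved table.*) and then raising or skipping them depending on allow_partial is an assumption based only on the argument descriptions above. The _col_{i} names are a placeholder for dialect-generated names, entries only mimic dlt columns with name and data_type keys, and only the columns part of the return value is modeled.

```python
from typing import Dict, List, Mapping, NamedTuple, Optional


class Projection(NamedTuple):
    """Hypothetical pre-parsed SELECT item; this sketch does no SQL parsing."""

    table: Optional[str] = None   # source table, None for a computed expression
    column: Optional[str] = None  # source column, None for a computed expression
    alias: Optional[str] = None   # explicit AS alias, if any
    star: bool = False            # True for `table.*`


def toy_columns_schema(
    projections: List[Projection],
    schema: Mapping[str, Mapping[str, str]],  # table -> {column -> data type}
    infer_schema: bool = True,
    allow_anonymous: bool = True,
    allow_partial: bool = True,
) -> Dict[str, Dict[str, str]]:
    columns: Dict[str, Dict[str, str]] = {}
    problems: List[str] = []  # reasons the result would be incomplete
    for i, p in enumerate(projections):
        if p.star:
            if p.table in schema:
                # Expand table.* into every known column of that table
                for name, data_type in schema[p.table].items():
                    columns[name] = {"name": name, "data_type": data_type}
            else:
                problems.append(f"unresolved {p.table}.*")
            continue
        name = p.alias or p.column
        if name is None:
            if not allow_anonymous:
                problems.append(f"projection {i} has no name or alias")
                continue
            name = f"_col_{i}"  # placeholder for a dialect-generated name
        known = p.table in schema and p.column in schema[p.table]
        if p.column is not None and not known and not infer_schema:
            problems.append(f"{p.table}.{p.column} is not in the schema")
            continue
        column = {"name": name}
        if known:
            # A data type is only attached when the reference resolves
            column["data_type"] = schema[p.table][p.column]
        columns[name] = column
    if problems and not allow_partial:
        raise ValueError("; ".join(problems))
    return columns  # problem projections are skipped when allow_partial=True


schema = {"users": {"id": "bigint", "name": "text"}}
query = [
    Projection("users", "id"),
    Projection("users", "name", alias="user_name"),
    Projection(),                     # e.g. SELECT 1 + 1 without an alias
    Projection("orders", star=True),  # orders is not in the schema
]
print(list(toy_columns_schema(query, schema)))  # ['id', 'user_name', '_col_2']
```

Collecting problems first and raising once keeps allow_partial as the single switch between strict and best-effort behavior in this sketch.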