dlt.common.libs.sqlglot
from_sqlglot_type
def from_sqlglot_type(sqlglot_type: DATA_TYPE) -> TColumnType
Convert a SQLGlot DataType to dlt column hints.
reference: https://dlthub.com/docs/general-usage/schema/#tables-and-columns
to_sqlglot_type
def to_sqlglot_type(dlt_type: TDataType,
precision: Optional[int] = None,
scale: Optional[int] = None,
timezone: Optional[bool] = None,
nullable: Optional[bool] = None,
use_named_types: bool = False) -> DataType
Convert the dlt data_type and column hints to a SQLGlot DataType expression.
If use_named_types is True, "named" types are used, with a fallback to "parameterized" types.
Otherwise, "parameterized" types are used everywhere.
Named types:
- have some fixed attributes, e.g., DataType.Type.DECIMAL64 has precision=16 and scale=4.
- can be referenced directly in SQL (table definitions, CAST()), which could make SQLGlot transpiling more reliable.
- may not exist in all dialects and will be cast automatically during transpiling.
- can have limited expressivity, e.g., named types for timestamps only include precisions 0, 3, and 9.
Parameterized types:
- instead of DECIMAL64, a generic DataType.Type.DECIMAL is used with DataTypeParam expressions attached to represent precision and scale. SQLGlot is responsible for properly compiling and transpiling them.
- parameters might be handled differently across dialects and would require greater testing on the dlt side.
reference: https://dlthub.com/docs/general-usage/schema/#tables-and-columns
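The choice between named and parameterized types can be sketched in plain Python. This is a minimal illustration under stated assumptions, not dlt's implementation: the helper `pick_timestamp_type` is hypothetical, it returns type names as strings rather than SQLGlot expressions, and the identifiers `TIMESTAMP_S`, `TIMESTAMP_MS`, and `TIMESTAMP_NS` are assumed to be the named types for precisions 0, 3, and 9.

```python
from typing import Optional

# Assumed mapping from precision to a "named" timestamp type.
# Only precisions 0, 3, and 9 have a named counterpart.
NAMED_TIMESTAMP_TYPES = {0: "TIMESTAMP_S", 3: "TIMESTAMP_MS", 9: "TIMESTAMP_NS"}


def pick_timestamp_type(precision: Optional[int], use_named_types: bool = False) -> str:
    """Hypothetical helper: return a type name for a timestamp column."""
    if precision is None:
        # No precision hint: the generic type needs no parameter.
        return "TIMESTAMP"
    if use_named_types and precision in NAMED_TIMESTAMP_TYPES:
        # A named type exists for this precision, so use it directly.
        return NAMED_TIMESTAMP_TYPES[precision]
    # Fallback: generic type carrying the precision as a parameter.
    return f"TIMESTAMP({precision})"
```

With `use_named_types=True`, a precision of 3 maps to a named type, while a precision of 6 has no named counterpart and falls back to the parameterized form.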
set_metadata
def set_metadata(sqlglot_type: sge.DataType,
metadata: TColumnSchema) -> sge.DataType
Set a metadata dictionary on the SQLGlot DataType object.
dlt hints attached to a DataType object are propagated with it until the DataType is modified.
get_metadata
def get_metadata(sqlglot_type: sge.DataType) -> TColumnSchema
Get a metadata dictionary from the SQLGlot DataType object.
query_is_complex
def query_is_complex(parsed_select: Union[sge.Select, sge.Union],
columns: Set[str]) -> bool
Return True unless the query is provably “simple”. A simple query
- references exactly one physical table,
- contains no complex constructs (CTEs, sub-queries, derived tables, unions, window functions, GROUP BY, DISTINCT, etc.),
- projects either
  - a plain/qualified star with only constant literals after it, or
  - the full, explicit list of all columns with only constant literals after it.
Anything we cannot prove to be simple is conservatively flagged as complex.
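The projection rule alone can be sketched as follows. This is a simplified model, not the library's code: projection items are represented as hypothetical `(kind, value)` pairs instead of SQLGlot expressions, the helper name `projection_is_simple` is invented for illustration, and rejecting duplicate column names is an assumption made to stay conservative. The single-table and "no complex constructs" checks are not covered.

```python
from typing import List, Set, Tuple

# Each projection item is modelled as a (kind, value) pair, where kind is
# "star", "column", "literal", or "other" (anything we cannot classify).
ProjectionItem = Tuple[str, str]


def projection_is_simple(items: List[ProjectionItem], columns: Set[str]) -> bool:
    """Hypothetical check covering only the projection rule."""
    # Strip the trailing run of constant literals.
    end = len(items)
    while end > 0 and items[end - 1][0] == "literal":
        end -= 1
    head = items[:end]

    # Case 1: a single star, followed only by literals.
    if len(head) == 1 and head[0][0] == "star":
        return True

    # Case 2: every column listed explicitly, followed only by literals.
    if head and all(kind == "column" for kind, _ in head):
        names = [value for _, value in head]
        return len(names) == len(set(names)) and set(names) == columns

    # Anything else cannot be proven simple.
    return False
```

For example, a star followed by a literal passes, while a star followed by a column, or a partial column list, is treated as not simple.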
build_typed_literal
def build_typed_literal(
value: Any,
sqlglot_type: sge.DataType = None) -> Union[sge.Expression, sge.Tuple]
Create a literal and CAST it to the requested sqlglot DataType.
SqlModel Objects
class SqlModel()
A SqlModel is a named tuple that contains a query and a dialect. It is used to represent a SQL query and the dialect to use for parsing it.
from_query_string
@classmethod
def from_query_string(cls,
query: str,
dialect: Optional[str] = None) -> "SqlModel"
Creates a SqlModel from a raw SQL query string using sqlglot. Ensures that the parsed query is an instance of sqlglot.exp.Select.
Arguments:
- query - The raw SQL query string.
- dialect - The SQL dialect to use for parsing.
Returns:
An instance of SqlModel with the normalized query and dialect.
Raises:
- ValueError - If the parsed query is not an instance of sqlglot.exp.Select.
uuid_expr_for_dialect
def uuid_expr_for_dialect(dialect: TSqlGlotDialect,
load_id: str) -> sge.Expression
Generates a UUID expression based on the specified dialect.
Arguments:
- dialect - The SQL dialect for which the UUID expression needs to be generated.
- load_id - The load ID used for deterministic UUID generation (Redshift).
Returns:
A SQL expression that generates a UUID for the specified dialect.
get_select_column_names
def get_select_column_names(
selects: List[sge.Expression],
dialect: Optional[TSqlGlotDialect] = None) -> List[str]
Extract output column names from a SELECT clause's expression list.
Handles Alias (returns alias), Column and Dot (returns output_name with fallback to name — needed for BigQuery quoted identifiers that parse as Dot). Raises ValueError for star expressions or unsupported expression types.
Arguments:
- selects - The .selects list from a parsed SELECT statement.
- dialect - SQL dialect used only for error message formatting.
Returns:
Column output names in SELECT order.
filter_select_column_names
def filter_select_column_names(
selects: List[sge.Expression],
discard_columns: Set[str],
normalize_fn: Callable[[str], str],
dialect: Optional[TSqlGlotDialect] = None) -> List[str]
Return SELECT column names excluding those whose normalized form is in discard_columns.
Preserves original SELECT order.
Arguments:
- selects - The .selects list from a parsed SELECT statement.
- discard_columns - Set of normalized column names to exclude.
- normalize_fn - Callable that normalizes a column name (e.g. casefold).
- dialect - SQL dialect used only for error message formatting.
Returns:
Remaining column names in their original SELECT order.
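The filtering step can be sketched on plain strings. This is an illustration only: the hypothetical helper `filter_names` takes column names that have already been extracted, whereas the documented function works on SQLGlot expressions and also handles error reporting.

```python
from typing import Callable, List, Set


def filter_names(
    names: List[str],
    discard_columns: Set[str],
    normalize_fn: Callable[[str], str],
) -> List[str]:
    """Hypothetical helper: drop names whose normalized form is discarded."""
    # A list comprehension keeps the original SELECT order.
    return [name for name in names if normalize_fn(name) not in discard_columns]


remaining = filter_names(["Id", "_DLT_LOAD_ID", "name"], {"_dlt_load_id"}, str.casefold)
# remaining == ["Id", "name"]
```

The returned names keep their original spelling; normalization is applied only for the membership test against discard_columns.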
validate_no_star_select
def validate_no_star_select(parsed_select: sge.Select,
dialect: TSqlGlotDialect) -> None
Raises ValueError if the SELECT statement contains a star expression.
Arguments:
- parsed_select - The parsed SELECT statement.
- dialect - The SQL dialect (used for error message formatting).
build_outer_select_statement
def build_outer_select_statement(
select_dialect: TSqlGlotDialect, parsed_select: sge.Select,
columns: TTableSchemaColumns,
normalize_casefold_fn: Callable[[str],
str]) -> Tuple[sge.Select, bool]
Wraps the parsed SELECT in a subquery and builds an outer SELECT statement.
Arguments:
- select_dialect - The SQL dialect to use for parsing and formatting.
- parsed_select - The parsed SELECT statement.
- columns - The schema columns to match.
- normalize_casefold_fn - A callable that normalizes and casefolds an identifier.
Returns:
Tuple of the outer SELECT statement and a flag indicating if reordering is needed.
reorder_or_adjust_outer_select
def reorder_or_adjust_outer_select(outer_parsed_select: sge.Select,
columns: TTableSchemaColumns,
normalize_casefold_fn: Callable[[str], str],
schema_name: str, table_name: str) -> None
Reorders or adjusts the SELECT statement to match the schema.
Adds missing columns as NULL and removes extra columns not in the schema.
Arguments:
- outer_parsed_select - The parsed outer SELECT statement.
- columns - The schema columns to match.
- normalize_casefold_fn - A callable that normalizes and casefolds an identifier.
- schema_name - The schema name (used for error messages).
- table_name - The table name (used for error messages).
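The adjustment described for reorder_or_adjust_outer_select (schema order wins, missing columns become NULL, extra columns are dropped) can be sketched on plain strings. This is a rough model under assumptions: the helper `align_to_schema` is hypothetical, it returns a new list of SQL fragments instead of mutating a SQLGlot SELECT in place, and normalizing both sides with the same function is an assumption about how names are matched.

```python
from typing import Callable, Dict, List


def align_to_schema(
    selected: List[str],
    schema_columns: List[str],
    normalize_fn: Callable[[str], str],
) -> List[str]:
    """Hypothetical helper: build a projection that follows the schema order."""
    # Index the selected names by their normalized form for lookup.
    by_normalized: Dict[str, str] = {normalize_fn(name): name for name in selected}

    projection: List[str] = []
    for column in schema_columns:
        source = by_normalized.get(normalize_fn(column))
        if source is None:
            # Column is in the schema but not in the query: fill with NULL.
            projection.append(f"NULL AS {column}")
        else:
            projection.append(f"{source} AS {column}")
    # Selected names that match no schema column are never emitted.
    return projection


aligned = align_to_schema(
    ["Name", "extra", "id"], ["id", "name", "created_at"], str.casefold
)
# aligned == ["id AS id", "Name AS name", "NULL AS created_at"]
```

In this example the `extra` column is dropped because it is not in the schema, and `created_at` is added as NULL because the query does not select it.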