Skip to main content
Version: devel

dlt.common.libs.sqlglot

from_sqlglot_type

def from_sqlglot_type(sqlglot_type: DATA_TYPE) -> TColumnType

View source on GitHub

Convert a SQLGlot DataType to dlt column hints.

reference: https://dlthub.com/docs/general-usage/schema/#tables-and-columns

to_sqlglot_type

def to_sqlglot_type(dlt_type: TDataType,
precision: Optional[int] = None,
scale: Optional[int] = None,
timezone: Optional[bool] = None,
nullable: Optional[bool] = None,
use_named_types: bool = False) -> DataType

View source on GitHub

Convert the dlt data_type and column hints to a SQLGlot DataType expression.

if use_named_type is True: use "named" types with fallback on "parameterized" types. else: use "parameterized" types everywhere.

Named types:

  • have some set attributes, e.g., DataType.Type.DECIMAL64 has precision=16 and scale=4.
  • can be referenced directly in SQL (table definition, CAST(), which could make SQLGlot transpiling more reliable.
  • may not exist in all dialects and will be casted automatically during transpiling
  • can have limited expressivity, e.g., named types for timestamps only included precision 0, 3, 9

Parameterized types:

  • instead of DECIMAL64, a generic DataType.Type.DECIMAL is used with DataTypeParam expressions attached to represent precision and scale. SQLGlot is responsible for properly compiling and transpiling them.
  • parameters might be handled differently across dialects and would require greater testing on the dlt side.

reference: https://dlthub.com/docs/general-usage/schema/#tables-and-columns

set_metadata

def set_metadata(sqlglot_type: sge.DataType,
metadata: TColumnSchema) -> sge.DataType

View source on GitHub

Set a metadata dictionary on the SQGLot DataType object.

By attaching dlt hints to a DataType object, they will be propagated until the DataType is modified.

get_metadata

def get_metadata(sqlglot_type: sge.DataType) -> TColumnSchema

View source on GitHub

Get a metadata dictionary from the SQLGlot DataType object.

query_is_complex

def query_is_complex(parsed_select: Union[sge.Select, sge.Union],
columns: Set[str]) -> bool

View source on GitHub

Return True unless the query is provably “simple”. A simple query

  1. references exactly one physical table,
  2. contains no complex constructs (CTEs, sub-queries, derived tables, unions, window functions, GROUP BY, DISTINCT, etc.),
  3. projects either • a plain/qualified star with only constant literals after it, or • the full, explicit list of all columns with only constant literals after it. Anything we cannot prove to be simple is conservatively flagged as complex.

build_typed_literal

def build_typed_literal(
value: Any,
sqlglot_type: sge.DataType = None) -> Union[sge.Expression, sge.Tuple]

View source on GitHub

Create a literal and CAST it to the requested sqlglot DataType.

SqlModel Objects

class SqlModel()

View source on GitHub

A SqlModel is a named tuple that contains a query and a dialect. It is used to represent a SQL query and the dialect to use for parsing it.

from_query_string

@classmethod
def from_query_string(cls,
query: str,
dialect: Optional[str] = None) -> "SqlModel"

View source on GitHub

Creates a SqlModel from a raw SQL query string using sqlglot. Ensures that the parsed query is an instance of sqlglot.exp.Select.

Arguments:

  • query - The raw SQL query string.
  • dialect - The SQL dialect to use for parsing.

Returns:

An instance of SqlModel with the normalized query and dialect.

Raises:

  • ValueError - If the parsed query is not an instance of sqlglot.exp.Select.

uuid_expr_for_dialect

def uuid_expr_for_dialect(dialect: TSqlGlotDialect,
load_id: str) -> sge.Expression

View source on GitHub

Generates a UUID expression based on the specified dialect.

Arguments:

  • dialect - The SQL dialect for which the UUID expression needs to be generated.
  • load_id - The load ID used for deterministic UUID generation (redshift).

Returns:

A SQL expression that generates a UUID for the specified dialect.

get_select_column_names

def get_select_column_names(
selects: List[sge.Expression],
dialect: Optional[TSqlGlotDialect] = None) -> List[str]

View source on GitHub

Extract output column names from a SELECT clause's expression list.

Handles Alias (returns alias), Column and Dot (returns output_name with fallback to name — needed for BigQuery quoted identifiers that parse as Dot). Raises ValueError for star expressions or unsupported expression types.

Arguments:

  • selects - The .selects list from a parsed SELECT statement.
  • dialect - SQL dialect used only for error message formatting.

Returns:

Column output names in SELECT order.

filter_select_column_names

def filter_select_column_names(
selects: List[sge.Expression],
discard_columns: Set[str],
normalize_fn: Callable[[str], str],
dialect: Optional[TSqlGlotDialect] = None) -> List[str]

View source on GitHub

Return SELECT column names excluding those whose normalized form is in discard_columns.

Preserves original SELECT order.

Arguments:

  • selects - The .selects list from a parsed SELECT statement.
  • discard_columns - Set of normalized column names to exclude.
  • normalize_fn - Callable that normalizes a column name (e.g. casefold).
  • dialect - SQL dialect used only for error message formatting.

Returns:

Remaining column names in their original SELECT order.

validate_no_star_select

def validate_no_star_select(parsed_select: sge.Select,
dialect: TSqlGlotDialect) -> None

View source on GitHub

Raises ValueError if the SELECT statement contains a star expression.

Arguments:

  • parsed_select - The parsed SELECT statement.
  • dialect - The SQL dialect (used for error message formatting).

build_outer_select_statement

def build_outer_select_statement(
select_dialect: TSqlGlotDialect, parsed_select: sge.Select,
columns: TTableSchemaColumns,
normalize_casefold_fn: Callable[[str],
str]) -> Tuple[sge.Select, bool]

View source on GitHub

Wraps the parsed SELECT in a subquery and builds an outer SELECT statement.

Arguments:

  • select_dialect - The SQL dialect to use for parsing and formatting.
  • parsed_select - The parsed SELECT statement.
  • columns - The schema columns to match.
  • normalize_casefold_fn - A callable that normalizes and casefolds an identifier.

Returns:

Tuple of the outer SELECT statement and a flag indicating if reordering is needed.

reorder_or_adjust_outer_select

def reorder_or_adjust_outer_select(outer_parsed_select: sge.Select,
columns: TTableSchemaColumns,
normalize_casefold_fn: Callable[[str], str],
schema_name: str, table_name: str) -> None

View source on GitHub

Reorders or adjusts the SELECT statement to match the schema.

Adds missing columns as NULL and removes extra columns not in the schema.

Arguments:

  • outer_parsed_select - The parsed outer SELECT statement.
  • columns - The schema columns to match.
  • normalize_casefold_fn - A callable that normalizes and casefolds an identifier.
  • schema_name - The schema name (used for error messages).
  • table_name - The table name (used for error messages).

This demo works on codespaces. Codespaces is a development environment available for free to anyone with a Github account. You'll be asked to fork the demo repository and from there the README guides you with further steps.
The demo uses the Continue VSCode extension.

Off to codespaces!

DHelp

Ask a question

Welcome to "Codex Central", your next-gen help center, driven by OpenAI's GPT-4 model. It's more than just a forum or a FAQ hub – it's a dynamic knowledge base where coders can find AI-assisted solutions to their pressing problems. With GPT-4's powerful comprehension and predictive abilities, Codex Central provides instantaneous issue resolution, insightful debugging, and personalized guidance. Get your code running smoothly with the unparalleled support at Codex Central - coding help reimagined with AI prowess.