Version: devel

dlt.common.destination.dataset

DataAccess Objects

class DataAccess(Protocol)

View source on GitHub

Common data access protocol shared between dbapi cursors and relations

columns_schema

@property
def columns_schema() -> TTableSchemaColumns

View source on GitHub

Returns the expected columns schema for the result of the relation. Column types are discovered with sqlglot query analysis and lineage. dlt hints for columns are kept in some cases. Refer to <docs-page> for more details.

df

def df(chunk_size: Optional[int] = None) -> Optional[DataFrame]

View source on GitHub

Fetches the results as a data frame. Uses the native pandas implementation of the destination client cursor if available.

Arguments:

  • chunk_size Optional[int] - The number of rows to fetch for this call. Defaults to None which will fetch all rows.

Returns:

  • Optional[DataFrame] - A data frame with query results.

arrow

def arrow(chunk_size: Optional[int] = None) -> Optional[ArrowTable]

View source on GitHub

Fetches the results as an arrow table. Uses the native arrow implementation of the destination client cursor if available.

Arguments:

  • chunk_size Optional[int] - The number of rows to fetch for this call. Defaults to None which will fetch all rows.

Returns:

  • Optional[ArrowTable] - An arrow table with query results.

iter_df

def iter_df(chunk_size: int) -> Generator[DataFrame, None, None]

View source on GitHub

Iterates over data frames of 'chunk_size' items. Uses the native pandas implementation of the destination client cursor if available.

Arguments:

  • chunk_size int - The number of rows to fetch for each iteration.

Returns:

  • Generator[DataFrame, None, None] - A generator of data frames with query results.

iter_arrow

def iter_arrow(chunk_size: int) -> Generator[ArrowTable, None, None]

View source on GitHub

Iterates over arrow tables of 'chunk_size' items. Uses the native arrow implementation of the destination client cursor if available.

Arguments:

  • chunk_size int - The number of rows to fetch for each iteration.

Returns:

  • Generator[ArrowTable, None, None] - A generator of arrow tables with query results.

fetchall

def fetchall() -> list[tuple[Any, ...]]

View source on GitHub

Fetches all items as a list of python tuples. Uses the native dbapi fetchall implementation of the destination client cursor.

Returns:

  • list[tuple[Any, ...]] - A list of python tuples with query results.

fetchmany

def fetchmany(chunk_size: int) -> list[tuple[Any, ...]]

View source on GitHub

Fetches the first 'chunk_size' items as a list of python tuples. Uses the native dbapi fetchmany implementation of the destination client cursor.

Arguments:

  • chunk_size int - The number of rows to fetch for this call.

Returns:

  • list[tuple[Any, ...]] - A list of python tuples with query results.

iter_fetch

def iter_fetch(chunk_size: int) -> Generator[list[tuple[Any, ...]], Any, Any]

View source on GitHub

Iterates over lists of python tuples in chunks of 'chunk_size' items. Uses the native dbapi fetchmany implementation of the destination client cursor.

Arguments:

  • chunk_size int - The number of rows to fetch for each iteration.

Returns:

  • Generator[list[tuple[Any, ...]], Any, Any] - A generator of lists of python tuples with query results.
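
The chunked iteration described for iter_fetch can be sketched on top of any DBAPI cursor. The example below is a minimal illustration that uses the standard library's sqlite3 module; it is not dlt's implementation, and the helper name iter_fetch_sketch and the items table are hypothetical.

```python
import sqlite3
from typing import Any, Generator


def iter_fetch_sketch(
    cursor: sqlite3.Cursor, chunk_size: int
) -> Generator[list[tuple[Any, ...]], None, None]:
    """Yield lists of at most `chunk_size` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows


# Hypothetical demo data: five rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(5)])

cursor = conn.execute("SELECT id FROM items ORDER BY id")
chunks = list(iter_fetch_sketch(cursor, chunk_size=2))
conn.close()
```

With five rows and a chunk size of 2, the last chunk holds the single remaining row, so callers should not assume every chunk is full.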

fetchone

def fetchone() -> Optional[tuple[Any, ...]]

View source on GitHub

Fetches the first item as a python tuple. Uses the native dbapi fetchone implementation of the destination client cursor.

Returns:

  • Optional[tuple[Any, ...]] - A python tuple with the first item of the query results.

Relation Objects

class Relation(DataAccess, Protocol)

View source on GitHub

A readable relation retrieved from a destination that supports it

schema

The schema of the relation

scalar

def scalar() -> Any

View source on GitHub

Fetches the first value of the first column on the first row as a python primitive.

Returns:

  • Any - The first value of the first column on the first row as a python primitive.
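
As a rough illustration of what "first value of the first column on the first row" means, the sketch below reads a scalar from a standard library sqlite3 cursor. The helper scalar_sketch is hypothetical and is not dlt's implementation; returning None for an empty result is an assumption of the sketch, and dlt may handle empty or multi-row results differently.

```python
import sqlite3
from typing import Any


def scalar_sketch(cursor: sqlite3.Cursor) -> Any:
    """Return the first column of the first row, or None if there are no rows."""
    row = cursor.fetchone()
    return None if row is None else row[0]


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

max_id = scalar_sketch(conn.execute("SELECT MAX(id) FROM items"))
empty = scalar_sketch(conn.execute("SELECT id FROM items WHERE id > 100"))
conn.close()
```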

limit

def limit(limit: int, **kwargs: Any) -> Self

View source on GitHub

Returns a new relation with the limit applied.

Arguments:

  • limit int - The number of rows to fetch.
  • **kwargs Any - Additional keyword arguments to pass to the limit implementation of the destination client cursor.

Returns:

  • Self - The relation with the limit applied.

head

def head(limit: int = 5) -> Self

View source on GitHub

By default returns a relation with the first 5 rows selected.

Arguments:

  • limit int - The number of rows to fetch.

Returns:

  • Self - The relation with the limit applied.

select

def select(*columns: str) -> Self

View source on GitHub

Returns a new relation with the given columns selected.

Arguments:

  • *columns str - The columns to select.

Returns:

  • Self - The relation with the columns selected.

max

def max() -> Self

View source on GitHub

Returns a new relation with the MAX aggregate applied. Exactly one column must be selected.

Returns:

  • Self - The relation with the MAX aggregate expression.

min

def min() -> Self

View source on GitHub

Returns a new relation with the MIN aggregate applied. Exactly one column must be selected.

Returns:

  • Self - The relation with the MIN aggregate expression.

where

def where(column_or_expr: SqlglotExprOrStr,
operator: Optional[TFilterOperation] = None,
value: Optional[Any] = None) -> Self

View source on GitHub

Returns a new relation with the given where clause applied. Same as .filter().

Arguments:

  • column_or_expr SqlglotExprOrStr - The column to filter on. Alternatively, the SQL expression or string representing a custom WHERE clause.
  • operator Optional[TFilterOperation] - The operator to use. Available operations are: eq, ne, gt, lt, gte, lte, in, not_in
  • value Optional[Any] - The value to filter on.

Returns:

  • Self - A copy of the relation with the where clause applied.

filter

def filter(column_name: str, operator: TFilterOperation, value: Any) -> Self

View source on GitHub

Returns a new relation with the given where clause applied. Same as .where().

Arguments:

  • column_name str - The column to filter on.
  • operator TFilterOperation - The operator to use. Available operations are: eq, ne, gt, lt, gte, lte, in, not_in
  • value Any - The value to filter on.

Returns:

  • Self - A copy of the relation with the where clause applied.
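
To make the operator names concrete, here is a minimal sketch that maps them to SQL comparison operators and builds a parameterized WHERE clause, run against the standard library's sqlite3 module. The mapping and the helper where_clause_sketch are assumptions made for illustration; dlt builds its filters through sqlglot expressions, not through string formatting like this.

```python
import sqlite3
from typing import Any

# Assumed mapping from the operator names listed above to SQL operators.
OPERATORS = {
    "eq": "=", "ne": "!=", "gt": ">", "lt": "<",
    "gte": ">=", "lte": "<=", "in": "IN", "not_in": "NOT IN",
}


def where_clause_sketch(
    column: str, operator: str, value: Any
) -> tuple[str, list[Any]]:
    """Build a parameterized WHERE clause for one column/operator/value triple."""
    sql_op = OPERATORS[operator]
    if operator in ("in", "not_in"):
        values = list(value)  # these operators take a collection of values
        placeholders = ", ".join("?" for _ in values)
        return f"WHERE {column} {sql_op} ({placeholders})", values
    return f"WHERE {column} {sql_op} ?", [value]


# Hypothetical demo data: ids 1 through 5.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 6)])

clause, params = where_clause_sketch("id", "gte", 4)
gte_rows = conn.execute(f"SELECT id FROM items {clause} ORDER BY id", params).fetchall()

clause, params = where_clause_sketch("id", "not_in", [1, 2, 4, 5])
not_in_rows = conn.execute(f"SELECT id FROM items {clause}", params).fetchall()
conn.close()
```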

order_by

def order_by(column_name: str, direction: TSortOrder = "asc") -> Self

View source on GitHub

Returns a new relation with the given order by clause applied.

Arguments:

  • column_name str - The column to order by.
  • direction TSortOrder, optional - The direction to order by: "asc"/"desc". Defaults to "asc".

Returns:

  • Self - A copy of the relation with the order by clause applied.

__getitem__

def __getitem__(columns: Union[str, Sequence[str]]) -> Self

View source on GitHub

Returns a new relation with the given columns selected.

Arguments:

  • columns Union[str, Sequence[str]] - The columns to select.

Returns:

  • Self - The relation with the columns selected.

__copy__

def __copy__() -> Self

View source on GitHub

Creates a copy of the relation object.

Returns:

  • Self - The copy of the relation object
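
Most Relation methods return a new relation and leave the original untouched, which is what makes chaining safe. The toy class below sketches that copy-on-modify pattern with the standard library only. ToyRelation, its fields, and to_sql are hypothetical names, and the class only assembles a SQL string; it does not reflect how dlt builds queries.

```python
import copy
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Union


@dataclass
class ToyRelation:
    """Hypothetical stand-in for a relation; it only assembles a SQL string."""

    table: str
    columns: tuple[str, ...] = ("*",)
    order: Optional[str] = None
    row_limit: Optional[int] = None

    def _derive(self, **changes: Any) -> "ToyRelation":
        new = copy.copy(self)  # every method works on a copy, never on self
        for name, value in changes.items():
            setattr(new, name, value)
        return new

    def select(self, *columns: str) -> "ToyRelation":
        return self._derive(columns=columns)

    def __getitem__(self, columns: Union[str, Sequence[str]]) -> "ToyRelation":
        names = (columns,) if isinstance(columns, str) else tuple(columns)
        return self.select(*names)

    def order_by(self, column_name: str, direction: str = "asc") -> "ToyRelation":
        return self._derive(order=f"{column_name} {direction.upper()}")

    def limit(self, limit: int) -> "ToyRelation":
        return self._derive(row_limit=limit)

    def head(self, limit: int = 5) -> "ToyRelation":
        return self.limit(limit)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.order:
            sql += f" ORDER BY {self.order}"
        if self.row_limit is not None:
            sql += f" LIMIT {self.row_limit}"
        return sql


items = ToyRelation("items")
top = items.select("id", "name").order_by("id", "desc").head()
```

Because each call derives a copy, items can be reused as the starting point for other chains after top has been built.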

DBApiCursorProtocol Objects

class DBApiCursorProtocol(DataAccess, Protocol)

View source on GitHub

Protocol for the DBAPI cursor

native_cursor

Cursor implementation native to current destination

execute

def execute(query: AnyStr, *args: Any, **kwargs: Any) -> None

View source on GitHub

Execute a query on the cursor

close

def close() -> None

View source on GitHub

Close the cursor

DBApiCursor Objects

class DBApiCursor(abc.ABC, DBApiCursorProtocol)

View source on GitHub

Protocol for the DBAPI cursor

native_cursor

Cursor implementation native to current destination

execute

def execute(query: AnyStr, *args: Any, **kwargs: Any) -> None

View source on GitHub

Execute a query on the cursor

close

def close() -> None

View source on GitHub

Close the cursor

Dataset Objects

class Dataset(Protocol)

View source on GitHub

A readable dataset retrieved from a destination. Supports creating readable relations for a query or a table.

schema

@property
def schema() -> Schema

View source on GitHub

Returns the schema of the dataset. The schema is fetched from the destination.

Returns:

  • Schema - The schema of the dataset

sqlglot_schema

@property
def sqlglot_schema() -> SQLGlotSchema

View source on GitHub

Returns the computed and cached sqlglot schema of the dataset

Returns:

  • SQLGlotSchema - The sqlglot schema of the dataset

dataset_name

@property
def dataset_name() -> str

View source on GitHub

Returns the name of the dataset

Returns:

  • str - The name of the dataset

__call__

def __call__(query: Union[str, sge.Select, IbisExpr],
query_dialect: Optional[TSqlGlotDialect] = None,
_execute_raw_query: bool = False) -> Relation

View source on GitHub

Returns a readable relation for a given sql query

Arguments:

  • query Union[str, sge.Select, IbisExpr] - The sql query to base the relation on. Can be a raw sql query, a sqlglot select expression or an ibis expression.
  • query_dialect Optional[TSqlGlotDialect] - The dialect of the query. Defaults to the dataset's destination dialect. You can use this to write queries in a different dialect than the destination. This setting is only used for the initial parsing of the query. When executed, the query runs in the underlying destination dialect.
  • _execute_raw_query bool, optional - Whether to run the query as is (raw) or perform query normalization and lineage. Experimental.

Returns:

  • Relation - The readable relation for the query

query

def query(query: Union[str, sge.Select, IbisExpr],
query_dialect: Optional[TSqlGlotDialect] = None,
_execute_raw_query: bool = False) -> Relation

View source on GitHub

Returns a readable relation for a given sql query

Arguments:

  • query Union[str, sge.Select, IbisExpr] - The sql query to base the relation on. Can be a raw sql query, a sqlglot select expression or an ibis expression.
  • query_dialect Optional[TSqlGlotDialect] - The dialect of the query. Defaults to the dataset's destination dialect. You can use this to write queries in a different dialect than the destination. This setting is only used for the initial parsing of the query. When executed, the query runs in the underlying destination dialect.
  • _execute_raw_query bool, optional - Whether to run the query as is (raw) or perform query normalization and lineage. Experimental.

Returns:

  • Relation - The readable relation for the query

table

def table(
table_name: str,
table_type: Literal["relation", "ibis"] = "relation"
) -> Union[Relation, IbisTable]

View source on GitHub

Returns an object representing a table named table_name

Arguments:

  • table_name str - The name of the table
  • table_type Literal["relation", "ibis"], optional - The type of the table. Defaults to "relation" if not specified. If "ibis" is specified, you will get an unbound ibis table.

Returns:

  • Union[Relation, IbisTable] - The object representing the table

__getitem__

def __getitem__(table: str) -> Relation

View source on GitHub

Returns a readable relation for the table named table

Arguments:

  • table str - The name of the table

Returns:

  • Relation - The readable relation for the table

__getattr__

def __getattr__(table: str) -> Relation

View source on GitHub

Returns a readable relation for the table named table

Arguments:

  • table str - The name of the table

Returns:

  • Relation - The readable relation for the table
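
The three ways of reaching a table (table(), item access, and attribute access) can be sketched with a toy class that routes all of them through one lookup. ToyDataset is hypothetical and returns a SQL string where a real dataset would return a Relation; the error handling shown is an assumption of the sketch.

```python
class ToyDataset:
    """Hypothetical stand-in that resolves table names to relation-like objects."""

    def __init__(self, table_names: list[str]) -> None:
        self._table_names = set(table_names)

    def table(self, table_name: str) -> str:
        if table_name not in self._table_names:
            raise KeyError(table_name)
        # A real dataset would return a Relation; a SQL string stands in here.
        return f"SELECT * FROM {table_name}"

    def __getitem__(self, table: str) -> str:
        return self.table(table)

    def __getattr__(self, table: str) -> str:
        # Python calls __getattr__ only when normal attribute lookup fails.
        if table.startswith("_"):
            raise AttributeError(table)
        try:
            return self.table(table)
        except KeyError:
            raise AttributeError(table) from None


dataset = ToyDataset(["items", "users"])
by_method = dataset.table("items")
by_key = dataset["items"]
by_attr = dataset.items
```

Item access is the safer choice when a table name collides with an existing attribute or method of the dataset object, since attribute access only falls back to the table lookup when no such attribute exists.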

__enter__

def __enter__() -> Self

View source on GitHub

Context manager to keep the connection to the destination open between queries

__exit__

def __exit__(exc_type: Type[BaseException], exc_val: BaseException,
exc_tb: TracebackType) -> None

View source on GitHub

Context manager to keep the connection to the destination open between queries
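
The effect of the context manager can be sketched with a toy class that counts how many connections it opens. This is an illustration using the standard library's sqlite3 module, not dlt's implementation: ToyConnectionScope and its run method are hypothetical, and the sketch assumes that outside a with block each query opens and closes its own connection.

```python
import sqlite3
from typing import Any, Optional


class ToyConnectionScope:
    """Hypothetical dataset-like object that reuses one connection inside `with`."""

    def __init__(self) -> None:
        self._conn: Optional[sqlite3.Connection] = None
        self.connections_opened = 0

    def _connect(self) -> sqlite3.Connection:
        self.connections_opened += 1
        return sqlite3.connect(":memory:")

    def __enter__(self) -> "ToyConnectionScope":
        self._conn = self._connect()
        return self

    def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def run(self, sql: str) -> list[tuple[Any, ...]]:
        if self._conn is not None:  # inside `with`: reuse the open connection
            return self._conn.execute(sql).fetchall()
        conn = self._connect()  # outside `with`: open and close per query
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()


scope = ToyConnectionScope()
with scope:
    first = scope.run("SELECT 1")
    second = scope.run("SELECT 2")
opened_inside = scope.connections_opened  # one connection served both queries

scope.run("SELECT 3")
scope.run("SELECT 4")
opened_total = scope.connections_opened  # two more, one per query
```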

ibis

def ibis() -> IbisBackend

View source on GitHub

Returns a connected ibis backend for the dataset. Not implemented for all destinations.

Returns:

  • IbisBackend - The ibis backend for the dataset

row_counts

def row_counts(*,
data_tables: bool = True,
dlt_tables: bool = False,
table_names: Optional[list[str]] = None,
load_id: Optional[str] = None) -> Relation

View source on GitHub

Returns the row counts of the dataset

Arguments:

  • data_tables bool - Whether to include data tables. Defaults to True.
  • dlt_tables bool - Whether to include dlt tables. Defaults to False.
  • table_names Optional[list[str]] - The names of the tables to include. Defaults to None. Will override data_tables and dlt_tables if set
  • load_id Optional[str] - If set, only count rows associated with a given load id. Will exclude tables that do not have a load id.

Returns:

  • Relation - The row counts of the dataset as a Relation
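
The table selection rules above can be sketched as a single UNION ALL query over the chosen tables. The example uses the standard library's sqlite3 module and is not dlt's implementation: row_counts_sketch is a hypothetical helper, it treats tables whose names start with "_dlt" as dlt tables (an assumption), and it leaves out the load_id filter.

```python
import sqlite3
from typing import Optional


def row_counts_sketch(
    conn: sqlite3.Connection,
    *,
    data_tables: bool = True,
    dlt_tables: bool = False,
    table_names: Optional[list[str]] = None,
) -> list[tuple[str, int]]:
    """Return (table_name, row_count) pairs, ordered by table name."""
    if table_names is None:
        all_tables = [
            name
            for (name,) in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        # Assumption: dlt's internal tables are the ones prefixed with "_dlt".
        table_names = [
            name
            for name in all_tables
            if (dlt_tables if name.startswith("_dlt") else data_tables)
        ]
    if not table_names:
        return []
    # One SELECT per table, combined into a single UNION ALL query.
    query = " UNION ALL ".join(
        f"SELECT '{name}' AS table_name, COUNT(*) AS row_count FROM {name}"
        for name in table_names
    )
    return conn.execute(f"{query} ORDER BY table_name").fetchall()


# Hypothetical demo data: one data table and one dlt-style table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.execute("CREATE TABLE _dlt_loads (load_id TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO _dlt_loads VALUES ('1')")

data_only = row_counts_sketch(conn)
with_dlt = row_counts_sketch(conn, dlt_tables=True)
explicit = row_counts_sketch(conn, data_tables=False, table_names=["_dlt_loads"])
conn.close()
```

Note that the explicit table_names list wins over both flags in the last call, matching the override described for the table_names argument.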
