dlt.common.destination.dataset
DataAccess Objects
class DataAccess(Protocol)
Common data access protocol shared between dbapi cursors and relations
columns_schema
@property
def columns_schema() -> TTableSchemaColumns
Returns the expected columns schema for the result of the relation. Column types are discovered with sqlglot query analysis and lineage. dlt hints for columns are kept in some cases. Refer to <docs-page> for more details.
df
def df(chunk_size: Optional[int] = None) -> Optional[DataFrame]
Fetches the results as a data frame. Uses the native pandas implementation of the destination client cursor if available.
Arguments:
chunk_size
Optional[int] - The number of rows to fetch for this call. Defaults to None which will fetch all rows.
Returns:
Optional[DataFrame]
- A data frame with query results.
arrow
def arrow(chunk_size: Optional[int] = None) -> Optional[ArrowTable]
Fetches the results as an arrow table. Uses the native arrow implementation of the destination client cursor if available.
Arguments:
chunk_size
Optional[int] - The number of rows to fetch for this call. Defaults to None which will fetch all rows.
Returns:
Optional[ArrowTable]
- An arrow table with query results.
iter_df
def iter_df(chunk_size: int) -> Generator[DataFrame, None, None]
Iterates over data frames of 'chunk_size' items. Uses the native pandas implementation of the destination client cursor if available.
Arguments:
chunk_size
int - The number of rows to fetch for each iteration.
Returns:
Generator[DataFrame, None, None]: A generator of data frames with query results.
iter_arrow
def iter_arrow(chunk_size: int) -> Generator[ArrowTable, None, None]
Iterates over arrow tables of 'chunk_size' items. Uses the native arrow implementation of the destination client cursor if available.
Arguments:
chunk_size
int - The number of rows to fetch for each iteration.
Returns:
Generator[ArrowTable, None, None]: A generator of arrow tables with query results.
fetchall
def fetchall() -> list[tuple[Any, ...]]
Fetches all items as a list of python tuples. Uses the native dbapi fetchall implementation of the destination client cursor.
Returns:
list[tuple[Any, ...]]: A list of python tuples with query results.
fetchmany
def fetchmany(chunk_size: int) -> list[tuple[Any, ...]]
Fetches the first 'chunk_size' items as a list of python tuples. Uses the native dbapi fetchmany implementation of the destination client cursor.
Arguments:
chunk_size
int - The number of rows to fetch for this call.
Returns:
list[tuple[Any, ...]]: A list of python tuples with query results.
iter_fetch
def iter_fetch(chunk_size: int) -> Generator[list[tuple[Any, ...]], Any, Any]
Iterates over the results as lists of Python tuples, each holding up to 'chunk_size' rows. Uses the native dbapi fetchmany implementation of the destination client cursor.
Arguments:
chunk_size
int - The number of rows to fetch for each iteration.
Returns:
Generator[list[tuple[Any, ...]], Any, Any]: A generator of lists of python tuples with query results.
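The chunked iteration described for iter_fetch can be sketched on top of any DB-API cursor. The example below is a minimal illustration that uses the standard library's sqlite3 module, not dlt's implementation; the standalone `iter_fetch` helper and the `items` table are assumptions made for the example.

```python
import sqlite3
from typing import Any, Generator


def iter_fetch(
    cursor: sqlite3.Cursor, chunk_size: int
) -> Generator[list[tuple[Any, ...]], None, None]:
    """Yield lists of up to `chunk_size` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows


# Hypothetical demo data: five rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(5)])

cursor = conn.execute("SELECT id FROM items ORDER BY id")
chunks = list(iter_fetch(cursor, chunk_size=2))
conn.close()
```

The last chunk may hold fewer than `chunk_size` rows, so callers should not assume a fixed chunk length.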
fetchone
def fetchone() -> Optional[tuple[Any, ...]]
Fetches the first item as a python tuple. Uses the native dbapi fetchone implementation of the destination client cursor.
Returns:
Optional[tuple[Any, ...]]: A python tuple with the first item of the query results.
Relation Objects
class Relation(DataAccess, Protocol)
A readable relation retrieved from a destination that supports it
schema
The schema of the relation
scalar
def scalar() -> Any
Fetches the first value of the first column on the first row as a Python primitive.
Returns:
Any
- The first value of the first column on the first row as a python primitive.
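As a rough illustration of what scalar returns, the sketch below reads the first column of the first row from a plain sqlite3 cursor. The `scalar` helper here is hypothetical, and returning None for an empty result is an assumption; the source does not say how that case is handled.

```python
import sqlite3
from typing import Any


def scalar(cursor: sqlite3.Cursor) -> Any:
    """Return the first column of the first row of an executed query."""
    row = cursor.fetchone()
    # Assumption: an empty result yields None (not specified in the source).
    return None if row is None else row[0]


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(7, "a"), (9, "b")])

max_id = scalar(conn.execute("SELECT MAX(id) FROM items"))
no_match = scalar(conn.execute("SELECT id FROM items WHERE id > 100"))
conn.close()
```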
limit
def limit(limit: int, **kwargs: Any) -> Self
Returns a new relation with the limit applied.
Arguments:
limit
int - The number of rows to fetch.
**kwargs
Any - Additional keyword arguments to pass to the limit implementation of the destination client cursor.
Returns:
Self
- The relation with the limit applied.
head
def head(limit: int = 5) -> Self
By default returns a relation with the first 5 rows selected.
Arguments:
limit
int - The number of rows to fetch.
Returns:
Self
- The relation with the limit applied.
select
def select(*columns: str) -> Self
Returns a new relation with the given columns selected.
Arguments:
*columns
str - The columns to select.
Returns:
Self
- The relation with the columns selected.
max
def max() -> Self
Returns a new relation with the MAX aggregate applied. Exactly one column must be selected.
Returns:
Self
- The relation with the MAX aggregate expression.
min
def min() -> Self
Returns a new relation with the MIN aggregate applied. Exactly one column must be selected.
Returns:
Self
- The relation with the MIN aggregate expression.
where
def where(column_or_expr: SqlglotExprOrStr,
operator: Optional[TFilterOperation] = None,
value: Optional[Any] = None) -> Self
Returns a new relation with the given where clause applied. Same as .filter().
Arguments:
column_or_expr
SqlglotExprOrStr - The column to filter on. Alternatively, the SQL expression or string representing a custom WHERE clause.
operator
Optional[TFilterOperation] - The operator to use. Available operations are: eq, ne, gt, lt, gte, lte, in, not_in
value
Optional[Any] - The value to filter on.
Returns:
Self
- A copy of the relation with the where clause applied.
filter
def filter(column_name: str, operator: TFilterOperation, value: Any) -> Self
Returns a new relation with the given where clause applied. Same as .where().
Arguments:
column_name
str - The column to filter on.
operator
TFilterOperation - The operator to use. Available operations are: eq, ne, gt, lt, gte, lte, in, not_in
value
Any - The value to filter on.
Returns:
Self
- A copy of the relation with the where clause applied.
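To make the operator names accepted by where and filter concrete, the following sketch maps each one to a SQL comparison and runs the result against sqlite3. The `SQL_OPERATORS` table and the `build_where` helper are hypothetical and only illustrate the likely meaning of each name; dlt builds its clauses through sqlglot, which is not shown here.

```python
import sqlite3
from typing import Any

# Assumed mapping from the documented operator names to SQL operators.
SQL_OPERATORS = {
    "eq": "=",
    "ne": "!=",
    "gt": ">",
    "lt": "<",
    "gte": ">=",
    "lte": "<=",
    "in": "IN",
    "not_in": "NOT IN",
}


def build_where(column_name: str, operator: str, value: Any) -> tuple[str, list[Any]]:
    """Return a parameterized WHERE fragment and its bound values."""
    sql_op = SQL_OPERATORS[operator]
    if operator in ("in", "not_in"):
        values = list(value)
        placeholders = ", ".join("?" for _ in values)
        return f'"{column_name}" {sql_op} ({placeholders})', values
    return f'"{column_name}" {sql_op} ?', [value]


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

clause, params = build_where("id", "gte", 2)
gte_rows = conn.execute(f"SELECT name FROM items WHERE {clause} ORDER BY id", params).fetchall()

clause, params = build_where("id", "not_in", [1, 3])
not_in_rows = conn.execute(f"SELECT name FROM items WHERE {clause} ORDER BY id", params).fetchall()
conn.close()
```

Note that in and not_in take a collection as the value, while the other operators take a single value.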
order_by
def order_by(column_name: str, direction: TSortOrder = "asc") -> Self
Returns a new relation with the given order by clause applied.
Arguments:
column_name
str - The column to order by.
direction
TSortOrder, optional - The direction to order by: "asc"/"desc". Defaults to "asc".
Returns:
Self
- A copy of the relation with the order by clause applied.
__getitem__
def __getitem__(columns: Union[str, Sequence[str]]) -> Self
Returns a new relation with the given columns selected.
Arguments:
columns
Union[str, Sequence[str]] - The columns to select.
Returns:
Self
- The relation with the columns selected.
__copy__
def __copy__() -> Self
Creates a copy of the relation object.
Returns:
Self
- The copy of the relation object
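The relation methods above are documented as returning a new relation (or a copy) rather than modifying the one they are called on. The sketch below shows that chaining pattern with a hypothetical frozen dataclass, `SketchRelation`, that only assembles a SQL string; it is not dlt's Relation and omits where, max, and min.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchRelation:
    """Hypothetical immutable relation; every method returns a modified copy."""

    table: str
    columns: tuple[str, ...] = ()
    row_limit: Optional[int] = None
    order: Optional[tuple[str, str]] = None

    def select(self, *columns: str) -> "SketchRelation":
        return replace(self, columns=columns)

    def limit(self, limit: int) -> "SketchRelation":
        return replace(self, row_limit=limit)

    def head(self, limit: int = 5) -> "SketchRelation":
        # head is limit with a default of 5 rows.
        return self.limit(limit)

    def order_by(self, column_name: str, direction: str = "asc") -> "SketchRelation":
        return replace(self, order=(column_name, direction))

    def to_sql(self) -> str:
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.order is not None:
            sql += f" ORDER BY {self.order[0]} {self.order[1].upper()}"
        if self.row_limit is not None:
            sql += f" LIMIT {self.row_limit}"
        return sql


base = SketchRelation("items")
derived = base.select("id", "name").order_by("id", "desc").head()
```

Because each call returns a copy, `base` can be reused to derive several independent queries.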
DBApiCursorProtocol Objects
class DBApiCursorProtocol(DataAccess, Protocol)
Protocol for the DBAPI cursor
native_cursor
Cursor implementation native to current destination
execute
def execute(query: AnyStr, *args: Any, **kwargs: Any) -> None
Execute a query on the cursor
close
def close() -> None
Close the cursor
DBApiCursor Objects
class DBApiCursor(abc.ABC, DBApiCursorProtocol)
Protocol for the DBAPI cursor
native_cursor
Cursor implementation native to current destination
execute
def execute(query: AnyStr, *args: Any, **kwargs: Any) -> None
Execute a query on the cursor
close
def close() -> None
Close the cursor
Dataset Objects
class Dataset(Protocol)
A readable dataset retrieved from a destination. Supports creating readable relations for a query or table.
schema
@property
def schema() -> Schema
Returns the schema of the dataset, fetching it from the destination.
Returns:
Schema
- The schema of the dataset
sqlglot_schema
@property
def sqlglot_schema() -> SQLGlotSchema
Returns the computed and cached sqlglot schema of the dataset
Returns:
SQLGlotSchema
- The sqlglot schema of the dataset
dataset_name
@property
def dataset_name() -> str
Returns the name of the dataset
Returns:
str
- The name of the dataset
__call__
def __call__(query: Union[str, sge.Select, IbisExpr],
query_dialect: Optional[TSqlGlotDialect] = None,
_execute_raw_query: bool = False) -> Relation
Returns a readable relation for a given sql query
Arguments:
query
Union[str, sge.Select, IbisExpr] - The sql query to base the relation on. Can be a raw sql query, a sqlglot select expression or an ibis expression.
query_dialect
Optional[TSqlGlotDialect] - The dialect of the query. Defaults to the dataset's destination dialect. You can use this to write queries in a different dialect than the destination. This setting is only used for the initial parsing of the query. The query itself is executed in the underlying destination dialect.
_execute_raw_query
bool, optional - Whether to run the query as is (raw) or perform query normalization and lineage. Experimental.
Returns:
Relation
- The readable relation for the query
query
def query(query: Union[str, sge.Select, IbisExpr],
query_dialect: Optional[TSqlGlotDialect] = None,
_execute_raw_query: bool = False) -> Relation
Returns a readable relation for a given sql query
Arguments:
query
Union[str, sge.Select, IbisExpr] - The sql query to base the relation on. Can be a raw sql query, a sqlglot select expression or an ibis expression.
query_dialect
Optional[TSqlGlotDialect] - The dialect of the query. Defaults to the dataset's destination dialect. You can use this to write queries in a different dialect than the destination. This setting is only used for the initial parsing of the query. The query itself is executed in the underlying destination dialect.
_execute_raw_query
bool, optional - Whether to run the query as is (raw) or perform query normalization and lineage. Experimental.
Returns:
Relation
- The readable relation for the query
table
def table(
table_name: str,
table_type: Literal["relation", "ibis"] = "relation"
) -> Union[Relation, IbisTable]
Returns an object representing a table named table_name
Arguments:
table_name
str - The name of the table
table_type
Literal["relation", "ibis"], optional - The type of the table. Defaults to "relation" if not specified. If "ibis" is specified, you will get an unbound ibis table.
Returns:
Union[Relation, IbisTable]: The object representing the table
__getitem__
def __getitem__(table: str) -> Relation
Returns a readable relation for the table named table
Arguments:
table
str - The name of the table
Returns:
Relation
- The readable relation for the table
__getattr__
def __getattr__(table: str) -> Relation
Returns a readable relation for the table named table
Arguments:
table
str - The name of the table
Returns:
Relation
- The readable relation for the table
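The three access styles above (table, item access, and attribute access) all resolve a table by name. A minimal sketch of how such delegation can be wired in Python follows; `TableAccessSketch` is a hypothetical class that returns a SQL string in place of a relation, and it does not model how dlt handles unknown table names.

```python
class TableAccessSketch:
    """Hypothetical dataset-like object exposing tables three ways."""

    def table(self, table_name: str) -> str:
        # Stand-in for building a relation: return the query text only.
        return f"SELECT * FROM {table_name}"

    def __getitem__(self, table: str) -> str:
        return self.table(table)

    def __getattr__(self, table: str) -> str:
        # __getattr__ runs only when normal attribute lookup fails.
        # Private and dunder names are rejected so that copy, pickle,
        # and similar protocols keep working as usual.
        if table.startswith("_"):
            raise AttributeError(table)
        return self.table(table)


dataset = TableAccessSketch()
via_method = dataset.table("items")
via_item = dataset["items"]
via_attr = dataset.items
```

Item access is the safer choice when a table name collides with an existing method or attribute, since attribute lookup would find the method first.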
__enter__
def __enter__() -> Self
Context manager to keep the connection to the destination open between queries
__exit__
def __exit__(exc_type: Type[BaseException], exc_val: BaseException,
exc_tb: TracebackType) -> None
Context manager to keep the connection to the destination open between queries
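The effect of the context manager can be sketched as follows: outside a with block each query opens and closes its own connection, while inside the block one connection is reused. `ConnectionScopeSketch` is a hypothetical class built on sqlite3, and the per-query connection behaviour outside the block is an assumption inferred from the description above, not a statement about dlt internals.

```python
import sqlite3
from typing import Any, Optional


class ConnectionScopeSketch:
    """Hypothetical dataset that reuses one connection inside a with block."""

    def __init__(self) -> None:
        self._conn: Optional[sqlite3.Connection] = None
        self.connections_opened = 0

    def _open(self) -> sqlite3.Connection:
        self.connections_opened += 1
        return sqlite3.connect(":memory:")

    def __enter__(self) -> "ConnectionScopeSketch":
        self._conn = self._open()
        return self

    def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def fetchall(self, sql: str) -> list[tuple[Any, ...]]:
        if self._conn is not None:
            # Inside the with block: reuse the shared connection.
            return self._conn.execute(sql).fetchall()
        # Outside the with block: open a short-lived connection per query.
        conn = self._open()
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()


dataset = ConnectionScopeSketch()
dataset.fetchall("SELECT 1")
dataset.fetchall("SELECT 2")
opened_without_context = dataset.connections_opened

with dataset:
    first = dataset.fetchall("SELECT 1")
    second = dataset.fetchall("SELECT 2")
opened_with_context = dataset.connections_opened - opened_without_context
```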
ibis
def ibis() -> IbisBackend
Returns a connected ibis backend for the dataset. Not implemented for all destinations.
Returns:
IbisBackend
- The ibis backend for the dataset
row_counts
def row_counts(*,
data_tables: bool = True,
dlt_tables: bool = False,
table_names: Optional[list[str]] = None,
load_id: Optional[str] = None) -> Relation
Returns the row counts of the dataset
Arguments:
data_tables
bool - Whether to include data tables. Defaults to True.
dlt_tables
bool - Whether to include dlt tables. Defaults to False.
table_names
Optional[list[str]] - The names of the tables to include. Defaults to None. Will override data_tables and dlt_tables if set.
load_id
Optional[str] - If set, only count rows associated with a given load id. Will exclude tables that do not have a load id.
Returns:
Relation
- The row counts of the dataset as ReadableRelation
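One way to picture the result of row_counts is a single query that counts the rows of each selected table and stacks the results. The sketch below does this with UNION ALL on sqlite3; the `row_counts_sql` helper and the output column names `table_name` and `row_count` are assumptions for illustration, and the data_tables, dlt_tables, and load_id options are not modelled.

```python
import sqlite3


def row_counts_sql(table_names: list[str]) -> str:
    """Build one query returning (table_name, row_count) per table."""
    selects = [
        f"SELECT '{name}' AS table_name, COUNT(*) AS row_count FROM \"{name}\""
        for name in table_names
    ]
    return " UNION ALL ".join(selects) + " ORDER BY table_name"


# Hypothetical demo data: two small tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO users VALUES (1)")

counts = conn.execute(row_counts_sql(["items", "users"])).fetchall()
conn.close()
```

The helper interpolates table names directly into the SQL text, so it is only suitable for trusted names such as those taken from a schema.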