Version: devel

dlt.dataset.dataset

Dataset Objects

class Dataset()

Provides access to dataframes and Arrow tables in the destination dataset via DBAPI.

ibis

def ibis(read_only: bool = False) -> IbisBackend

Get an ibis backend for the dataset.

This creates a connection to the destination.

The read_only flag is currently supported only for the duckdb destination.

schema

@property
def schema() -> dlt.Schema

dlt schema associated with the dataset.

If not provided at dataset initialization, it is fetched from the destination, falling back to the local dlt pipeline metadata.
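The lookup order described above can be pictured as a simple fallback chain. This is a toy sketch, not dlt's code: `resolve_schema` and its callable parameters are hypothetical, and a plain dict stands in for `dlt.Schema`. Passing callables keeps the later sources lazy, so the destination is only queried when no schema was provided.

```python
from typing import Callable, Optional

# Hypothetical stand-in for dlt.Schema, used only for illustration.
Schema = dict


def resolve_schema(
    provided: Optional[Schema],
    fetch_from_destination: Callable[[], Optional[Schema]],
    load_from_local_pipeline: Callable[[], Optional[Schema]],
) -> Optional[Schema]:
    """Return the first schema found, in the order described for Dataset.schema."""
    # 1. A schema passed at dataset initialization wins.
    if provided is not None:
        return provided
    # 2. Otherwise try the schema stored on the destination.
    stored = fetch_from_destination()
    if stored is not None:
        return stored
    # 3. Finally fall back to the local pipeline metadata.
    return load_from_local_pipeline()
```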

tables

@property
def tables() -> list[str]

List of table names found in the dataset.

This only includes "completed tables": during a pipeline.run() execution, tables may already exist on the destination, but they appear in the dataset only once pipeline.run() has finished.

_ipython_key_completions_

def _ipython_key_completions_() -> list[str]

Provide table names as completion suggestion in interactive environments.
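IPython and Jupyter look for a method named `_ipython_key_completions_` when you press TAB inside `obj["`, and offer the returned strings as key completions. The sketch below shows that hook on a hypothetical `CompletableDataset` class; it is a minimal illustration of the protocol, not dlt's implementation.

```python
class CompletableDataset:
    """Hypothetical stand-in for dlt.Dataset, reduced to the completion hook."""

    def __init__(self, table_names: list[str]) -> None:
        self._table_names = list(table_names)

    @property
    def tables(self) -> list[str]:
        return list(self._table_names)

    def __getitem__(self, table_name: str) -> str:
        if table_name not in self._table_names:
            raise KeyError(table_name)
        # A real dataset would return a relation; a string is enough here.
        return f"<relation {table_name}>"

    def _ipython_key_completions_(self) -> list[str]:
        # Called by IPython when completing `obj["<TAB>`.
        return self.tables
```

In a notebook, typing `ds["cu` followed by TAB on `ds = CompletableDataset(["customers", "orders"])` would suggest `customers`.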

sqlglot_schema

@property
def sqlglot_schema() -> SQLGlotSchema

SQLGlot schema of the dataset derived from the dlt schema.

destination_dialect

@property
def destination_dialect() -> TSqlGlotDialect

SQLGlot dialect of the dataset destination.

This is the target dialect when transpiling SQL queries.

dataset_name

@property
def dataset_name() -> str

Name of the dataset.

is_same_physical_destination

def is_same_physical_destination(other: dlt.Dataset) -> bool

Returns True if the other dataset is on the same physical destination. This is helpful when you want to run SQL queries across both datasets without extracting the data.

query

def query(query: Union[str, sge.Select, ir.Expr],
          query_dialect: Optional[TSqlGlotDialect] = None,
          *,
          _execute_raw_query: bool = False) -> dlt.Relation

Create a dlt.Relation from an SQL query, SQLGlot expression or Ibis expression.

Arguments:

  • query Union[str, sge.Select, ir.Expr] - The query that defines the relation.
  • query_dialect Optional[TSqlGlotDialect] - The dialect of the query. If specified, the query is transpiled from this dialect to the destination's dialect. Otherwise, the query is assumed to already be in the destination's dialect (accessible via Dataset.destination_dialect).

Returns:

  • dlt.Relation - The relation for the query

__call__

def __call__(query: Union[str, sge.Select, ir.Expr],
             query_dialect: Optional[TSqlGlotDialect] = None,
             *,
             _execute_raw_query: bool = False) -> dlt.Relation

Convenience method that proxies Dataset.query(). See that method for details.

table

def table(
    table_name: str,
    table_type: Literal["relation", "ibis"] = "relation"
) -> Union[dlt.Relation, ir.Table]

Get a dlt.Relation associated with a table from the dataset. With table_type="ibis", an Ibis table (ir.Table) is returned instead.

row_counts

def row_counts(*,
               data_tables: bool = True,
               dlt_tables: bool = False,
               table_names: Optional[list[str]] = None,
               load_id: Optional[str] = None) -> dlt.Relation

Create a dlt.Relation with the query to get the row counts of all tables in the dataset.

Arguments:

  • data_tables bool - Whether to include data tables. Defaults to True.
  • dlt_tables bool - Whether to include dlt tables. Defaults to False.
  • table_names Optional[list[str]] - The names of the tables to include. Defaults to None. If set, overrides data_tables and dlt_tables.
  • load_id Optional[str] - If set, only rows associated with the given load id are counted. Tables that do not have a load id column are excluded.

Returns:

  • dlt.Relation - Relation for the query that computes the requested row counts.
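To make the argument semantics concrete, here is a toy model of such a row-count query, run against an in-memory SQLite database. It is not dlt's implementation. The helpers `select_tables` and `build_row_counts_query` are hypothetical, the output columns `table_name` and `row_count` are illustrative, and two conventions are assumed: internal dlt tables start with `_dlt`, and tables that carry a load id store it in a `_dlt_load_id` column (here a child table `items__tags` lacks it).

```python
import sqlite3
from typing import Optional


def select_tables(
    all_tables: list[str],
    data_tables: bool = True,
    dlt_tables: bool = False,
    table_names: Optional[list[str]] = None,
) -> list[str]:
    # An explicit list of names overrides both flags.
    if table_names is not None:
        return list(table_names)
    # Assumption: internal dlt tables are the ones prefixed with "_dlt".
    return [t for t in all_tables if (dlt_tables if t.startswith("_dlt") else data_tables)]


def build_row_counts_query(
    tables: dict[str, list[str]],
    load_id: Optional[str] = None,
) -> tuple[str, list[str]]:
    """Return one UNION ALL query yielding (table_name, row_count) rows, plus its parameters."""
    selects: list[str] = []
    params: list[str] = []
    for name, columns in tables.items():
        # Toy code: names are interpolated directly, so only use trusted names.
        select = f"SELECT '{name}' AS table_name, COUNT(*) AS row_count FROM \"{name}\""
        if load_id is not None:
            if "_dlt_load_id" not in columns:
                continue  # tables without a load id column are excluded
            select += " WHERE _dlt_load_id = ?"
            params.append(load_id)
        selects.append(select)
    return " UNION ALL ".join(selects), params


# Hypothetical tables and columns for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, _dlt_load_id TEXT)")
conn.execute("CREATE TABLE items__tags (value TEXT, _dlt_parent_id TEXT)")
conn.execute("CREATE TABLE _dlt_loads (load_id TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "load_1"), (2, "load_1"), (3, "load_2")])
conn.execute("INSERT INTO items__tags VALUES ('a', 'x')")

columns = {
    "items": ["id", "_dlt_load_id"],
    "items__tags": ["value", "_dlt_parent_id"],
    "_dlt_loads": ["load_id"],
}
chosen = select_tables(list(columns))  # data tables only, by default
sql, params = build_row_counts_query({t: columns[t] for t in chosen})
print(conn.execute(sql, params).fetchall())
```

Building a single UNION ALL query means the counts come back as one relation in one round trip, which matches the idea of returning a dlt.Relation rather than executing a query per table.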

__getitem__

def __getitem__(table_name: str) -> dlt.Relation

Get a dlt.Relation for a table via dictionary notation.

This proxies Dataset.table().

__getattr__

def __getattr__(name: str) -> Any

Get a dlt.Relation for a table via attribute (dot) notation.

This proxies Dataset.table().
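The proxy methods documented above (`__call__` to `query()`, `__getitem__` and `__getattr__` to `table()`) follow a common Python pattern. The sketch below reproduces that pattern on hypothetical `ProxyDataset` and `ToyRelation` classes; it is an illustration of the delegation, not dlt's code.

```python
from typing import Any


class ToyRelation:
    """Hypothetical stand-in for dlt.Relation: it only remembers its SQL."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def __repr__(self) -> str:
        return f"ToyRelation({self.sql!r})"


class ProxyDataset:
    def __init__(self, tables: list[str]) -> None:
        self._tables = list(tables)

    def query(self, query: str) -> ToyRelation:
        return ToyRelation(query)

    def __call__(self, query: str) -> ToyRelation:
        # dataset("SELECT ...") is shorthand for dataset.query("SELECT ...")
        return self.query(query)

    def table(self, table_name: str) -> ToyRelation:
        if table_name not in self._tables:
            raise ValueError(f"Table {table_name!r} not found")
        return self.query(f'SELECT * FROM "{table_name}"')

    def __getitem__(self, table_name: str) -> ToyRelation:
        # dataset["items"] -> dataset.table("items")
        return self.table(table_name)

    def __getattr__(self, name: str) -> Any:
        # Python only calls this when normal lookup fails, so real methods
        # and properties take precedence over table names.
        if name.startswith("_"):
            raise AttributeError(name)  # avoids recursion on private attributes
        try:
            return self.table(name)
        except ValueError as exc:
            raise AttributeError(name) from exc
```

Because `__getattr__` is only a fallback, a table whose name collides with a method (for example a table called `query`) is still reachable through dictionary notation, `ds["query"]`, in this toy.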

__enter__

def __enter__() -> Self

Context manager that keeps the connection to the destination open between queries.

__exit__

def __exit__(exc_type: Type[BaseException], exc_val: BaseException,
             exc_tb: TracebackType) -> None

Context manager that keeps the connection to the destination open between queries.
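The benefit of the context manager is that several queries can share one connection instead of opening a new one each time. The sketch below models that with the standard library `sqlite3` module; the `ConnectionDataset` class, its `execute` method and the `connections_opened` counter are hypothetical and only serve to make the difference observable.

```python
import sqlite3
from types import TracebackType
from typing import Optional, Type


class ConnectionDataset:
    """Hypothetical dataset: one connection per query unless used in a `with` block."""

    def __init__(self, database: str) -> None:
        self._database = database
        self._conn: Optional[sqlite3.Connection] = None
        self.connections_opened = 0  # demonstration counter only

    def _connect(self) -> sqlite3.Connection:
        self.connections_opened += 1
        return sqlite3.connect(self._database)

    def execute(self, sql: str) -> list[tuple]:
        if self._conn is not None:
            # Inside a `with` block: reuse the open connection.
            return self._conn.execute(sql).fetchall()
        # Outside a `with` block: open, query, close.
        conn = self._connect()
        try:
            return conn.execute(sql).fetchall()
        finally:
            conn.close()

    def __enter__(self) -> "ConnectionDataset":
        self._conn = self._connect()
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> None:
        # Close the shared connection; returning None lets exceptions propagate.
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Two queries outside a `with` block open two connections, while any number of queries inside one `with` block share a single connection.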

is_same_physical_destination

def is_same_physical_destination(dataset1: dlt.Dataset,
                                 dataset2: dlt.Dataset) -> bool

Check whether both datasets are on the same physical destination.

This is done by comparing the fingerprints of both destination configs. The comparison is approximate: two different configs that give access to the same destination produce different fingerprints, so such datasets are not recognized as sharing a destination.
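A fingerprint comparison of this kind can be sketched as hashing a canonical form of each config and comparing the digests. This is a hypothetical model: the config keys and the SHA-256 over sorted JSON are assumptions for illustration, not how dlt computes its fingerprint. The last case shows the limitation mentioned above: a config that reaches the same server through a different host string is reported as a different destination.

```python
import hashlib
import json
from typing import Any


def config_fingerprint(config: dict[str, Any]) -> str:
    """Hash a hypothetical destination config; key order does not matter."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def same_physical_destination(config1: dict[str, Any], config2: dict[str, Any]) -> bool:
    # Equal fingerprints are treated as "same destination".
    return config_fingerprint(config1) == config_fingerprint(config2)


by_name = {"host": "db.example.com", "database": "analytics"}
reordered = {"database": "analytics", "host": "db.example.com"}
by_ip = {"host": "10.0.0.5", "database": "analytics"}  # assume same server as by_name

print(same_physical_destination(by_name, reordered))  # True: identical settings
print(same_physical_destination(by_name, by_ip))  # False: same server, different config
```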
