Version: 1.5.0 (latest)

pipeline.dbt

get_venv

def get_venv(pipeline: Pipeline,
venv_path: str = "dbt",
dbt_version: str = _DEFAULT_DBT_VERSION) -> Venv

Creates or restores a virtual environment in which the dbt packages are executed.

The recommended way to execute a dbt package is to use a separate virtual environment in which only dbt-core and the required destination dependencies are installed. This avoids dependency clashes with user-installed libraries. This method creates such an environment at the location specified in venv_path and automatically installs the dependencies required by the pipeline.

Arguments:

  • pipeline Pipeline - A pipeline for which the required dbt dependencies are inferred
  • venv_path str, optional - The path where the virtual environment is created or from which it is restored. If a relative path is provided, the environment is created within the pipeline's working directory. Defaults to "dbt".
  • dbt_version str, optional - Version of dbt to be used. An exact version (e.g. "1.2.4") or a pip requirements string (e.g. ">=1.1,<1.5") may be provided.

Returns:

  • Venv - A Virtual Environment with dbt dependencies installed
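The two accepted forms of dbt_version can be illustrated by how each might map to a pip requirement for dbt-core. This is only a sketch under stated assumptions: the helper to_pip_requirement is hypothetical and is not part of dlt, and the real translation inside get_venv may differ.

```python
import re


def to_pip_requirement(dbt_version: str, package: str = "dbt-core") -> str:
    """Hypothetical helper (not part of dlt): build a pip requirement string."""
    version = dbt_version.strip()
    # A bare version such as "1.2.4" starts with a digit, so pin it exactly.
    if re.match(r"\d", version):
        return f"{package}=={version}"
    # Anything else is assumed to already be a pip specifier such as ">=1.1,<1.5".
    return f"{package}{version}"


print(to_pip_requirement("1.2.4"))
print(to_pip_requirement(">=1.1,<1.5"))
```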

package

def package(pipeline: Pipeline,
package_location: str,
package_repository_branch: str = ConfigValue,
package_repository_ssh_key: TSecretStrValue = "",
auto_full_refresh_when_out_of_sync: bool = ConfigValue,
venv: Venv = None) -> DBTPackageRunner

Creates a Python wrapper over the dbt package present at the specified location, which allows you to control it (e.g. run and test) from Python code.

The wrapper minimizes the effort required to run dbt packages on datasets created with dlt. It clones the package repository and keeps it up to date, shares the dlt destination credentials with dbt, and allows isolated execution via the venv parameter. The wrapper creates a dbt profile from the dlt pipeline configuration. Specifically:

  1. destination is used to infer the correct dbt profile
  2. destination credentials are passed to dbt via environment variables
  3. dataset_name is used to configure the dbt database schema
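The three steps above can be pictured with a small, self-contained sketch. It is an illustration of the idea only: the function name and the EXAMPLE_DBT_* variable names are hypothetical and do not reflect the names dlt actually uses. A dbt profile can read such values with dbt's env_var function.

```python
import os
from typing import Dict, Mapping


def example_dbt_env(destination: str,
                    credentials: Mapping[str, str],
                    dataset_name: str) -> Dict[str, str]:
    """Hypothetical sketch: build the environment for a dbt subprocess."""
    # Start from the current environment so PATH and similar settings survive.
    env = dict(os.environ)
    # Step 1: the destination name selects which dbt profile to use.
    env["EXAMPLE_DBT_PROFILE"] = destination
    # Step 2: each credential is exposed as its own environment variable.
    for key, value in credentials.items():
        env[f"EXAMPLE_DBT_CRED_{key.upper()}"] = value
    # Step 3: the dataset name becomes the dbt schema.
    env["EXAMPLE_DBT_SCHEMA"] = dataset_name
    return env


env = example_dbt_env("postgres", {"host": "localhost", "password": "secret"}, "analytics")
print(env["EXAMPLE_DBT_PROFILE"], env["EXAMPLE_DBT_SCHEMA"])
```

Passing secrets through the environment rather than writing them into a profile file keeps credentials out of the cloned package directory.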

Arguments:

  • pipeline Pipeline - A pipeline containing destination, credentials and dataset_name used to configure the dbt package.
  • package_location str - A git repository URL to be cloned, or a local path where the dbt package is present.
  • package_repository_branch str, optional - A branch name, tag name or commit-id to check out. Defaults to None.
  • package_repository_ssh_key TSecretStrValue, optional - SSH key to be used to clone private repositories. Defaults to "".
  • auto_full_refresh_when_out_of_sync bool, optional - If set to True (default), the wrapper automatically falls back to full-refresh mode when the schema is out of sync. See https://docs.getdbt.com/docs/build/incremental-models#what-if-the-columns-of-my-incremental-model-change
  • venv Venv, optional - A virtual environment with the required dbt dependencies. Defaults to None, which executes the dbt package in the current environment.

Returns:

  • DBTPackageRunner - A configured and authenticated Python dbt wrapper
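A minimal usage sketch that combines get_venv and package is shown below. The pipeline name, the duckdb destination, the dataset name and the jaffle_shop repository URL are example values chosen for illustration, and the function run_dbt_package is a hypothetical wrapper. It assumes dlt is installed with the chosen destination's extras, that the dataset has already been loaded, and that the runner exposes run_all() returning results with model_name, status, time and message attributes.

```python
def run_dbt_package(package_location: str = "https://github.com/dbt-labs/jaffle_shop",
                    destination: str = "duckdb") -> None:
    """Hypothetical wrapper: run every model of a dbt package for a dlt pipeline."""
    # Imported here so that defining this function does not require dlt.
    import dlt

    # Example pipeline; the names are placeholders for your own configuration.
    pipeline = dlt.pipeline(
        pipeline_name="jaffle_example",
        destination=destination,
        dataset_name="jaffle_shop",
    )

    # Create or restore an isolated environment with dbt-core and the
    # destination dependencies (at "dbt" in the pipeline's working directory).
    venv = dlt.dbt.get_venv(pipeline)

    # Wrap the dbt package; credentials and schema come from the pipeline.
    dbt = dlt.dbt.package(pipeline, package_location, venv=venv)

    # Assumed runner API: run all models and report each result.
    for model in dbt.run_all():
        print(f"{model.model_name}: {model.status} in {model.time}s ({model.message})")
```

Calling run_dbt_package() clones the repository on first use and installs dbt into the virtual environment, so the first run needs network access and takes noticeably longer than later runs.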
