# Access to configuration in code

## Access to configuration in dlt decorated functions

`dlt` automatically generates configuration specs for functions decorated with `@dlt.source`, `@dlt.resource`, and `@dlt.destination`, without any additional code needed. You can configure these functions using any of the standard configuration methods, including environment variables and TOML files. You can call them like regular Python functions: `dlt` injects configuration values for any argument you don't explicitly provide.
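For example, a minimal sketch (the resource, its argument, and the token value are all hypothetical):

```py
import os

import dlt

@dlt.resource
def github_events(api_token: str = dlt.secrets.value):
    yield {"event": "push", "has_token": bool(api_token)}

# The bare argument name is the least specific lookup key a provider
# can satisfy, so this environment variable supplies `api_token`.
os.environ["API_TOKEN"] = "ghs_example"

for row in github_events():  # `api_token` is injected, not passed
    print(row)
```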
### Injection rules
- Arguments passed explicitly are never injected. This makes the injection mechanism optional. Example with the Pipedrive source:

  ```py
  import os
  from typing import Iterator, Optional, Union

  import dlt
  import pendulum
  from dlt.sources import DltResource

  @dlt.source(name="pipedrive")
  def pipedrive_source(
      pipedrive_api_key: str = dlt.secrets.value,
      since_timestamp: Optional[Union[pendulum.DateTime, str]] = "1970-01-01 00:00:00",
  ) -> Iterator[DltResource]:
      ...

  my_key = os.environ["MY_PIPEDRIVE_KEY"]
  my_source = pipedrive_source(pipedrive_api_key=my_key)
  ```

  You can specify `pipedrive_api_key` explicitly if you prefer not to use the standard options for credential handling.
- Required arguments (without default values) are never injected and must be specified explicitly when calling. Example:

  ```py
  from typing import List

  import dlt

  @dlt.source
  def slack_data(channels_list: List[str], api_key: str = dlt.secrets.value):
      ...
  ```

  The `channels_list` argument won't be injected and will produce an error if not specified explicitly.
- Arguments with default values are injected if found in config providers. Otherwise, the default values from the function signature are used. Example:

  ```py
  from typing import Optional

  import dlt
  from dlt.common.typing import TAnyDateTime

  MAX_PAGE_SIZE = 1000  # example module-level defaults
  START_DATE = "2024-01-01"

  @dlt.source
  def slack_source(
      page_size: int = MAX_PAGE_SIZE,
      access_token: str = dlt.secrets.value,
      start_date: Optional[TAnyDateTime] = START_DATE,
  ):
      ...
  ```

  `dlt` first searches for `page_size`, `access_token`, and `start_date` in config providers in a specific order. If these values aren't found, it falls back to the default values.
- Arguments with the special defaults `dlt.secrets.value` and `dlt.config.value` must be injected (or explicitly passed). If not found in config providers, `dlt` raises an exception, as sketched below. Additionally, `dlt.secrets.value` indicates to `dlt` that the value is a secret, meaning it will only be injected from secure config providers.
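A minimal sketch of that failure mode (the resource is hypothetical, and it assumes no provider in your environment defines `api_key`):

```py
import dlt
from dlt.common.configuration.exceptions import ConfigFieldMissingException

@dlt.resource
def slack_messages(api_key: str = dlt.secrets.value):
    yield {"channel": "general", "has_key": bool(api_key)}

try:
    list(slack_messages())  # no provider supplies `api_key`
except ConfigFieldMissingException as exc:
    print(exc)  # the message lists every provider and key that was tried
```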
## Add typing to your sources and resources
We recommend adding type annotations to your function signatures. This requires minimal effort and provides several important benefits:
- You won't receive invalid data types in your code. `dlt` automatically parses and converts types for you, eliminating the need for manual parsing.
- `dlt` can generate sample config and secret files for your source automatically.
- You can request built-in and custom credentials (connection strings, AWS/GCP/Azure credentials).
- You can specify multiple possible types via `Union`, such as OAuth or API Key authorization (see the sketch after the example below).
Example:
```py
from typing import List

import dlt
from dlt.sources.credentials import GcpServiceAccountCredentials

@dlt.source
def google_sheets(
    spreadsheet_id: str = dlt.config.value,
    tab_names: List[str] = dlt.config.value,
    credentials: GcpServiceAccountCredentials = dlt.secrets.value,
    only_strings: bool = False,
):
    ...
```
Benefits:

- You'll receive a properly typed list of strings as `tab_names`.
- You'll receive properly configured Google credentials (see GCP Credential Configuration), which users can provide in different forms:
  - `service.json` as a string or dictionary (in code or via config providers)
  - A connection string (as used in SQLAlchemy)
  - Default credentials if nothing is passed (such as those available on Cloud Function runners)
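As a sketch of the `Union` bullet above (hypothetical, reusing the argument names from the example): declaring the credentials argument as a union of specs lets users authorize with either OAuth or a service account, and `dlt` tries each spec until one resolves from the provided values:

```py
from typing import List, Union

import dlt
from dlt.sources.credentials import GcpOAuthCredentials, GcpServiceAccountCredentials

@dlt.source
def google_sheets(
    spreadsheet_id: str = dlt.config.value,
    tab_names: List[str] = dlt.config.value,
    # either spec may be supplied; dlt resolves the first one that fits
    credentials: Union[GcpOAuthCredentials, GcpServiceAccountCredentials] = dlt.secrets.value,
):
    ...
```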
## Organize configuration and secrets with sections
`dlt` organizes configuration and secrets sections in a configuration layout that integrates with the injection mechanism. This structure applies to all configuration providers, including TOML files, environment variables, and other sources.
This hierarchical structure efficiently handles simple cases while supporting more complex scenarios, such as multiple sources with different credentials or multiple pipelines sharing configuration in the same project.
```text
pipeline_name
    |
    |-sources
        |-<source 1 module name>
            |-<source function 1 name>
                |- {all source and resource options and secrets}
            |-<source function 2 name>
                |- {all source and resource options and secrets}
        |-<source 2 module>
            |...
    |-extract
        |- extract options for resources, i.e., parallelism settings, maybe retries
    |-destination
        |- <destination name>
            |- {destination options}
            |-credentials
                |-{credentials options}
    |-schema
        |-<schema name>
            |-schema settings (not implemented yet): nesting level, naming convention, normalizer, etc.
    |-load
    |-normalize
```
When using TOML files, this structure is represented as nested sections with dotted keys. For environment variables and other config providers, the layout is flattened using double underscores (e.g., `PIPELINE_NAME__SOURCES__MODULE_NAME__FUNCTION_NAME__OPTION`).
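For instance, a hypothetical sketch setting the Pipedrive secret from the injection rules above through the flattened environment layout (the pipeline name segment is optional):

```py
import os

# flattened env key: <pipeline name>__SOURCES__<module>__<function>__<argument>
os.environ[
    "PIPEDRIVE_PIPELINE__SOURCES__PIPEDRIVE__PIPEDRIVE_SOURCE__PIPEDRIVE_API_KEY"
] = "my-pipedrive-key"

# equivalent TOML, in .dlt/secrets.toml:
# [pipedrive_pipeline.sources.pipedrive.pipedrive_source]
# pipedrive_api_key = "my-pipedrive-key"
```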
## Access configs and secrets in code
While `dlt` handles credentials automatically, you can also access them directly in your code. The `dlt.secrets` and `dlt.config` objects provide dictionary-like access to configuration values and secrets, enabling custom preprocessing if required. You can also store custom settings in the same configuration files.
```py
# Use `dlt.secrets` and `dlt.config` to explicitly retrieve values from providers
source_instance = google_sheets(
    dlt.config["sheet_id"],
    dlt.config["my_section.tabs"],
    dlt.secrets["my_section.gcp_credentials"],
)
# run the source through a pipeline (source objects have no `run` method)
dlt.run(source_instance, destination="bigquery")
```
`dlt.config` and `dlt.secrets` function as dictionaries. `dlt` examines all config providers (environment variables, TOML files, etc.) to populate these dictionaries. You can also use `dlt.config.get()` or `dlt.secrets.get()` to retrieve a value and convert it to a specific type:
```py
credentials = dlt.secrets.get("my_section.gcp_credentials", GcpServiceAccountCredentials)
```
This creates a `GcpServiceAccountCredentials` instance from the values stored under the `my_section.gcp_credentials` key.
## Write configs and secrets in code
You can also set values programmatically using `dlt.config` and `dlt.secrets`:
```py
from airflow.hooks.base import BaseHook  # example: reading a secret from Airflow

dlt.config["sheet_id"] = "23029402349032049"
dlt.secrets["destination.postgres.credentials"] = BaseHook.get_connection("postgres_dsn").extra
```
This effectively mocks the TOML provider with your specified values.
## Configure destination credentials in code
You can programmatically set destination credentials when needed. This example demonstrates how to use the `GcpServiceAccountCredentials` spec with a BigQuery destination:
```py
import os

import dlt
from dlt.destinations import bigquery
from dlt.sources.credentials import GcpServiceAccountCredentials

# Retrieve the service account info (a JSON string) from an environment variable
creds_json = os.environ["BIGQUERY_CREDENTIALS"]

# Create a credentials instance and parse the native JSON representation
gcp_credentials = GcpServiceAccountCredentials()
gcp_credentials.parse_native_representation(creds_json)

# Pass the credentials to the BigQuery destination and run the pipeline
pipeline = dlt.pipeline(destination=bigquery(credentials=gcp_credentials))
pipeline.run([{"key1": "value1"}], table_name="temp")
```