Version: 1.0.0 (latest)

How to set up credentials

dlt automatically extracts configuration settings and secrets based on flexible naming conventions.

It then injects these values where needed in functions decorated with @dlt.source, @dlt.resource, or @dlt.destination.

note
  • Configuration refers to non-sensitive settings that define a data pipeline's behavior. These include file paths, database hosts, timeouts, API URLs, and performance settings.
  • Secrets are sensitive data like passwords, API keys, and private keys. They should never be hard-coded to avoid security risks.

Available config providers

There are multiple ways to define configurations and credentials for your pipelines. dlt looks for these definitions in the following order during pipeline execution:

  1. Environment Variables: If a value for a specific argument is found in an environment variable, dlt will use it and will not proceed to search in lower-priority providers.

  2. Vaults: Credentials stored in vaults such as Google Cloud Secret Manager, Azure Key Vault, or AWS Secrets Manager.

  3. secrets.toml and config.toml files: These files are used for storing both configuration values and secrets. secrets.toml is dedicated to sensitive information, while config.toml contains non-sensitive configuration data.

  4. Custom Providers added with register_provider: This is a custom provider implementation you can design yourself. A custom config provider is helpful if you want to use your own configuration file structure or perform advanced preprocessing of configs and secrets.

  5. Default Argument Values: These are the values specified in the function's signature.
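
The priority order above can be pictured as a "first provider that has the key wins" loop. The sketch below is only an illustration in plain Python, not dlt's implementation: the `resolve` helper, the provider labels, and the sample values are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

def resolve(key: str, providers: List[Tuple[str, Dict[str, Any]]], default: Any = None) -> Tuple[Any, str]:
    """Return (value, provider_label) from the first provider that defines key."""
    for label, values in providers:
        if key in values:
            # A hit stops the search: lower-priority providers are not consulted.
            return values[key], label
    # Nothing found anywhere: fall back to the default from the function signature.
    return default, "default argument value"

# Hypothetical provider contents, listed in the priority order described above.
providers = [
    ("environment variables", {"api_key": "from-env"}),
    ("vault", {}),
    ("secrets.toml / config.toml", {"api_key": "from-toml", "timeout": 30}),
    ("custom provider", {"timeout": 60}),
]

print(resolve("api_key", providers))
print(resolve("timeout", providers))
print(resolve("retries", providers, default=3))
```

Here `api_key` comes from the environment even though the TOML files also define it, and `retries` falls through to the default because no provider defines it.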

tip

Please make sure your pipeline name contains no whitespace or any other punctuation characters except "-" and "_". This way you will ensure your code is working with any configuration option.
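
One way to enforce this rule before creating a pipeline is a small check like the sketch below. The helper name `is_safe_pipeline_name` is hypothetical and not part of dlt. It simply encodes the rule from the tip: letters, digits, "-" and "_" only.

```python
def is_safe_pipeline_name(name: str) -> bool:
    """True if name has only letters, digits, '-' or '_' (and is not empty)."""
    return bool(name) and all(ch.isalnum() or ch in "-_" for ch in name)

for candidate in ["chess_games", "my-pipeline_1", "my pipeline", "sales.eu"]:
    print(candidate, is_safe_pipeline_name(candidate))
```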

Naming convention

dlt uses a specific naming hierarchy to search for secret and config values. This makes configurations and secrets easy to manage.

To keep the naming convention flexible, dlt tries many possible combinations of key names, starting from the most specific possible path. If the value is not found, it removes the right-most section and tries again.

The most specific possible path for sources looks like:

[<pipeline_name>.sources.<source_module_name>.<source_function_name>]
<argument_name>="some_value"

The most specific possible path for destinations looks like:

[<pipeline_name>.destination.<destination name>.credentials]
<credential_option>="some_value"

Example

For example, if the source module is named pipedrive and the source is defined as follows:

# pipedrive.py
import dlt

@dlt.source
def deals(api_key: str = dlt.secrets.value):
    # api_key is injected by dlt from the configured providers
    pass

dlt will search for the following names in this order:

  1. sources.pipedrive.deals.api_key
  2. sources.pipedrive.api_key
  3. sources.api_key
  4. api_key
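
The search order listed above can be reproduced with a few lines of plain Python. This is a sketch of the "drop the right-most section and retry" rule only. The function `lookup_keys` is a hypothetical helper, not a dlt API.

```python
from typing import Iterator, List

def lookup_keys(sections: List[str], argument: str) -> Iterator[str]:
    """Yield candidate keys from the most specific path to the bare argument name."""
    remaining = list(sections)
    while True:
        yield ".".join(remaining + [argument])
        if not remaining:
            return
        remaining.pop()  # drop the right-most section and try again

for key in lookup_keys(["sources", "pipedrive", "deals"], "api_key"):
    print(key)
```
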

tip

You can use your pipeline name to keep separate configurations for each pipeline in your project. All config values are looked up with the pipeline name first and then again without it.

[pipeline_name_1.sources.google_sheets.credentials]
client_email = "<client_email_1>"
private_key = "<private_key_1>"
project_id = "<project_id_1>"

[pipeline_name_2.sources.google_sheets.credentials]
client_email = "<client_email_2>"
private_key = "<private_key_2>"
project_id = "<project_id_2>"

Credential types

In most cases, credentials are just key-value pairs, but some credentials have a more complex structure and can be set up in several ways. For example, to connect to a sql_database source, you can either set up a connection string:

[sources.sql_database]
credentials="snowflake://user:password@service-account/database?warehouse=warehouse_name&role=role"

or set up all parameters of connection separately:

[sources.sql_database.credentials]
drivername="snowflake"
username="user"
password="password"
database = "database"
host = "service-account"
warehouse = "warehouse_name"
role = "role"

dlt accepts both forms and can convert one into the other. To learn more about which credential types are supported, visit the complex credential types page.
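
To see how the two forms relate, the sketch below splits the connection string from the example into the separate parameters using only the Python standard library. It is not how dlt performs the conversion internally. `split_connection_string` is a hypothetical helper written for illustration.

```python
from urllib.parse import parse_qsl, urlsplit

def split_connection_string(url: str) -> dict:
    """Break a connection URL into the separate fields shown above."""
    parts = urlsplit(url)
    fields = {
        "drivername": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
    }
    # Query parameters (warehouse, role, ...) become additional fields.
    fields.update(parse_qsl(parts.query))
    return fields

url = "snowflake://user:password@service-account/database?warehouse=warehouse_name&role=role"
for name, value in split_connection_string(url).items():
    print(f"{name} = {value!r}")
```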

Environment variables

dlt prioritizes security by looking in environment variables before looking into the .toml files.

The format of lookup keys differs slightly from the secrets files: for environment variables, all names are uppercase, and sections are separated with a double underscore "__". For example, to specify the Facebook Ads access token through environment variables, you would set:

export SOURCES__FACEBOOK_ADS__ACCESS_TOKEN="<access_token>"
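
The mapping between a TOML-style key and its environment variable name follows directly from the rule above (uppercase, sections joined with "__"). A minimal sketch, where `to_env_key` is a hypothetical helper rather than a dlt function:

```python
def to_env_key(toml_key: str) -> str:
    """Convert a dotted lookup key into the matching environment variable name."""
    return toml_key.replace(".", "__").upper()

print(to_env_key("sources.facebook_ads.access_token"))
```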

Check out the example of setting up credentials through environment variables.

tip

To organize development and securely manage environment variables for credentials storage, you can use python-dotenv to automatically load variables from a .env file.

Vaults

Vault integration methods vary based on the vault type. Check out our example involving Google Cloud Secret Manager. For other vault integrations, you are welcome to contact sales to learn about our building blocks for data platform teams.

secrets.toml and config.toml

The TOML config provider in dlt utilizes two TOML files:

config.toml:

  • Configs refer to non-sensitive configuration data. These are settings, parameters, or options that define the behavior of a data pipeline.
  • They can include things like file paths, database hosts and timeouts, API URLs, performance settings, or any other settings that affect the pipeline's behavior.
  • Accessible in code through dlt.config.values

secrets.toml:

  • Secrets are sensitive information that should be kept confidential, such as passwords, API keys, private keys, and other confidential data.
  • It's crucial to never hard-code secrets directly into the code, as it can pose a security risk.
  • Accessible in code through dlt.secrets.values

By default, the .gitignore file in the project prevents secrets.toml from being added to version control and pushed. However, config.toml can be freely added to version control.

Location

The TOML provider always loads those files from the .dlt folder, located relative to the current working directory.

For example, if your working directory is my_dlt_project and your project has the following structure:

my_dlt_project:
  |
  pipelines/
    |---- .dlt/secrets.toml
    |---- google_sheets.py

and you run

python pipelines/google_sheets.py

then dlt will look for secrets in my_dlt_project/.dlt/secrets.toml and ignore the existing my_dlt_project/pipelines/.dlt/secrets.toml.

If you change your working directory to pipelines and run

python google_sheets.py

dlt will look for my_dlt_project/pipelines/.dlt/secrets.toml as (probably) expected.
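
The effect of the working directory can be sketched with pathlib. The helper below only mirrors the rule described above (the .dlt folder is resolved relative to the current working directory). `project_secrets_path` is a hypothetical name, not part of dlt.

```python
from pathlib import Path

def project_secrets_path(working_dir: Path) -> Path:
    """Where the project-level secrets.toml is expected for a given working directory."""
    return working_dir / ".dlt" / "secrets.toml"

# Running from the project root vs. from the pipelines folder.
print(project_secrets_path(Path("my_dlt_project")))
print(project_secrets_path(Path("my_dlt_project") / "pipelines"))

# In a real script, Path.cwd() gives the directory the command was started from.
print(project_secrets_path(Path.cwd()))
```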

caution

The TOML provider also has the capability to read files from ~/.dlt/ (located in the user's home directory) in addition to the local project-specific .dlt folder.

Structure

dlt organizes sections in TOML files in a specific structure required by the injection mechanism. Understanding this structure gives you more flexibility in setting credentials. For more details, see Toml files structure.

Custom Providers

You can use the CustomLoaderDocProvider class to supply a custom dictionary to dlt for use as a source of config and secret values. The code below demonstrates how to use a config stored in config.json.

import json

import dlt
from dlt.common.configuration.providers import CustomLoaderDocProvider

# create a function that loads a dict
def load_config():
    with open("config.json", "rb") as f:
        # return the parsed dict so the provider can read values from it
        return json.load(f)

# create the custom provider
provider = CustomLoaderDocProvider("my_json_provider", load_config)

# register provider
dlt.config.register_provider(provider)

tip

Check out an example of a YAML-based config provider that supports switchable profiles.

Examples

Set up both configurations and secrets

dlt recognizes two types of data: secrets and configurations. The main difference is that secrets contain sensitive information, while configurations hold non-sensitive information and can be safely added to version control systems like git. This means you have more flexibility with configurations. You can set up configurations directly in the code, but it is strongly advised not to do this with secrets.

caution

You can put all configurations and credentials in secrets.toml if it's more convenient. However, credentials cannot be placed in config.toml because dlt doesn't look for them there.

Let's assume we have a notion source and filesystem destination:

# we can set up a lot in config.toml
# config.toml
[runtime]
log_level="INFO"

[destination.filesystem]
bucket_url = "s3://[your_bucket_name]"

[normalize.data_writer]
disable_compression=true

# but credentials should go to secrets.toml!
# secrets.toml
[sources.notion]
api_key = "api_key"

[destination.filesystem.credentials]
aws_access_key_id = "ABCDEFGHIJKLMNOPQRST" # copy the access key here
aws_secret_access_key = "1234567890_access_key" # copy the secret access key here

Google credentials for both source and destination

Let's assume we use the bigquery destination and the google_sheets source. They both use Google credentials and expect them to be configured under the credentials key.

  1. If we create just a single credentials section like the one below, the destination and source will share the same credentials.

[credentials]
client_email = "<client_email_both_for_destination_and_source>"
private_key = "<private_key_both_for_destination_and_source>"
project_id = "<project_id_both_for_destination_and_source>"

  2. If we define sections as below, we'll keep the credentials separate:

# google sheet credentials
[sources.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"

# bigquery credentials
[destination.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"

Now dlt looks for destination credentials in the following order:

destination.bigquery.credentials --> Not found
destination.credentials --> Found

When looking for the source credentials:

sources.google_sheets_module.google_sheets_function.credentials --> Not found
sources.google_sheets_module.credentials --> Not found
sources.credentials --> Found
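
The destination lookup traced above can be simulated against a parsed secrets file. The sketch below uses a plain nested dict in place of the real secrets.toml and placeholder values. `get_path` and `find_section` are hypothetical helpers, not dlt functions.

```python
from typing import Any, List, Optional

def get_path(doc: dict, path: List[str]) -> Optional[Any]:
    """Follow a list of keys through nested dicts; None if any part is missing."""
    node: Any = doc
    for part in path:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def find_section(doc: dict, sections: List[str], name: str) -> Optional[Any]:
    """Try the most specific path first, dropping the right-most section on each miss."""
    remaining = list(sections)
    while True:
        path = remaining + [name]
        value = get_path(doc, path)
        print(".".join(path), "-->", "Found" if value is not None else "Not found")
        if value is not None or not remaining:
            return value
        remaining.pop()

# Stand-in for the separate [sources.credentials] / [destination.credentials] sections.
secrets = {
    "sources": {"credentials": {"project_id": "source-project"}},
    "destination": {"credentials": {"project_id": "destination-project"}},
}

found = find_section(secrets, ["destination", "bigquery"], "credentials")
```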

Credentials for several different sources and destinations

Let's assume we have several different Google sources and destinations. We can use full paths to organize the secrets.toml file:

# google sheet credentials
[sources.google_sheets.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"

# google analytics credentials
[sources.google_analytics.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"

# bigquery credentials
[destination.bigquery.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"

Credentials for several sources of the same type

Let's assume we have several sources of the same type. How can we separate them in secrets.toml? The recommended solution is to use a different pipeline name for each source:

[pipeline_name_1.sources.sql_database]
credentials="snowflake://user1:password1@service-account/database1?warehouse=warehouse_name&role=role1"

[pipeline_name_2.sources.sql_database]
credentials="snowflake://user2:password2@service-account/database2?warehouse=warehouse_name&role=role2"

Understanding the exceptions

If dlt expects a configuration or secret value but cannot find it, it raises ConfigFieldMissingException.

Let's run the chess.py example without providing the password:

$ CREDENTIALS="postgres://loader@localhost:5432/dlt_data" python chess.py
...
dlt.common.configuration.exceptions.ConfigFieldMissingException: Following fields are missing: ['password'] in configuration with spec PostgresCredentials
for field "password" config providers and keys were tried in the following order:
In Environment Variables key CHESS_GAMES__DESTINATION__POSTGRES__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CHESS_GAMES__DESTINATION__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CHESS_GAMES__CREDENTIALS__PASSWORD was not found.
In secrets.toml key chess_games.destination.postgres.credentials.password was not found.
In secrets.toml key chess_games.destination.credentials.password was not found.
In secrets.toml key chess_games.credentials.password was not found.
In Environment Variables key DESTINATION__POSTGRES__CREDENTIALS__PASSWORD was not found.
In Environment Variables key DESTINATION__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CREDENTIALS__PASSWORD was not found.
In secrets.toml key destination.postgres.credentials.password was not found.
In secrets.toml key destination.credentials.password was not found.
In secrets.toml key credentials.password was not found.
Please refer to https://dlthub.com/docs/general-usage/credentials for more information

It tells you exactly which paths dlt looked at, via which config providers and in which order.

In the example above:

  1. First, dlt looked in the top-level section chess_games, which is the name of the pipeline, and then repeated the search without it.
  2. In each case, it starts with the full path and works down to the minimal path credentials.password.
  3. For each set of paths, it looks in environment variables first, then in secrets.toml. It displays the exact keys tried.
  4. Note that config.toml was skipped, because it must not contain any secrets.
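
The order of keys in the exception message can be reproduced with a short sketch. It combines the three rules described in this document: pipeline name first and then without it, providers in priority order, and sections dropped from the right. This is an illustration only, not dlt's resolution code. `tried_keys` and its arguments are hypothetical.

```python
from typing import List, Tuple

def tried_keys(pipeline_name: str, sections: List[str], field: str,
               providers: List[str]) -> List[Tuple[str, str]]:
    """List (provider, key) pairs in the order they appear in the exception above."""
    tried = []
    for prefix in ([pipeline_name], []):          # with the pipeline name, then without
        for provider in providers:                # providers in priority order
            remaining = list(sections)
            while True:
                key = ".".join(prefix + remaining + [field])
                if provider == "Environment Variables":
                    key = key.replace(".", "__").upper()
                tried.append((provider, key))
                if not remaining:
                    break
                remaining.pop()                   # drop the right-most section
    return tried

for provider, key in tried_keys(
    "chess_games",
    ["destination", "postgres"],
    "credentials.password",
    ["Environment Variables", "secrets.toml"],
):
    print(f"In {provider} key {key} was not found.")
```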
