Version: 1.15.0 (latest)

Source configuration

dlt+

This page is for dlt+, which requires a license. Join our early access program for a trial license.

The dlt.yml file enables a fully declarative setup of your data source and its parameters. It supports built-in sources such as REST APIs, SQL databases, and cloud storage, as well as any custom source you define.

Credential placeholders for the defined sources are automatically generated in .dlt/secrets.toml. Alternatively, configuration can be provided directly in dlt.yml.

REST API

The built-in rest_api type enables configuration of REST-based integrations. Multiple endpoints can be defined under a single source.

sources:
  pokemon_api:
    type: rest_api
    client:
      base_url: https://pokeapi.co/api/v2/
      paginator: auto
    resource_defaults:
      primary_key: name
    resources:
      - pokemon
      - berry
      - name: encounter_conditions
        endpoint:
          path: encounter-condition
          params:
            offset:
              type: incremental
              cursor_path: name
        write_disposition: append

  • type: rest_api: Specifies the use of the built-in REST API source.
  • client.base_url: Sets the root URL for all API requests.
  • paginator: auto: Enables automatic detection and handling of pagination.
  • resource_defaults: Contains the default values to configure the dlt resources. This configuration is applied to all resources unless overridden by the resource-specific configuration.
  • Each item in resources defines an endpoint to extract. Simple entries like pokemon and berry will fetch from /pokemon and /berry, respectively.
  • The encounter_conditions resource uses an advanced configuration:
    • path: Points to the /encounter-condition endpoint.
    • params.offset: Enables incremental loading using the name field as the cursor.
    • write_disposition: append: Appends the data extracted on each run to the data already in the destination.
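
As a rough illustration of how the shorthand entries and resource_defaults described above fit together, the following Python sketch writes the same configuration as a dictionary and expands it by hand. The helper expand_resources is hypothetical and uses a simple shallow merge; it only approximates the described behavior and is not how dlt implements it.

```python
from copy import deepcopy
from typing import Any

# The same configuration as the YAML above, written as a plain dictionary.
config: dict[str, Any] = {
    "client": {
        "base_url": "https://pokeapi.co/api/v2/",
        "paginator": "auto",
    },
    "resource_defaults": {"primary_key": "name"},
    "resources": [
        "pokemon",
        "berry",
        {
            "name": "encounter_conditions",
            "endpoint": {
                "path": "encounter-condition",
                "params": {
                    "offset": {"type": "incremental", "cursor_path": "name"},
                },
            },
            "write_disposition": "append",
        },
    ],
}


def expand_resources(cfg: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: expand string shorthands and apply defaults."""
    defaults = cfg.get("resource_defaults", {})
    expanded = []
    for entry in cfg["resources"]:
        if isinstance(entry, str):
            # A bare name is treated as both the resource name and its path.
            entry = {"name": entry, "endpoint": {"path": entry}}
        # Resource-specific keys override the defaults (shallow merge only).
        expanded.append({**deepcopy(defaults), **deepcopy(entry)})
    return expanded


for resource in expand_resources(config):
    print(resource["name"], "->", resource["endpoint"]["path"])
```

In OSS dlt, a dictionary of this shape is the kind of input that rest_api_source in dlt.sources.rest_api accepts, so the YAML can be read as a declarative form of that call.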

SQL database

For SQL-based extractions that require no table-specific parameter configuration, it's possible to initialize type: sql_database and declare multiple tables at once.

General SQL database source

sources:
  sql_source:
    type: sql_database
    table_names:
      - family
      - clan
    incremental:
      cursor_path: updated
      initial_value: 2023-01-12T11:21:28Z

This defines a connection to a SQL database with incremental loading applied across multiple tables.

  • type: sql_database: Specifies the SQL database connector.

  • table_names: List of tables to extract.

  • incremental: Global configuration for incremental extraction.
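
To make the roles of cursor_path and initial_value more concrete, here is a conceptual Python sketch of cursor-based incremental selection. It is not dlt's implementation: the function select_new_rows, the sample rows, and the choice to treat the boundary value as inclusive are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Any


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting the trailing 'Z' used above."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_new_rows(
    rows: list[dict[str, Any]], cursor_path: str, last_value: str
) -> tuple[list[dict[str, Any]], str]:
    """Keep rows whose cursor is at or after last_value (assumed inclusive).

    Returns the selected rows and the highest cursor value seen, which a
    later run would use in place of initial_value.
    """
    threshold = parse_ts(last_value)
    fresh = [row for row in rows if parse_ts(row[cursor_path]) >= threshold]
    new_last = max(
        (row[cursor_path] for row in fresh), key=parse_ts, default=last_value
    )
    return fresh, new_last


# Hypothetical rows from the "family" table.
sample_rows = [
    {"id": 1, "updated": "2023-01-01T00:00:00Z"},
    {"id": 2, "updated": "2023-01-12T11:21:28Z"},
    {"id": 3, "updated": "2023-02-01T08:00:00Z"},
]

selected, next_cursor = select_new_rows(
    sample_rows, "updated", "2023-01-12T11:21:28Z"
)
print([row["id"] for row in selected], next_cursor)
```

On the first run the threshold comes from initial_value; on later runs it would come from the stored cursor, so only rows changed since the previous run are read.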

Table-specific configuration

For table-specific settings, such as different primary keys, individual tables can be defined as standalone sources using the sql_database.sql_table type.

sources:
  sql_family:
    type: sql_database.sql_table
    table: family
    incremental:
      cursor_path: updated
      initial_value: 2023-01-12T11:21:28Z
    primary_key: rfam_id

  • type: sql_database.sql_table: Indicates a single-table extraction.

  • table: Name of the table to extract.

  • incremental: Enables incremental loading for the table.

  • primary_key: Specifies the table's unique identifier for deduplication and merges.
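
The role of primary_key in deduplication and merges can be pictured with a small Python sketch. This is a conceptual upsert only; merge_by_key and the sample rows are hypothetical, and the actual merge is performed by dlt and the destination.

```python
from typing import Any


def merge_by_key(
    existing: list[dict[str, Any]],
    incoming: list[dict[str, Any]],
    primary_key: str,
) -> list[dict[str, Any]]:
    """Upsert incoming rows into existing rows, matching on primary_key."""
    merged = {row[primary_key]: row for row in existing}
    for row in incoming:
        # A row with a known key replaces the stored one; a new key is added.
        merged[row[primary_key]] = row
    return list(merged.values())


# Hypothetical rows keyed by rfam_id.
stored = [
    {"rfam_id": "RF00001", "description": "old"},
    {"rfam_id": "RF00002", "description": "unchanged"},
]
loaded = [
    {"rfam_id": "RF00001", "description": "new"},
    {"rfam_id": "RF00003", "description": "added"},
]

for row in merge_by_key(stored, loaded, "rfam_id"):
    print(row)
```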

Filesystem

Filesystem sources can be set up via the filesystem.readers type, and the filesystem-specific resources can be selected with the --resources option of the CLI pipeline run command.

sources:
  file_source:
    type: filesystem.readers
    bucket_url: file://Users/admin/Documents/csv_files
    file_glob: '*.csv'

dlt pipeline file_pipeline run --resources read_csv

tip

Source type is used to refer to the location in Python code where the @dlt.source decorated function is present. You can always use a full path to a function name in a Python module, but we also support shorthand and relative notations. For example:

  • rest_api will be expanded to dlt.sources.rest_api.rest_api, where dlt.sources.rest_api is a Python module in OSS dlt and rest_api is the name of a function in that module.
  • github.source will be expanded to sources.github.source in the current project.
  • filesystem.readers will be expanded to dlt.sources.filesystem.readers.

If the type cannot be resolved, dlt+ will provide you with a detailed list of all candidate types that were looked up so you can make required corrections.
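
The shorthand expansion described in this tip can be sketched in Python as follows. This is an illustration under stated assumptions, not dlt+'s resolver: the function names, the search prefixes, and the order in which candidates are tried are all hypothetical.

```python
import importlib
from typing import Any

# Assumed search locations: the current project first, then OSS dlt.
SEARCH_PREFIXES = ["sources", "dlt.sources"]


def candidate_paths(source_type: str) -> list[str]:
    """List the fully qualified names a shorthand type could refer to."""
    parts = source_type.split(".")
    if len(parts) == 1:
        # "rest_api" means module "rest_api" and function "rest_api".
        parts = [parts[0], parts[0]]
    dotted = ".".join(parts)
    candidates = [f"{prefix}.{dotted}" for prefix in SEARCH_PREFIXES]
    # The path exactly as written is tried last, covering full paths.
    candidates.append(dotted)
    return candidates


def resolve(source_type: str) -> Any:
    """Import the first candidate that exists, or report every one tried."""
    tried = candidate_paths(source_type)
    for path in tried:
        module_name, _, attr = path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise LookupError(
        f"Cannot resolve source type {source_type!r}; tried: {', '.join(tried)}"
    )


print(candidate_paths("rest_api"))
print(candidate_paths("filesystem.readers"))
```

Collecting every candidate before importing is what allows the error message to list all of the locations that were looked up.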
