
Glossary

Source

A location that holds data with a certain structure, organized into one or more resources.

  • If endpoints in an API are the resources, then the API is the source.
  • If tabs in a spreadsheet are the resources, then the source is the spreadsheet.
  • If tables in a database are the resources, then the source is the database.

Within this documentation, source also refers to the software component (i.e., a Python function) that extracts data from the source location using one or more resource components.
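
The relationship between a source and its resources can be sketched in plain Python using the database analogy above. This sketch does not use dlt itself (in dlt these roles are typically filled by functions decorated with @dlt.source and @dlt.resource), and the names orders_table, customers_table and shop_database_source are hypothetical.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

# Hypothetical resources: each function extracts the rows of one table.
def orders_table() -> Iterator[Row]:
    yield {"id": 1, "customer_id": 10, "total": 25.0}
    yield {"id": 2, "customer_id": 11, "total": 12.5}

def customers_table() -> Iterator[Row]:
    yield {"id": 10, "name": "Ada"}
    yield {"id": 11, "name": "Linus"}

# Hypothetical source: one function that groups all resources of one location.
def shop_database_source() -> Dict[str, Callable[[], Iterator[Row]]]:
    return {"orders": orders_table, "customers": customers_table}

if __name__ == "__main__":
    source = shop_database_source()
    for name, resource in source.items():
        rows: List[Row] = list(resource())
        print(name, len(rows))
```

Returning the resources from one function keeps everything that belongs to a single location (here, one database) together, while each resource stays independently callable.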

Resource

A logical grouping of data within a data source, typically holding data of similar structure and origin.

  • If the source is an API, then a resource is an endpoint in that API.
  • If the source is a spreadsheet, then a resource is a tab in that spreadsheet.
  • If the source is a database, then a resource is a table in that database.

Within this documentation, resource also refers to the software component (i.e., a Python function) that extracts the data from the source location.
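
A resource is often written as a generator, so rows are produced lazily instead of being held in memory all at once. The sketch below assumes a paginated API endpoint simulated by the hypothetical fetch_page helper and FAKE_PAGES data; a real resource would make HTTP requests instead.

```python
from typing import Dict, Iterator, List, Optional

Row = Dict[str, object]

# Hypothetical stand-in for one API endpoint that returns results page by page.
FAKE_PAGES: List[List[Row]] = [
    [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}],
    [{"id": 3, "title": "third"}],
]

def fetch_page(page: int) -> Optional[List[Row]]:
    """Return one page of results, or None when there are no more pages."""
    return FAKE_PAGES[page] if page < len(FAKE_PAGES) else None

def issues_resource() -> Iterator[Row]:
    """Resource: lazily yield every row of the endpoint, one page at a time."""
    page = 0
    while (rows := fetch_page(page)) is not None:
        yield from rows
        page += 1

if __name__ == "__main__":
    print([row["id"] for row in issues_resource()])  # [1, 2, 3]
```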

Destination

The data store where data from the source is loaded (e.g. Google BigQuery).

Pipeline

Moves the data from the source to the destination, according to instructions provided in the schema (i.e. extracting, normalizing, and loading the data).
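
The three steps named above can be sketched in plain Python. This is not dlt's implementation: it assumes the destination is an in-memory dictionary of tables (standing in for a data store such as Google BigQuery), leaves out the schema, and uses hypothetical function names and a double-underscore separator for flattened columns.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]

def extract() -> Iterator[Row]:
    """Extract: read raw, possibly nested records from the source (hard-coded here)."""
    yield {"id": 1, "user": {"name": "Ada", "country": "UK"}}
    yield {"id": 2, "user": {"name": "Grace", "country": "US"}}

def flatten(record: Row, prefix: str = "") -> Row:
    """Normalize: turn nested dicts into flat columns, e.g. user -> user__name."""
    flat: Row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}__"))
        else:
            flat[column] = value
    return flat

def load(destination: Dict[str, List[Row]], table: str, rows: Iterable[Row]) -> None:
    """Load: append normalized rows to a table in the in-memory destination."""
    destination.setdefault(table, []).extend(rows)

def run_pipeline(destination: Dict[str, List[Row]]) -> None:
    load(destination, "users", (flatten(r) for r in extract()))

if __name__ == "__main__":
    warehouse: Dict[str, List[Row]] = {}
    run_pipeline(warehouse)
    print(warehouse["users"][0])
    # {'id': 1, 'user__name': 'Ada', 'user__country': 'UK'}
```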

Verified Source

A Python module distributed with dlt init that allows creating pipelines that extract data from a particular source. Such a module is intended to be published so that others can use it to build pipelines.

A source must be published to become "verified", which means that it has tests, test data, demonstration scripts, and documentation, and that the dataset it produces was reviewed by a data engineer.

Schema

Describes the structure of normalized data (e.g. unpacked tables, column types, etc.) and provides instructions on how the data should be processed and loaded (i.e. it tells dlt about the content of the data and how to load it into the destination).
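
What a schema records can be illustrated with a small sketch that unpacks a list-valued field into a child table and notes a type for every column. Everything here is an assumption for illustration: the type names in TYPE_NAMES, the double-underscore child-table naming and the _parent_id link column are hypothetical, not dlt's. The sketch also assumes list elements are scalars and identifies each parent row by its position in the batch.

```python
from typing import Dict, List

Row = Dict[str, object]
Schema = Dict[str, Dict[str, str]]  # table name -> {column name -> type name}

# Hypothetical mapping from Python value types to column type names.
TYPE_NAMES = {bool: "bool", int: "bigint", float: "double", str: "text"}

def normalize(table: str, rows: List[Row], tables: Dict[str, List[Row]], schema: Schema) -> None:
    """Unpack list fields into child tables and record a type for every column."""
    for parent_id, row in enumerate(rows):
        flat: Row = {}
        for column, value in row.items():
            if isinstance(value, list):
                # Each list element becomes its own row in a child table, linked to its parent.
                children = [{"_parent_id": parent_id, "value": v} for v in value]
                normalize(f"{table}__{column}", children, tables, schema)
            else:
                flat[column] = value
                schema.setdefault(table, {})[column] = TYPE_NAMES.get(type(value), "json")
        tables.setdefault(table, []).append(flat)

if __name__ == "__main__":
    tables: Dict[str, List[Row]] = {}
    schema: Schema = {}
    normalize("posts", [{"id": 1, "title": "Hi", "tags": ["a", "b"]}], tables, schema)
    print(schema)
```

The resulting schema tells the loader which tables to create and which column types to use, which is the "how to load it" half of the definition above.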

Config

A set of values that are passed to the pipeline at run time (e.g. to change its behavior locally vs. in production).
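
A minimal sketch of run-time configuration, assuming values come from environment variables with local defaults. This is plain Python rather than dlt's own configuration mechanism, and the PipelineConfig class and the PIPELINE_DATASET_NAME and PIPELINE_BATCH_SIZE variable names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping

@dataclass(frozen=True)
class PipelineConfig:
    dataset_name: str
    batch_size: int

def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Build the run-time config, letting environment variables override local defaults."""
    return PipelineConfig(
        # Hypothetical variable names; the same code runs locally and in production.
        dataset_name=env.get("PIPELINE_DATASET_NAME", "dev_dataset"),
        batch_size=int(env.get("PIPELINE_BATCH_SIZE", "100")),
    )
```

Locally the defaults apply; in production the deployment sets the variables, so the pipeline's behavior changes without touching its code.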

Credentials

A subset of configuration whose elements are kept secret and never shared in plain text.
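
One common way to keep a secret from appearing in plain text is to read it from the environment and exclude it from the object's printed form. This is a plain-Python sketch, not dlt's secrets handling; the DatabaseCredentials class and the DB_USERNAME and DB_PASSWORD variable names are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

@dataclass(frozen=True)
class DatabaseCredentials:
    username: str
    # repr=False keeps the password out of printed output and any log that uses repr().
    password: str = field(repr=False)

def credentials_from_env(env: Mapping[str, str] = os.environ) -> DatabaseCredentials:
    """Read credentials from the environment instead of hard-coding them in source code."""
    return DatabaseCredentials(username=env["DB_USERNAME"], password=env["DB_PASSWORD"])

if __name__ == "__main__":
    creds = DatabaseCredentials(username="loader", password="s3cret")
    print(creds)  # DatabaseCredentials(username='loader')
```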
