1. Initialize a project with a pipeline that loads to Postgres by running:
   ```sh
   dlt init chess postgres
   ```
2. Install the necessary dependencies for Postgres by running:
   ```sh
   pip install -r requirements.txt
   ```
   This will install dlt with the `postgres` extra, which contains the `psycopg2` client library.
3. After setting up a Postgres instance and a `psql` / query editor, create a new database by running:
   ```sql
   CREATE DATABASE dlt_data;
   ```
   Add the `dlt_data` database to `.dlt/secrets.toml`.
4. Create a new user by running:
   ```sql
   CREATE USER loader WITH PASSWORD '<password>';
   ```
   Add the `loader` user and `<password>` password to `.dlt/secrets.toml`.
5. Give the `loader` user owner permissions by running:
   ```sql
   ALTER DATABASE dlt_data OWNER TO loader;
   ```
   It is possible to set more restrictive permissions (e.g. give the user access to a specific schema).
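   For example, a more restrictive setup might grant `loader` rights on a single schema only. The schema name `chess_data` below is an assumption matching the pipeline's dataset name; adjust it to your setup:

   ```sql
   -- instead of making loader the database owner,
   -- grant it access to one schema only
   GRANT CONNECT ON DATABASE dlt_data TO loader;
   CREATE SCHEMA IF NOT EXISTS chess_data;
   GRANT USAGE, CREATE ON SCHEMA chess_data TO loader;
   ```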
6. Enter your credentials into `.dlt/secrets.toml`. It should now look like this:
   ```toml
   [destination.postgres.credentials]

   database = "dlt_data"
   username = "loader"
   password = "<password>" # replace with your password
   host = "localhost" # or the IP address location of your database
   port = 5432
   connect_timeout = 15
   ```
You can also pass a database connection string similar to the one used by the `psycopg2` library or SQLAlchemy. The credentials above will look like this:

```toml
# keep it at the top of your toml file! before any section starts
destination.postgres.credentials="postgresql://loader:<password>@localhost/dlt_data?connect_timeout=15"
```
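To see how the individual fields map onto this connection string format, here is a small stdlib-only sketch. The `make_conn_str` helper is hypothetical (not part of the dlt API) and exists only to illustrate the URI shape and the need to URL-encode special characters in the password:

```python
from urllib.parse import quote

# Hypothetical helper, for illustration only: builds a libpq-style
# connection URI from the same fields that appear in secrets.toml.
def make_conn_str(database, username, password, host, port=5432, connect_timeout=15):
    # URL-encode the password so characters like @ or / survive
    return (
        f"postgresql://{username}:{quote(password, safe='')}@{host}:{port}"
        f"/{database}?connect_timeout={connect_timeout}"
    )

conn_str = make_conn_str("dlt_data", "loader", "p@ss/word", "localhost")
print(conn_str)
# → postgresql://loader:p%40ss%2Fword@localhost:5432/dlt_data?connect_timeout=15
```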
To pass credentials directly, use the `credentials` argument of `dlt.pipeline`:

```py
pipeline = dlt.pipeline(
    pipeline_name='chess',
    destination='postgres',
    dataset_name='chess_data',
    credentials="postgresql://loader:<password>@localhost/dlt_data",
)
```
All write dispositions are supported.

If you set the `replace` strategy to `staging-optimized`, the destination tables will be dropped and replaced by the staging tables.
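Assuming your dlt version exposes the usual `replace_strategy` configuration key (check the docs for your version), selecting this strategy in `config.toml` would look like:

```toml
[destination]
# one of: truncate-and-insert, insert-from-staging, staging-optimized
replace_strategy = "staging-optimized"
```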
dlt will load data using large INSERT VALUES statements by default. Loading is multithreaded (20 threads by default).
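To make the loading behavior concrete, here is a simplified sketch of the kind of statement this produces: one `INSERT` carrying many `VALUES` tuples rather than one statement per row. This is an illustration only, not dlt's actual code path, and `repr`-based quoting is not safe for real SQL:

```python
# Illustration: batch many rows into a single INSERT ... VALUES statement.
def insert_values(table, rows):
    columns = list(rows[0])
    tuples = ", ".join(
        "(" + ", ".join(repr(row[c]) for c in columns) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {tuples};"

stmt = insert_values(
    "players",
    [{"id": 1, "name": "magnus"}, {"id": 2, "name": "hikaru"}],
)
print(stmt)
# → INSERT INTO players (id, name) VALUES (1, 'magnus'), (2, 'hikaru');
```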
## Supported file formats

- `insert_values` is used by default.
Supported column hints
postgres will create unique indexes for all columns with
unique hints. This behavior may be disabled
## Additional destination options

The Postgres destination creates UNIQUE indexes by default on columns with the `unique` hint (i.e. `_dlt_id`). To disable this behavior, set the `create_indexes` configuration option to `false`.
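Assuming the `create_indexes` option is exposed through configuration in the usual way (verify against your dlt version's docs), disabling index creation would look like:

```toml
[destination.postgres]
create_indexes = false
```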
This destination integrates with dbt via `dbt-postgres`.

This destination fully supports dlt state sync.