
SQLAlchemy destination

The SQLAlchemy destination allows you to use any database that has an SQLAlchemy dialect implemented as a destination.

Currently, MySQL and SQLite are considered to have full support and are tested as part of the dlt CI suite. Other dialects are not tested but should generally work.

Install dlt with SQLAlchemy

Install dlt with the sqlalchemy extra dependency:

pip install "dlt[sqlalchemy]"

Note that database drivers are not included and need to be installed separately for the database you plan on using. For example, for MySQL:

pip install mysqlclient

Refer to the SQLAlchemy documentation on dialects for information about client libraries required for supported databases.

Create a pipeline

1. Initialize a project with a pipeline that loads to an SQLAlchemy-supported database by running:

dlt init chess sqlalchemy

2. Install the necessary dependencies for SQLAlchemy by running:

pip install -r requirements.txt

or run:

pip install "dlt[sqlalchemy]"

3. Install your database client library.

E.g., for MySQL:

pip install mysqlclient

4. Enter your credentials into .dlt/secrets.toml.

For example, for MySQL, replace the placeholder values with your database connection info:

[destination.sqlalchemy.credentials]
database = "dlt_data"
username = "loader"
password = "<password>"
host = "localhost"
port = 3306
driver_name = "mysql"

Alternatively, a valid SQLAlchemy database URL can be used, either in secrets.toml or as an environment variable. For example:

[destination.sqlalchemy]
credentials = "mysql://loader:<password>@localhost:3306/dlt_data"

or

export DESTINATION__SQLALCHEMY__CREDENTIALS="mysql://loader:<password>@localhost:3306/dlt_data"
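If you prefer to build the connection URL programmatically rather than write it by hand, SQLAlchemy's `URL.create` handles quoting of special characters for you. The sketch below mirrors the MySQL credentials above (the password is a placeholder):

```python
import sqlalchemy as sa

# Mirrors the MySQL credentials above; special characters in the
# password would be percent-encoded automatically by URL.create.
url = sa.engine.URL.create(
    drivername="mysql",
    username="loader",
    password="secret",  # placeholder
    host="localhost",
    port=3306,
    database="dlt_data",
)
print(url.render_as_string(hide_password=False))
# mysql://loader:secret@localhost:3306/dlt_data
```

The rendered string can be used anywhere a plain URL is accepted, including the `credentials` value in secrets.toml.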

An SQLAlchemy Engine can also be passed directly by creating an instance of the destination:

import sqlalchemy as sa
import dlt

engine = sa.create_engine('sqlite:///chess_data.db')

pipeline = dlt.pipeline(
    pipeline_name='chess',
    destination=dlt.destinations.sqlalchemy(engine),
    dataset_name='main'
)

Notes on SQLite

Dataset files

When using an SQLite database file, each dataset is stored in a separate file since SQLite does not support multiple schemas in a single database file. Under the hood, this uses ATTACH DATABASE.

The file is stored in the same directory as the main database file (provided by your database URL).

E.g., if your SQLite URL is sqlite:////home/me/data/chess_data.db and your dataset_name is games, the data is stored in /home/me/data/chess_data__games.db.

Note: If the dataset name is main, no additional file is created as this is the default SQLite database.
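The naming convention above can be sketched as follows (illustrative only; dlt derives this path internally, and this is not a public API):

```python
from pathlib import Path

# Illustrative: derive the per-dataset file name following the
# naming convention described above (<stem>__<dataset>.db).
db_path = Path("/home/me/data/chess_data.db")
dataset_name = "games"
attached = db_path.parent / f"{db_path.stem}__{dataset_name}.db"
print(attached)  # /home/me/data/chess_data__games.db
```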

In-memory databases

In-memory databases require a persistent connection as the database is destroyed when the connection is closed. Normally, connections are opened and closed for each load job and in other stages during the pipeline run. To ensure the database persists throughout the pipeline run, you need to pass in an SQLAlchemy Engine object instead of credentials. This engine is not disposed of automatically by dlt. Example:

import dlt
import sqlalchemy as sa

# Create the SQLite engine
engine = sa.create_engine('sqlite:///:memory:')

# Configure the destination instance and create pipeline
pipeline = dlt.pipeline(
    'my_pipeline',
    destination=dlt.destinations.sqlalchemy(engine),
    dataset_name='main'
)

# Run the pipeline with some data
pipeline.run([1,2,3], table_name='my_table')

# The engine is still open and you can query the database
with engine.connect() as conn:
    result = conn.execute(sa.text('SELECT * FROM my_table'))
    print(result.fetchall())

Write dispositions

The following write dispositions are supported:

  • append
  • replace with the truncate-and-insert and insert-from-staging replace strategies; the staging-optimized strategy falls back to insert-from-staging.
  • merge with delete-insert and scd2 merge strategies.

Data loading

Data is loaded in a dialect-agnostic manner with an insert statement generated by SQLAlchemy's core API. Rows are inserted in batches as long as the underlying database driver supports it. By default, the batch size is 10,000 rows.
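Conceptually, batched loading through the core API looks like the sketch below. This is an illustration of the technique, not dlt's actual implementation, and the table, batch size, and sample rows are made up:

```python
import sqlalchemy as sa

# Conceptual sketch of a dialect-agnostic batched insert via
# SQLAlchemy core, as described above (not dlt's actual code).
metadata = sa.MetaData()
table = sa.Table(
    "events", metadata,
    sa.Column("id", sa.Integer),
    sa.Column("value", sa.String),
)
engine = sa.create_engine("sqlite:///:memory:")
metadata.create_all(engine)

rows = [{"id": i, "value": f"row-{i}"} for i in range(25)]
batch_size = 10  # dlt defaults to 10,000

with engine.begin() as conn:
    for start in range(0, len(rows), batch_size):
        # Passing a list of dicts triggers the driver's executemany path.
        conn.execute(sa.insert(table), rows[start:start + batch_size])

with engine.connect() as conn:
    count = conn.execute(sa.select(sa.func.count()).select_from(table)).scalar()
print(count)  # 25
```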

Syncing of dlt state

This destination fully supports dlt state sync.

Data types

All dlt data types are supported, but how they are stored in the database depends on the SQLAlchemy dialect. For example, SQLite does not have DATETIME or TIMESTAMP types, so timestamp columns are stored as TEXT in ISO 8601 format.
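For instance, using the stdlib sqlite3 module directly you can see that SQLite happily stores an ISO 8601 string in a TEXT column, which is how timestamp values end up represented there:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (created_at TEXT)")

ts = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
# Store the timestamp as ISO 8601 text, as described above for SQLite.
conn.execute("INSERT INTO t VALUES (?)", (ts.isoformat(),))

stored = conn.execute("SELECT created_at FROM t").fetchone()[0]
print(stored)  # 2024-01-02T03:04:05+00:00
```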

Supported file formats

  • typed-jsonl is used by default. JSON-encoded data with typing information included.
  • Parquet is supported.

Supported column hints

  • unique hints are translated to UNIQUE constraints via SQLAlchemy (provided the database supports them).
