SQLAlchemy destination
The SQLAlchemy destination allows you to use any database that has an SQLAlchemy dialect implemented as a destination.
Currently, MySQL and SQLite are considered to have full support and are tested as part of the dlt
CI suite. Other dialects are not tested but should generally work.
Install dlt with SQLAlchemyโ
Install dlt with the sqlalchemy
extra dependency:
pip install "dlt[sqlalchemy]"
Note that database drivers are not included and need to be installed separately for the database you plan on using. For example, for MySQL:
pip install mysqlclient
Refer to the SQLAlchemy documentation on dialects for information about client libraries required for supported databases.
Create a pipelineโ
1. Initialize a project with a pipeline that loads to MS SQL by running:
dlt init chess sqlalchemy
2. Install the necessary dependencies for SQLAlchemy by running:
pip install -r requirements.txt
or run:
pip install "dlt[sqlalchemy]"
3. Install your database client library.
E.g., for MySQL:
pip install mysqlclient
4. Enter your credentials into .dlt/secrets.toml
.
For example, replace with your database connection info:
[destination.sqlalchemy.credentials]
database = "dlt_data"
username = "loader"
password = "<password>"
host = "localhost"
port = 3306
driver_name = "mysql"
Alternatively, a valid SQLAlchemy database URL can be used, either in secrets.toml
or as an environment variable.
E.g.
[destination.sqlalchemy]
credentials = "mysql://loader:<password>@localhost:3306/dlt_data"
or
export DESTINATION__SQLALCHEMY__CREDENTIALS="mysql://loader:<password>@localhost:3306/dlt_data"
An SQLAlchemy Engine
can also be passed directly by creating an instance of the destination:
import sqlalchemy as sa
import dlt
engine = sa.create_engine('sqlite:///chess_data.db')
pipeline = dlt.pipeline(
pipeline_name='chess',
destination=dlt.destinations.sqlalchemy(engine),
dataset_name='main'
)
Notes on SQLiteโ
Dataset filesโ
When using an SQLite database file, each dataset is stored in a separate file since SQLite does not support multiple schemas in a single database file.
Under the hood, this uses ATTACH DATABASE
.
The file is stored in the same directory as the main database file (provided by your database URL).
E.g., if your SQLite URL is sqlite:////home/me/data/chess_data.db
and your dataset_name
is games
, the data
is stored in /home/me/data/chess_data__games.db
Note: If the dataset name is main
, no additional file is created as this is the default SQLite database.
In-memory databasesโ
In-memory databases require a persistent connection as the database is destroyed when the connection is closed.
Normally, connections are opened and closed for each load job and in other stages during the pipeline run.
To ensure the database persists throughout the pipeline run, you need to pass in an SQLAlchemy Engine
object instead of credentials.
This engine is not disposed of automatically by dlt
. Example:
import dlt
import sqlalchemy as sa
# Create the SQLite engine
engine = sa.create_engine('sqlite:///:memory:')
# Configure the destination instance and create pipeline
pipeline = dlt.pipeline('my_pipeline', destination=dlt.destinations.sqlalchemy(engine), dataset_name='main')
# Run the pipeline with some data
pipeline.run([1,2,3], table_name='my_table')
# The engine is still open and you can query the database
with engine.connect() as conn:
result = conn.execute(sa.text('SELECT * FROM my_table'))
print(result.fetchall())
Write dispositionsโ
The following write dispositions are supported:
append
replace
withtruncate-and-insert
andinsert-from-staging
replace strategies.staging-optimized
falls back toinsert-from-staging
.merge
withdelete-insert
andscd2
merge strategies.
Data loadingโ
Data is loaded in a dialect-agnostic manner with an insert
statement generated by SQLAlchemy's core API.
Rows are inserted in batches as long as the underlying database driver supports it. By default, the batch size is 10,000 rows.
Syncing of dlt
stateโ
This destination fully supports dlt state sync.
Data typesโ
All dlt
data types are supported, but how they are stored in the database depends on the SQLAlchemy dialect.
For example, SQLite does not have DATETIME
or TIMESTAMP
types, so timestamp
columns are stored as TEXT
in ISO 8601 format.
Supported file formatsโ
- typed-jsonl is used by default. JSON-encoded data with typing information included.
- Parquet is supported.
Supported column hintsโ
unique
hints are translated toUNIQUE
constraints via SQLAlchemy (granted the database supports it).