MotherDuck
Install dlt with MotherDuckโ
To install the dlt library with MotherDuck dependencies:
pip install "dlt[motherduck]"
If you see a lot of retries in your logs with various timeouts, decrease the number of load workers to 3-5 depending on the quality of your internet connection. Add the following to your config.toml
:
[load]
workers=3
or export the LOAD__WORKERS=3 env variable. See more in performance
Setup guideโ
1. Initialize a project with a pipeline that loads to MotherDuck by running
dlt init chess motherduck
2. Install the necessary dependencies for MotherDuck by running
pip install -r requirements.txt
This will install dlt with the motherduck extra which contains duckdb and pyarrow dependencies.
3. Add your MotherDuck token to .dlt/secrets.toml
[destination.motherduck.credentials]
database = "dlt_data_3"
password = "<your token here>"
Paste your service token into the password field. The database
field is optional, but we recommend setting it. MotherDuck will create this database (in this case dlt_data_3
) for you.
Alternatively, you can use the connection string syntax.
[destination]
motherduck.credentials="md:dlt_data_3?motherduck_token=<my service token>"
Motherduck now supports configurable access tokens. Please refer to the documentation
You can pass token in a native Motherduck environment variable:
export motherduck_token='<token>'
in that case you can skip password / motherduck_token secret.
database defaults to my_db
.
More in Motherduck documentation
4. Run the pipeline
python3 chess_pipeline.py
Motherduck connection identifierโ
We enable Motherduck to identify that the connection is created by dlt
. Motherduck will use this identifier to better understand the usage patterns
associated with dlt
integration. The connection identifier is dltHub_dlt/DLT_VERSION(OS_NAME)
.
Write dispositionโ
All write dispositions are supported.
Data loadingโ
By default, Parquet files and the COPY
command are used to move files to the remote duckdb database. All write dispositions are supported.
The INSERT format is also supported and will execute large INSERT queries directly into the remote database. This method is significantly slower and may exceed the maximum query size, so it is not advised.
dbt supportโ
This destination integrates with dbt via dbt-duckdb, which is a community-supported package. dbt
version >= 1.7 is required.
Multi-statement transaction supportโ
Motherduck supports multi-statement transactions. This change happened with duckdb 0.10.2
.
Syncing of dlt
stateโ
This destination fully supports dlt state sync.
Troubleshootingโ
My database is attached in read-only modeโ
i.e., Error: Invalid Input Error: Cannot execute statement of type "CREATE" on database "dlt_data" which is attached in read-only mode!
We encountered this problem for databases created with duckdb 0.9.x
and then migrated to 0.10.x
. After switching to 1.0.x
on Motherduck, all our databases had permission "read-only" visible in UI. We could not figure out how to change it, so we dropped and recreated our databases.
I see some exception with home_dir missing when opening md:
connection.โ
Some internal component (HTTPS) requires the HOME env variable to be present. Export such a variable to the command line. Here is what we do in our tests:
os.environ["HOME"] = "/tmp"
before opening the connection.
Additional Setup guidesโ
- Load data from Harvest to MotherDuck in python with dlt
- Load data from Sentry to MotherDuck in python with dlt
- Load data from Coinbase to MotherDuck in python with dlt
- Load data from Mux to MotherDuck in python with dlt
- Load data from GitHub to MotherDuck in python with dlt
- Load data from Pinterest to MotherDuck in python with dlt
- Load data from MongoDB to MotherDuck in python with dlt
- Load data from Spotify to MotherDuck in python with dlt
- Load data from IBM Db2 to MotherDuck in python with dlt
- Load data from Jira to MotherDuck in python with dlt