Skip to main content

MotherDuck

๐Ÿงช MotherDuck is still invitation-only and is being intensively tested. Please see the limitations/problems at the end.

Install dlt with MotherDuckโ€‹

To install the dlt library with MotherDuck dependencies:

pip install dlt[motherduck]
tip

If you see a lot of retries in your logs with various timeouts, decrease the number of load workers to 3-5 depending on the quality of your internet connection. Add the following to your config.toml:

[load]
workers=3

or export the LOAD__WORKERS=3 env variable. See more in performance

Setup Guideโ€‹

1. Initialize a project with a pipeline that loads to MotherDuck by running

dlt init chess motherduck

2. Install the necessary dependencies for MotherDuck by running

pip install -r requirements.txt

This will install dlt with the motherduck extra which contains duckdb and pyarrow dependencies.

3. Add your MotherDuck token to .dlt/secrets.toml

[destination.motherduck.credentials]
database = "dlt_data_3"
password = "<your token here>"

Paste your service token into the password field. The database field is optional, but we recommend setting it. MotherDuck will create this database (in this case dlt_data_3) for you.

Alternatively, you can use the connection string syntax.

[destination]
motherduck.credentials="md:///dlt_data_3?token=<my service token>"

4. Run the pipeline

python3 chess_pipeline.py

Write dispositionโ€‹

All write dispositions are supported.

Data loadingโ€‹

By default, Parquet files and the COPY command are used to move files to the remote duckdb database. All write dispositions are supported.

The INSERT format is also supported and will execute large INSERT queries directly into the remote database. This method is significantly slower and may exceed the maximum query size, so it is not advised.

dbt supportโ€‹

This destination integrates with dbt via dbt-duckdb, which is a community-supported package. dbt version >= 1.5 is required (which is the current dlt default.)

Syncing of dlt stateโ€‹

This destination fully supports dlt state sync.

Automated testsโ€‹

Each destination must pass a few hundred automatic tests. MotherDuck is passing these tests (except for the transactions, of course). However, we have encountered issues with ATTACH timeouts when connecting, which makes running such a number of tests unstable. Tests on CI are disabled.

Troubleshooting / limitationsโ€‹

I see a lot of errors in the log like DEADLINE_EXCEEDED or Connection timed outโ€‹

MotherDuck is very sensitive to the quality of the internet connection and the number of workers used to load data. Decrease the number of workers and ensure your internet connection is stable. We have not found any way to increase these timeouts yet.

MotherDuck does not support transactions.โ€‹

Do not use begin, commit, and rollback on dlt sql_client or on the duckdb dbapi connection. It has no effect on DML statements (they are autocommit). It confuses the query engine for DDL (tables not found, etc.). If your connection is of poor quality and you get a timeout when executing a DML query, it may happen that your transaction got executed.

I see some exception with home_dir missing when opening md: connection.โ€‹

Some internal component (HTTPS) requires the HOME env variable to be present. Export such a variable to the command line. Here is what we do in our tests:

os.environ["HOME"] = "/tmp"

before opening the connection.

I see some watchdog timeouts.โ€‹

We also see them.

'ATTACH_DATABASE': keepalive watchdog timeout

Our observation is that if you write a lot of data into the database, then close the connection and then open it again to write, there's a chance of such a timeout. A possible WAL file is being written to the remote duckdb database.

Invalid Input Error: Initialization function "motherduck_init" from fileโ€‹

Use duckdb 0.8.1 or above.

Additional Setup guidesโ€‹

This demo works on codespaces. Codespaces is a development environment available for free to anyone with a Github account. You'll be asked to fork the demo repository and from there the README guides you with further steps.
The demo uses the Continue VSCode extension.

Off to codespaces!

DHelp

Ask a question

Welcome to "Codex Central", your next-gen help center, driven by OpenAI's GPT-4 model. It's more than just a forum or a FAQ hub โ€“ it's a dynamic knowledge base where coders can find AI-assisted solutions to their pressing problems. With GPT-4's powerful comprehension and predictive abilities, Codex Central provides instantaneous issue resolution, insightful debugging, and personalized guidance. Get your code running smoothly with the unparalleled support at Codex Central - coding help reimagined with AI prowess.