Skip to main content
Version: 0.3.25

MotherDuck

๐Ÿงช MotherDuck is still invitation only and intensively tested. Please see the limitations / problems at the end.

tip

Decrease the number of load workers to 3-5 depending on the quality of your internet connection if you see a lot of retries in your logs with various timeout, add the following to your config.toml:

[load]
workers=3

or export LOAD__WORKERS=3 env variable. See more in performance

Setup Guideโ€‹

1. Initialize a project with a pipeline that loads to MotherDuck by running

dlt init chess motherduck

2. Install the necessary dependencies for MotherDuck by running

pip install -r requirements.txt

This will install dlt with motherduck extra which contains duckdb and pyarrow dependencies

3. Add your MotherDuck token to .dlt/secrets.toml

[destination.motherduck.credentials]
database = "dlt_data_3"
password = "<your token here>"

Paste your service token into password. The database field is optional but we recommend to set it. MotherDuck will create this database (in this case dlt_data_3) for you.

Alternatively you can use the connection string syntax

[destination]
motherduck.credentials="md:///dlt_data_3?token=<my service token>"

3. Run the pipeline

python3 chess_pipeline.py

Write dispositionโ€‹

All write dispositions are supported

Data loadingโ€‹

By default parquet files and COPY command is used to move files to remote duckdb database. All write dispositions are supported.

INSERT format is also supported and will execute a large INSERT queries directly into the remote database. This is way slower and may exceed maximum query size - so not advised.

dbt supportโ€‹

This destination integrates with dbt via dbt-duckdb which is a community supported package. dbt version >= 1.5 is required (which is current dlt default.)

Syncing of dlt stateโ€‹

This destination fully supports dlt state sync

Automated testsโ€‹

Each destination must pass few hundred automatic tests. MotherDuck is passing those tests (except the transactions OFC). However we encountered issues with ATTACH timeouts when connecting which makes running such number of tests unstable. Tests on CI are disabled.

Troubleshooting / limitationsโ€‹

I see a lot of errors in the log like DEADLINE_EXCEEDED or Connection timed outโ€‹

Motherduck is very sensitive to quality of the internet connection and number of workers used to load data. Decrease the number of workers and make sure your internet connection really works. We could not find any way to increase those timeouts yet.

MotherDuck does not support transactions.โ€‹

Do not use begin, commit and rollback on dlt sql_client or on duckdb dbapi connection. It has no effect for DML statements (they are autocommit). It is confusing the query engine for DDL (tables not found etc.). If your connection if of poor quality and you get a time out when executing DML query it may happen that your transaction got executed,

I see some exception with home_dir missing when opening md: connection.โ€‹

Some internal component (HTTPS) requires HOME env variable to be present. Export such variable to the command line. Here is what we do in our tests:

os.environ["HOME"] = "/tmp"

before opening connection

I see some watchdog timeouts.โ€‹

We also see them.

'ATTACH_DATABASE': keepalive watchdog timeout

My observation is that if you write a lot of data into the database then close the connection and then open it again to write, there's a chance of such timeout. Possible WAL file is being written to the remote duckdb database.

Invalid Input Error: Initialization function "motherduck_init" from fileโ€‹

Use duckdb 0.8.1

This demo works on codespaces. Codespaces is a development environment available for free to anyone with a Github account. You'll be asked to fork the demo repository and from there the README guides you with further steps.
The demo uses the Continue VSCode extension.

Off to codespaces!

DHelp

Ask a question

Welcome to "Codex Central", your next-gen help center, driven by OpenAI's GPT-4 model. It's more than just a forum or a FAQ hub โ€“ it's a dynamic knowledge base where coders can find AI-assisted solutions to their pressing problems. With GPT-4's powerful comprehension and predictive abilities, Codex Central provides instantaneous issue resolution, insightful debugging, and personalized guidance. Get your code running smoothly with the unparalleled support at Codex Central - coding help reimagined with AI prowess.