
Is DuckDB a database for ducks?

· 3 min read
Matthaus Krzykowski

Using DuckDB, dlt, & GitHub to explore DuckDB

tip

TL;DR: We created a Colab notebook for you to learn more about DuckDB (or any open source repository of interest) using DuckDB, dlt, and the GitHub API 🙂

So is DuckDB full of data about ducks?

Nope, you can put whatever data you want into DuckDB ✨

Many data analysts, data scientists, and developers prefer to work with data on their laptops. DuckDB allows them to start quickly and easily. When working only locally becomes infeasible, they can then turn this local “data pond” into a data lake, storing their data on object storage like Amazon S3, and continue to use DuckDB as a query engine on top of the files stored there.
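To give a sense of how low that barrier is, here is a minimal sketch of the laptop-first workflow. The database file name, the CSV file, and its columns are all made up for illustration:

```python
import duckdb

# One local file is the whole database: no server, no setup.
con = duckdb.connect("local_pond.duckdb")

# DuckDB can query files in place; events.csv and its columns are hypothetical.
top_reactors = con.execute("""
    SELECT user_login, count(*) AS n_reactions
    FROM read_csv_auto('events.csv')
    GROUP BY user_login
    ORDER BY n_reactions DESC
    LIMIT 10
""").df()
print(top_reactors)
```

The whole database lives in a single local file, which is part of what makes the later move to Parquet files on object storage feel so natural.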

If you want to better understand why folks are excited about DuckDB, check out this blog post.

Perhaps ducks use DuckDB?

Again, the answer is 'nein' (no). As far as we can tell, it is usually people who use DuckDB 🦆

To determine this, we used data load tool (dlt) to load emoji reaction data for the DuckDB repo from the GitHub API into a DuckDB instance and explored who has been reacting to issues / PRs in the open source community (a rough sketch of such a pipeline follows the findings below). This is what we learned…

The three issues / PRs with the most reactions of all time are:

  1. SQLAlchemy dialect #305
  2. Add basic support for GeoSpatial type #2836
  3. Support AWS default credential provider chain #4021

The three issues / PRs with the most reactions in 2023 are:

  1. Add support for Pivot/Unpivot statements #6387
  2. Add support for a pluggable storage and catalog back-end, and add support for a SQLite back-end storage #6066
  3. Add support for UPSERT (INSERT .. ON CONFLICT DO ..) syntax #5866

We also looked at some of the most engaged users (other than the folks who work at DuckDB Labs).

All of these users seem to be people. Admittedly, we didn’t look at everyone, so there could be ducks within the flock. You can check for yourself by playing with the Colab notebook.
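For reference, here is a rough sketch of how such a dlt pipeline can be wired up. It is a hedged, minimal example rather than the notebook's code: the resource and pipeline names are illustrative, and the real implementation lives in the Colab notebook.

```python
import dlt
import requests

@dlt.resource(name="issues", write_disposition="replace")
def duckdb_issues():
    # The GitHub REST API lists issues and PRs together; each item carries a
    # "reactions" summary (total_count, +1, -1, heart, ...).
    # Unauthenticated requests are rate-limited; pass an Authorization header
    # with a GitHub token for anything beyond a quick demo.
    url = "https://api.github.com/repos/duckdb/duckdb/issues"
    params = {"state": "all", "per_page": 100}
    while url:
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        yield resp.json()
        url = resp.links.get("next", {}).get("url")  # follow pagination
        params = None  # the "next" link already embeds the query parameters

pipeline = dlt.pipeline(
    pipeline_name="github_reactions",
    destination="duckdb",      # writes to a local github_reactions.duckdb file
    dataset_name="duckdb_repo",
)
load_info = pipeline.run(duckdb_issues())
print(load_info)
```

dlt normalizes the nested JSON it receives (including each item's reactions object) into tables and columns inside the local DuckDB file, so the reaction counts end up directly queryable with SQL.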

Maybe it’s called DuckDB because you can use it to create a "data pond" that can grow into a data lake, and because ducks like water?

Although this is a cool idea, it’s still not why it’s called DuckDB 🌊

Using DuckDB’s ability to export the data loaded into it as Parquet files, you can create a small “data pond” on your local computer. To turn it into a data lake, you can then add these files to Google Cloud Storage, Amazon S3, etc. And if you want this data lake to keep filling with the latest data from the GitHub API, you can deploy the dlt pipeline.
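A minimal sketch of that step could look like the following, assuming the pipeline sketched above produced a github_reactions.duckdb file. The S3 bucket is hypothetical, and the flattened column name reactions__total_count reflects how dlt typically names nested fields, so check the schema from your own run:

```python
import duckdb

# Connect to the local file the dlt pipeline above produced (name assumed).
con = duckdb.connect("github_reactions.duckdb")

# 1. Local "data pond": export a loaded table to a Parquet file on your laptop.
con.execute("COPY duckdb_repo.issues TO 'issues.parquet' (FORMAT PARQUET)")

# 2. Data lake: after uploading issues.parquet to object storage (e.g. with the
#    AWS CLI), DuckDB can query it in place through the httpfs extension.
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("SET s3_region='eu-central-1'")  # plus your S3 credentials

top = con.execute("""
    SELECT title, reactions__total_count
    FROM read_parquet('s3://my-hypothetical-data-lake/issues.parquet')
    ORDER BY reactions__total_count DESC
    LIMIT 3
""").df()
print(top)
```

From here, the same SQL works whether the Parquet files sit on your laptop or in the bucket, which is the point of keeping DuckDB as the query engine on top.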

Check this out in the Colab notebook and let us know if you want some help setting this up.

Just tell me why it is called DuckDB!!!

Okay. It’s called DuckDB because ducks are amazing and @hannes once had a pet duck 🤣

Why "Duck" DB? Source: DuckDB: an Embeddable Analytical RDBMS

Enjoy this blog post? Give data load tool (dlt) a ⭐ on GitHub here 🤜🤛

