April 23, 2024

Replacing SaaS ETL with Python dlt: A painless experience for Yummy.eu

Adrian Brudaru, Co-Founder & CDO

About Yummy.eu:
Yummy is a lean-ops meal-kit company that streamlines the entire food preparation process for customers in emerging markets by providing personalized recipes, nutritional guidance, and even shopping services. Their approach ensures a hassle-free, nutritionally optimized meal experience, making daily cooking convenient and enjoyable.
Yummy is a food box business. At the intersection of gastronomy and logistics, this market is very competitive. To make it in this market, Yummy needs to be fast and informed in its operations.

Pipelines are not yet a commodity.

At Yummy, efficiency and timeliness are paramount. Initially, Martin, Yummy’s CTO, chose to purchase data pipelining tools for their operational and analytical needs, aiming to maximize time efficiency. However, the real-world performance of these purchased solutions did not meet expectations, which led to a reassessment of their approach.

What’s important: Velocity, Reliability, Speed, Time. Money is secondary.

Martin was initially satisfied with the ease of setup provided by the SaaS services.

The tipping point came when an update to Yummy’s database introduced a new log table. The vendor’s default settings automatically replicated every new table in full on each refresh, which led to unexpectedly high fees. This situation highlighted the need for greater control over data management processes and prompted a shift towards more transparent and cost-effective solutions.

💡 Proactive management of data pipeline settings is essential. Automatic replication of new tables, while common, often leads to increased costs without adding value, especially if those tables are not immediately needed. Understanding and adjusting these settings can lead to significant cost savings and more efficient data use.
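To see why fully replicating an append-only log table on every refresh gets expensive, here is a rough back-of-the-envelope sketch. The numbers are made up for illustration and are not Yummy's actual volumes, and `rows_moved` is a hypothetical helper, not part of any vendor tool or of dlt.

```python
def rows_moved(initial_rows: int, new_rows_per_refresh: int, refreshes: int) -> tuple[int, int]:
    """Return (full_replication_rows, incremental_rows) copied over all refreshes."""
    table_size = initial_rows
    full_total = 0
    for _ in range(refreshes):
        table_size += new_rows_per_refresh  # the log table only ever grows
        full_total += table_size            # full replication re-copies every row
    # Incremental loading copies each row once: the backlog, then only new rows.
    incremental_total = initial_rows + new_rows_per_refresh * refreshes
    return full_total, incremental_total


# Hypothetical log table: empty at first, 1,000 new rows before each of 24 refreshes.
full, incremental = rows_moved(0, 1_000, 24)
print(f"full: {full:,} rows, incremental: {incremental:,} rows")
```

In this invented case, full replication moves 300,000 rows where incremental loading moves 24,000, and the gap keeps widening as the table grows. If a vendor bills per row moved, that gap shows up directly on the invoice.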

10x faster, 182x cheaper with dlt + async + modal

Motivated to find a solution that balanced cost with performance, Martin explored using dlt, a tool known for its simplicity in building data pipelines. Combining dlt with asynchronous operations and running it on Modal for managed execution brought substantial improvements:

Data processing speed increased tenfold.
Cost fell by a factor of 182 compared to the traditional SaaS tool.
The new system supports extracting data once and writing to multiple destinations without additional costs.
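The two ideas behind these gains, extracting tables concurrently and reusing one extract for several destinations, can be sketched with nothing but the standard library. This is not Martin's implementation and not dlt's API: `FAKE_TABLES`, `extract_table`, `extract_all`, `load`, and `run` are hypothetical names, the "database" is an in-memory dict, and the "destinations" are plain dicts standing in for a warehouse and a data lake.

```python
import asyncio

Row = dict[str, object]

# Hypothetical in-memory "database"; a real source would query Postgres.
FAKE_TABLES: dict[str, list[Row]] = {
    "orders": [{"id": 1}, {"id": 2}],
    "customers": [{"id": 10}],
}


async def extract_table(name: str) -> tuple[str, list[Row]]:
    """Read one table; the sleep stands in for network and database latency."""
    await asyncio.sleep(0.01)
    return name, list(FAKE_TABLES[name])


async def extract_all(names: list[str]) -> dict[str, list[Row]]:
    """Extract all tables concurrently instead of one after another."""
    results = await asyncio.gather(*(extract_table(n) for n in names))
    return dict(results)


def load(extracted: dict[str, list[Row]], destinations: list[dict[str, list[Row]]]) -> None:
    """Write the same extracted rows to every destination, with no second read."""
    for destination in destinations:
        for table, rows in extracted.items():
            destination.setdefault(table, []).extend(rows)


def run(names: list[str]) -> tuple[dict[str, list[Row]], dict[str, list[Row]]]:
    warehouse: dict[str, list[Row]] = {}
    lake: dict[str, list[Row]] = {}
    extracted = asyncio.run(extract_all(names))  # the source is read exactly once
    load(extracted, [warehouse, lake])
    return warehouse, lake
```

Because the waiting happens inside `asyncio.gather`, the total extract time is roughly that of the slowest table rather than the sum of all of them, and the source database is hit once no matter how many destinations are added.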

For a peek into how Martin implemented this solution, please see Martin's async Postgres source on GitHub.

Taking back control with open source has never been easier

Taking control of your data stack is more accessible than ever with the broad array of open-source tools available. SQL copy pipelines, often seen as a basic utility in data management, do not generally differ significantly between platforms. They perform similar transformations and schema management, making them a commodity available at minimal cost.

SQL-to-SQL copy pipelines are widespread, yet many service providers charge exorbitant fees for these simple tasks. In contrast, these pipelines can often be set up and run at a fraction of the cost, sometimes for just the price of a few coffees.

At dltHub, we advocate for leveraging straightforward, freely available resources to regain control of your data processes and your budget.

Setting up a SQL pipeline can take just a few minutes with the right tools. Explore these resources to enhance your data operations:

30+ SQL database sources
Martin’s async PostgreSQL source
Arrow + connectorx for up to 30x faster data transfers
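As a rough idea of what such a pipeline looks like, here is a minimal sketch using dlt's SQL database source with the connectorx backend. It assumes a recent dlt release in which `sql_database` can be imported from `dlt.sources.sql_database`, and that connectorx, pyarrow, and the DuckDB destination dependencies are installed. The function name `copy_sql_tables`, the pipeline and dataset names, and the choice of DuckDB as destination are illustrative assumptions, not Yummy's setup.

```python
def copy_sql_tables(connection_url: str, table_names: list[str]):
    """Copy an explicit list of tables from a SQL database into DuckDB."""
    # Imported lazily so this module still loads where dlt is not installed.
    import dlt
    from dlt.sources.sql_database import sql_database

    # Naming the tables explicitly means a newly created table (such as a
    # log table) is not replicated until someone adds it to the list.
    source = sql_database(
        credentials=connection_url,
        table_names=table_names,
        backend="connectorx",  # Arrow-based extraction instead of row-by-row
    )

    pipeline = dlt.pipeline(
        pipeline_name="sql_copy_sketch",  # hypothetical name
        destination="duckdb",             # assumption: any dlt destination works here
        dataset_name="replica",           # hypothetical name
    )
    return pipeline.run(source, write_disposition="replace")


# Example call with placeholder credentials and hypothetical table names:
# copy_sql_tables("postgresql://user:password@localhost:5432/shop", ["orders", "customers"])
```

The explicit `table_names` list is the design choice worth noting: it trades a little convenience for the kind of control over what gets replicated that the SaaS defaults took away.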

Need help optimising your pipelines?

Reach out to our solutions engineers to get your team trained on dlt best practices and its ecosystem, completely customized to your data stack.
