Remerge's journey from manual processes to dlt pipelines

Highlights

Significantly reduced complexity
Replaced cumbersome manual data processes with streamlined dlt pipelines, while achieving a 10x reduction in code size on a key pipeline (from roughly 2,000 lines of Go to 200 lines of Python).
Centralised data access
Successfully centralised previously scattered company data from sources like Postgres, Jira, and Druid into BigQuery, enabling easier reporting and analysis.
Improved operational efficiency
Empowered teams to easily generate necessary reports directly from BigQuery, eliminating reliance on manual API calls, spreadsheet exports, and data manipulation.

Data stack

Data sources: Postgres, Jira, Druid, and various third-party APIs
Destinations: BigQuery, Postgres, cost reporting and monitoring tools
Orchestration: Prefect
Transformation: dbt

Challenge: Overcoming manual data processes and siloed information

Remerge, an ad-tech company in Berlin, struggled with its internal data management: data silos prevented a single source of truth. To overcome this, the company set out to establish a unified view in BigQuery that would unlock better reporting, operational insights, and future machine learning and model training.

Various teams relied on disparate systems, including production databases (Postgres), project management tools (Jira), and analytical platforms (Druid). Data was scattered across different APIs and systems, often managed by different teams.

Accessing and combining data for reporting and operational insights was a chore.

Without a standardized tool, our teams resorted to downloading data into spreadsheets and attempting manual consolidation from CSV files with macros. We recognized the need to centralize this process and provide a reliable single source of truth.

- Eugene Bikkinin, Senior Data Platform Engineer, Remerge

Early attempts at building data pipelines were often inefficient or overly complex. One critical process relied on a custom-built Go service that was complex, difficult to modify, and unreliable. It suffered from data duplication and stability issues despite months of development effort.

Having recognized how hard it was to centralize data assets in BigQuery, the team explored both UI-based data integration tools and code-based solutions. They found UI tools less aligned with their team's strengths and goals.

While evaluating tools, we found UI-based systems often felt like 'black boxes'. We preferred dlt's transparent, code-based approach which aligned better with our team's Python skills and desire for control.

- Nas Denkov, Senior Software Engineer, Remerge

Solution: Embracing dlt for agile and transparent data pipelines

Remerge discovered dlt, an open-source Python library, which appealed immediately because it matched the team's existing technical skills and was transparent by design.

A successful initial experiment quickly demonstrated the library's potential. Porting the problematic 2000-line Go project to dlt resulted in a dramatic improvement:

The "eureka" moment came when we ported that complex 2000-line Go project to just 200 lines of Python using dlt. Achieving a 10x code reduction for a pipeline that has run reliably ever since proved the value immediately.

- Nas Denkov, Senior Software Engineer, Remerge

This initial success solidified the decision to fully adopt dlt. The team gradually added dlt pipelines for their other key data sources, including Jira and Druid. They also used dlt’s reverse ETL capabilities to export their dbt data marts to Postgres.

Remerge leveraged several key dlt features:

Flexibility and customisation: The ability to create custom sources easily, such as integrating a Druid instance using the pydruid library, was a highly valued feature.
Destination integration: dlt's seamless integration with BigQuery simplified the loading process into their chosen data warehouse.
Incremental loading and merge write disposition: Incremental loading and the merge write disposition were crucial. dlt automatically handles cursor tracking and the complex merge statements, which removed that complexity from the team's own code.
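
To make the last point concrete, here is a minimal plain-Python sketch of the two ideas that dlt automates: tracking a cursor so that only new or changed rows are extracted, and merging those rows into the destination by primary key. It uses only the standard library and is not dlt's implementation or Remerge's code. The field names (`id`, `updated_at`, `status`), the helper functions, and the sample records are all hypothetical, and an in-memory dictionary stands in for a warehouse table.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def extract_incremental(
    source_rows: Iterable[Row], cursor_field: str, last_value: Optional[str]
) -> List[Row]:
    """Return only the rows whose cursor value is newer than the stored one."""
    return [
        row
        for row in source_rows
        if last_value is None or row[cursor_field] > last_value
    ]


def merge_rows(
    destination: Dict[Any, Row], new_rows: Iterable[Row], primary_key: str
) -> None:
    """Upsert: overwrite rows with a matching key, insert the rest."""
    for row in new_rows:
        destination[row[primary_key]] = row


def run_load(
    source_rows: Iterable[Row],
    destination: Dict[Any, Row],
    state: Dict[str, str],
    cursor_field: str = "updated_at",
    primary_key: str = "id",
) -> int:
    """Run one incremental load and return the number of rows merged."""
    new_rows = extract_incremental(source_rows, cursor_field, state.get("last_value"))
    merge_rows(destination, new_rows, primary_key)
    if new_rows:
        # Persist the highest cursor value seen so the next run can skip old rows.
        state["last_value"] = max(row[cursor_field] for row in new_rows)
    return len(new_rows)


# Hypothetical issue-tracker records; ISO dates compare correctly as strings.
source = [
    {"id": 1, "status": "open", "updated_at": "2024-01-01"},
    {"id": 2, "status": "open", "updated_at": "2024-01-02"},
]
destination: Dict[Any, Row] = {}
state: Dict[str, str] = {}

first = run_load(source, destination, state)   # initial load picks up both rows

# Record 1 changes at the source; record 2 is untouched.
source[0] = {"id": 1, "status": "closed", "updated_at": "2024-01-03"}
second = run_load(source, destination, state)  # only the changed row is merged
```

The second run touches a single row and leaves no duplicate behind, which is the behaviour the hand-written Go service reportedly struggled with. In dlt, this bookkeeping is declared on the resource (a cursor field, a primary key, and a merge write disposition) rather than written by hand.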

Results: Enhanced efficiency and data-driven insights

Implementing dlt yielded significant positive outcomes, directly addressing the initial challenges while building a foundation for future growth.

Reduced development costs: Faster, simpler development significantly reduced the engineering costs and effort associated with data integration.
Improved operational efficiency: The most significant impact was felt by the end-users of the data. Citing a “night and day difference,” teams at Remerge no longer need manual data work. They run queries in BigQuery instead and can focus on analysis rather than data wrangling.
Increased data pipeline velocity: Development speed improved dramatically compared to previous custom builds. With resources freed up, the team now anticipates adding one to two new pipelines per month, including pipelines that gather technical or process metrics (e.g., from GitHub Actions) for enhanced monitoring.

Previously, building a single data pipeline was a major undertaking, often taking several weeks or months. With dlt's efficiency, we deployed five pipelines in our first three months. Development is significantly faster and cheaper, allowing us to increase our output.

- Eugene Bikkinin, Senior Data Platform Engineer, Remerge

Future: More savings, deeper insights

Thanks to dlt, Remerge now has a clear, adaptable data system. Internal teams get reliable information much faster, which helps improve how they operate now and prepares them for what's next.

Looking ahead, Remerge has a number of plans to keep expanding their use of dlt, including:

Adding Elasticsearch as a source to export query results into BigQuery. This would allow the team to analyse data from their search infrastructure alongside other data.
Building additional reverse ETL pipelines that feed data from BigQuery back into their real-time systems to support real-time decisions.
Remerge is also considering more third-party API integrations with partners, which would make it easier to share internal data with other systems, such as finance and transactional systems.

The real power of dlt for us comes from its core features like automatic incremental loading and merge capabilities. By handling these complex, error-prone tasks reliably, dlt significantly simplified development and reduced our engineering burden, letting us scale our pipelines effectively.

- Nas Denkov, Senior Software Engineer, Remerge

About the customer

Remerge

Remerge is an independent, top-tier mobile DSP that helps the world’s largest apps drive revenue and growth through programmatic advertising. Remerge specializes in app-retargeting and has expanded its platform to offer privacy-centric user acquisition. Complemented by a fully managed service, Remerge is the trusted partner for leading apps across all major verticals including gaming, on-demand delivery, e-commerce, and finance.
