
3.7x Faster EL Pipelines: Arrow + ADBC vs. SQLAlchemy

  • Aman Gupta,
    Data Engineer

Intro

Moving 5 million rows of partially nested JSON with SQLAlchemy takes long enough to grab a coffee because row-by-row serialization is slow. So what happens if we use Arrow end-to-end?

TL;DR
The test: SQLAlchemy (row-based, no ADBC) vs. Arrow + ADBC (columnar ingestion).
The result: a 3.7x speedup for EL pipelines using the Arrow + ADBC path.

5M Rows from DuckDB to MySQL in Under 100 Seconds

In our benchmarks, switching to the Arrow + ADBC path reduced the 344-second baseline to 92 seconds. Users have documented SQLAlchemy's performance bottlenecks for years. For instance, GitHub #9436 describes hitting this architectural wall when writing to SQL Server, where driver and protocol overhead dominate. The same wall explains why switching from Pandas to Polars doesn't speed up read and write operations (r/dataengineering).

To skip these bottlenecks, we further improved dlt’s Arrow path and added ADBC support for MySQL, SQL Server, and SQLite in our December release.

The comparison: row vs. columnar

To see the difference, we tested two paths:

  • SQLAlchemy: Standard row-based (no ADBC).
  • Arrow + ADBC: Direct columnar ingestion.
Path         | Total Time | Normalize | Load | Speedup
SQLAlchemy   | 344s       | 167s      | 132s | 1x
Arrow + ADBC | 92s        | 0s        | 89s  | 3.7x
Hardware specs: MBP 14” (M4 Pro · 14c CPU · 16GB RAM)

Our experiment - the benchmark

We ran a side-by-side test moving 5M rows with a mix of flat and nested JSON data.

  • Source: DuckDB
  • Destination: MySQL
  • Matchup: SQLAlchemy (row-by-row) vs. Arrow + ADBC (columnar)

The traditional path: SQLAlchemy

Here’s the overhead: the moment we call to_pylist(), the Arrow data is inflated into millions of Python dicts, and that is where the row-by-row serialization cost comes from.

import dlt
import duckdb

@dlt.resource(name="large_table")
def row_by_row_source():
    with duckdb.connect("source_data_5m.duckdb", read_only=True) as conn:
        query_result = conn.execute("SELECT * FROM large_table")
        # Data is inflated into Python objects for the row-by-row path
        yield from (batch.to_pylist() for batch in query_result.fetch_record_batch(50_000))

pipeline = dlt.pipeline(
    pipeline_name="columnar_shift_benchmark_mysql",
    destination="mysql",
    dataset_name="benchmark_db"
)

# We enabled 8 parallel workers to give the row-based path every advantage.
pipeline.run(row_by_row_source(), write_disposition="replace")
Results: SQLAlchemy (average over 5 runs)

Stage     | Time
Extract   | 45.05s
Normalize | 166.75s
Load      | 132.39s
Total     | 344.21s (~5.7 min)

The fast lane: Arrow + ADBC

By streaming Arrow batches directly, we eliminate the Python object handling cost. The batches are passed through “as-is”, preserving the columnar format and letting ADBC bulk load Parquet directly into the destination.

import dlt
import duckdb

@dlt.resource(name="large_table")
def fast_lane_source():
    with duckdb.connect("source_data_5m.duckdb", read_only=True) as conn:
        result = conn.execute("SELECT * FROM large_table")
        yield from result.fetch_record_batch(50_000)

pipeline = dlt.pipeline(
    pipeline_name="benchmark_arrow_mysql",
    destination="mysql",
    dataset_name="benchmark_db_arrow",
)

pipeline.run(
    fast_lane_source(),
    loader_file_format="parquet",
    write_disposition="replace",
)
Results: Arrow + ADBC (average over 5 runs)

Stage     | Time
Extract   | 2.91s
Normalize | 0.01s
Load      | 89.32s
Total     | 92.36s (~1.54 min)

What the runs tell us

Extraction: 45s -> ~3s

By keeping the data in Arrow buffers, we bypass Python object instantiation and serialization overhead. We’ve shifted the bottleneck away from the CPU, allowing the data to flow at the native speed of the source engine.
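
To see this gap for yourself, here is a rough micro-benchmark (illustrative only, not part of our benchmark setup) that compares materializing an Arrow RecordBatch into Python dicts against handing the batch along untouched:

# Illustrative micro-benchmark: Arrow pass-through vs. Python dict materialization
import time

import pyarrow as pa

batch = pa.RecordBatch.from_pydict(
    {"id": list(range(500_000)), "value": ["x"] * 500_000}
)

start = time.perf_counter()
rows = batch.to_pylist()  # materializes 500k Python dicts, one per row
print(f"to_pylist: {time.perf_counter() - start:.3f}s for {len(rows)} dicts")

start = time.perf_counter()
passthrough = batch  # the Arrow buffers are handed along as-is, no per-row work
print(f"pass-through: {time.perf_counter() - start:.6f}s")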

Normalization: 167s -> ~10ms

Normalization is almost instantaneous because it is now a metadata-only operation. dlt aligns the schema using Arrow metadata instead of iterating through objects to infer types.
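
As a small illustration (not dlt internals), the type information already travels with the batch, so there is nothing left to infer row by row:

# Illustrative only: the schema is carried as Arrow metadata
import pyarrow as pa

batch = pa.RecordBatch.from_pydict(
    {"id": [1, 2], "payload": ['{"a": 1}', '{"b": 2}']}
)
print(batch.schema)
# id: int64
# payload: string
# Column names and types are known up front, so schema alignment is a
# metadata lookup rather than a pass over millions of rows.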

Load: 132s -> 89s

We replace row-by-row inserts with ADBC bulk loading. This moves memory buffers directly into the database engine, bypassing the overhead of traditional communication protocols.
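
dlt drives this path internally, but the underlying mechanism is easy to see with a standalone ADBC driver. Here is a minimal sketch using adbc_driver_sqlite (one of the destinations mentioned above); the table name and data are made up for illustration:

# Minimal ADBC bulk-ingest sketch (illustrative, not dlt's internal code)
import pyarrow as pa
import adbc_driver_sqlite.dbapi as adbc_sqlite

arrow_table = pa.table({"id": [1, 2, 3], "value": ["a", "b", "c"]})

with adbc_sqlite.connect() as conn:  # in-memory SQLite database
    with conn.cursor() as cur:
        # One bulk call hands the Arrow buffers to the database,
        # instead of millions of parameterized INSERT statements.
        cur.adbc_ingest("demo_table", arrow_table, mode="create")
        cur.execute("SELECT COUNT(*) FROM demo_table")
        print(cur.fetchone())  # (3,)
    conn.commit()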

Net result: ~344s down to ~92s, which is a 3.7x speedup end-to-end.

The big win

The real win is architectural simplicity. By shortening the path from source to destination, we’ve reduced the number of moving parts, which means a smaller failure surface and simpler state management. And by doing away with the overhead of transforming columnar data into dict-shaped structures, you not only improve performance but also save on compute.

Why Arrow

Arrow allows us to move structured data without the memory bloat and CPU tax of serialization.

By using its in-memory columnar format, we stop wasting cycles deserializing, re-typing, and re-serializing millions of discrete objects. The bottleneck shifts from the CPU to the underlying transport. Removing the overhead of row-based structures means lower compute costs and higher throughput.
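
The memory side of that bloat is easy to approximate (rough illustration with made-up data; sys.getsizeof counts only the dict shells, not the values inside them):

# Rough illustration of the memory overhead of dict-shaped rows
import sys

import pyarrow as pa

batch = pa.RecordBatch.from_pydict(
    {"id": list(range(100_000)), "value": ["x" * 8] * 100_000}
)
print(f"Arrow buffers: {batch.nbytes / 1e6:.1f} MB")

rows = batch.to_pylist()
dict_shells = sum(sys.getsizeof(r) for r in rows)  # container overhead only
print(f"Python dict shells alone: {dict_shells / 1e6:.1f} MB")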

Why dlt

With dlt, Arrow stops being a special case and becomes the default path. You get the performance of a columnar architecture along with the full toolkit for building loading pipelines: a cleaner design with fewer moving parts and simpler pipeline maintenance.

Try it yourself

Arrow pipelines aren’t experimental; they are dlt’s fast lane. Try it yourself by setting loader_file_format="parquet" on a supported destination.

Resources: