3.7x Faster EL Pipelines: Arrow + ADBC vs. SQLAlchemy
Aman Gupta,
Data Engineer
Intro
Moving 5 million rows of partially nested JSON with SQLAlchemy takes long enough to grab a coffee because row-by-row serialization is slow. So what happens if we use Arrow end-to-end?
The Test: SQLAlchemy row-based ingestion (no ADBC) vs. Arrow + ADBC columnar ingestion
The Result: 3.7x speedup for EL pipelines using the Arrow + ADBC path.
5M Rows from DuckDB to MySQL in Under 100 Seconds
In our benchmarks, we reduced the 344-second baseline to 92 seconds by switching to the Arrow + ADBC path. Users have documented SQLAlchemy's performance bottlenecks for years: one report (GitHub #9436) hits this architectural wall when writing to SQL Server due to driver and protocol overhead, and the same wall explains why switching from Pandas to Polars doesn't speed up read and write operations (r/dataengineering).
To skip these bottlenecks, we further improved dlt’s Arrow path and added ADBC support for MySQL, SQL Server, and SQLite in our December release.
The comparison: row vs. columnar
To see the difference, we tested two paths:
- SQLAlchemy: Standard row-based (no ADBC).
- Arrow + ADBC: Direct columnar ingestion.
| Path | Total Time | Normalize | Load | Speedup |
|---|---|---|---|---|
| SQLAlchemy | 344s | 167s | 132s | 1x |
| Arrow+ADBC | 92s | 0s | 89s | 3.7x |

Our experiment - the benchmark
We ran a side-by-side test moving 5M rows with a mix of flat and nested JSON data.
- Source: DuckDB
- Destination: MySQL
- Matchup: SQLAlchemy (row-by-row) vs. Arrow + ADBC (columnar)
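For context, here is a hypothetical way to seed a source file like source_data_5m.duckdb with 5M rows of mixed flat and nested data. The column layout is an assumption for illustration, not the exact benchmark schema.

```python
import duckdb

# Hypothetical seeding script: the real benchmark table may differ, but this
# produces 5M rows with both flat columns and a nested struct/list column.
with duckdb.connect("source_data_5m.duckdb") as con:
    con.execute("""
        CREATE OR REPLACE TABLE large_table AS
        SELECT
            i AS id,
            'user_' || i::VARCHAR AS name,
            random() AS score,
            {'city': 'Berlin', 'tags': [i % 5, i % 7]} AS attributes  -- nested part
        FROM range(5000000) t(i)
    """)
```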
The traditional path: SQLAlchemy
Here’s where the overhead comes from: the moment we call to_pylist(), the Arrow data is inflated into millions of Python dicts, and we pay the row-by-row serialization cost.
```python
import dlt
import duckdb

@dlt.resource(name="large_table")
def row_by_row_source():
    with duckdb.connect("source_data_5m.duckdb", read_only=True) as conn:
        query_result = conn.execute("SELECT * FROM large_table")
        # Data is inflated into Python objects for the row-by-row path
        yield from (batch.to_pylist() for batch in query_result.fetch_record_batch(50_000))

pipeline = dlt.pipeline(
    pipeline_name="columnar_shift_benchmark_mysql",
    destination="mysql",
    dataset_name="benchmark_db",
)

# We enabled 8 parallel workers to give the row-based path every advantage.
pipeline.run(row_by_row_source(), write_disposition="replace")
```
| Stage | Time |
|---|---|
| Extract | 45.05s |
| Normalize | 166.75s |
| Load | 132.39s |
| Total | 344.21s (~5.67 min) |
The fast lane: Arrow + ADBC
By streaming Arrow batches “as-is”, we eliminate Python object handling costs, preserve the columnar format, and let ADBC bulk load the resulting Parquet directly into the destination.
```python
import dlt
import duckdb

@dlt.resource(name="large_table")
def fast_lane_source():
    with duckdb.connect("source_data_5m.duckdb", read_only=True) as conn:
        result = conn.execute("SELECT * FROM large_table")
        yield from result.fetch_record_batch(50_000)

pipeline = dlt.pipeline(
    pipeline_name="benchmark_arrow_mysql",
    destination="mysql",
    dataset_name="benchmark_db_arrow",
)

pipeline.run(
    fast_lane_source(),
    loader_file_format="parquet",
    write_disposition="replace",
)
```
| Stage | Time |
|---|---|
| Extract | 2.91s |
| Normalize | 0.01s |
| Load | 89.32s |
| Total | 92.36s (~1.54 min) |
What the runs tell us
Extraction: 45s -> ~3s
By keeping the data in Arrow buffers, we bypass Python object instantiation and serialization overhead. We’ve shifted the bottleneck away from the CPU, allowing the data to flow at the native speed of the source engine.
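To make that concrete, here is a tiny standalone sketch (not the benchmark code) comparing the cost of inflating an Arrow table into Python dicts with slicing it into columnar record batches; the table and sizes are made up for illustration.

```python
import time
import pyarrow as pa

# Stand-in for the DuckDB source: a 1M-row Arrow table built in memory.
n = 1_000_000
table = pa.table({"id": list(range(n)), "value": [float(i) for i in range(n)]})

start = time.perf_counter()
rows = table.to_pylist()             # every row becomes a Python dict
print(f"to_pylist:  {time.perf_counter() - start:.2f}s for {len(rows):,} dicts")

start = time.perf_counter()
batches = table.to_batches(50_000)   # zero-copy slicing, data stays columnar
print(f"to_batches: {time.perf_counter() - start:.4f}s for {len(batches)} batches")
```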
Normalization: 167s -> ~10ms
Normalization is almost instantaneous because it is now a metadata-only operation. dlt aligns the schema using Arrow metadata instead of iterating through objects to infer types.
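A rough illustration of why this is metadata-only (this is not dlt's actual normalizer, just the idea): the Arrow schema already carries column names and types, while the row path has to touch every value to infer them.

```python
import pyarrow as pa

batch = pa.table(
    {"id": [1, 2, 3], "payload": ['{"a": 1}', '{"b": 2}', '{"c": 3}']}
).to_batches()[0]

# Columnar path: one schema lookup, independent of row count.
print(batch.schema)  # id: int64, payload: string

# Row path: types are inferred by inspecting every value of every dict.
for row in batch.to_pylist():
    print({name: type(value).__name__ for name, value in row.items()})
```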
Load: 132s -> 89s
We replace row-by-row inserts with ADBC bulk loading. This moves Arrow memory buffers directly into the database engine, bypassing the overhead of traditional row-based communication protocols.
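For a feel of what bulk ingestion looks like at the driver level, here is a hedged sketch using the SQLite ADBC driver (one of the destinations mentioned above). dlt wires this up internally, so this is only to show the mechanism; the file and table names are made up.

```python
import pyarrow as pa
import adbc_driver_sqlite.dbapi

# A small Arrow table standing in for a load package.
table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

with adbc_driver_sqlite.dbapi.connect("demo.db") as conn:
    cur = conn.cursor()
    # adbc_ingest streams the Arrow data into the database in bulk,
    # without generating per-row INSERT statements.
    cur.adbc_ingest("demo_table", table, mode="create")
    conn.commit()
    cur.close()
```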
Net result: ~344s down to ~92s, which is a 3.7x speedup end-to-end.
The big win
The real win is architectural simplicity. By keeping data columnar along the whole path from source to destination, we’ve reduced the moving parts, which means a smaller failure surface and simpler state management. Doing away with the overhead of transforming columnar data into dict-shaped structures doesn’t just improve performance; it also saves on compute.
Why Arrow
Arrow allows us to move structured data without the memory bloat and CPU tax of serialization.
By using its in-memory columnar format, we stop wasting cycles deserializing, re-typing, and re-serializing millions of discrete objects. The bottleneck shifts from the CPU to the underlying transport. Switching to Arrow removes the overhead of row-based structures, which means lower compute costs and higher throughput.
Why dlt
With dlt, Arrow stops being a special case and becomes the default path. You get the performance of a columnar architecture within a full toolkit for building loading pipelines. It’s a cleaner design with fewer moving parts and simpler pipeline maintenance.
Try it yourself
Arrow pipelines aren't experimental; they are dlt’s fast lane. You can try it yourself by setting loader_file_format="parquet" on supported destinations.
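As a starting point, here is a minimal sketch you can adapt: it yields an Arrow table from a resource and requests Parquet load files. The duckdb destination and the names are placeholders so the snippet runs locally without credentials; swap in your own destination.

```python
import dlt
import pyarrow as pa

@dlt.resource(name="events")
def events():
    # Yield Arrow data directly so dlt keeps the columnar fast path.
    yield pa.table({"id": [1, 2, 3], "payload": ['{"a": 1}', '{"b": 2}', '{"c": 3}']})

pipeline = dlt.pipeline(
    pipeline_name="arrow_quickstart",
    destination="duckdb",          # placeholder: use any supported destination
    dataset_name="demo",
)
info = pipeline.run(events(), loader_file_format="parquet")
print(info)
```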
Resources:
- Documentation
- Slack community