# Export Arize Phoenix Telemetry
The source code for this example can be found in our repository at: https://github.com/dlt-hub/dlt/tree/devel/docs/examples/arize_phoenix_export
## About this Example
Arize Phoenix is an open-source observability platform for AI and LLM applications. It captures traces, spans, and evaluations (e.g. relevance scores, hallucination checks) that are sent via the OpenTelemetry (OTel) protocol.
To enable analytics, reporting, and offline evaluation, this telemetry needs to be exported to a data lakehouse or warehouse. Arize Phoenix persists data in a relational database — either PostgreSQL (recommended for production) or SQLite (the default for single-node deployments).
The dlt library's built-in `sql_database()` source makes extraction straightforward:
- Connect directly to Phoenix's Postgres or SQLite backend with a connection string.
- Use `resolve_foreign_keys=True` so dlt automatically links child tables (e.g. spans → traces).
- Load to DuckDB for zero-config local analytics, or swap in any other dlt destination.
## Credentials

### PostgreSQL
Add the following block to `.dlt/secrets.toml`, filling in the values from your deployment:

```toml
# .dlt/secrets.toml
[sources.sql_database.credentials]
drivername = "postgresql"
host = "localhost"
port = 5432
database = "phoenix"
username = "phoenix"
password = "phoenix"
```
You also need the `psycopg2` driver installed:

```sh
pip install psycopg2-binary
```
Where to find these values:
- Docker Compose: look for the `db` service in your `docker-compose.yml`. The credentials are set via the `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` environment variables. The host is the service name (or `localhost` if the port is published to the host).
- Helm chart: check `values.yaml` under `postgresql.auth`, or the `PHOENIX_SQL_DATABASE_URL` environment variable on the Phoenix deployment.
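If your deployment exposes only a full connection URL (as `PHOENIX_SQL_DATABASE_URL` does), you can split it into the individual `secrets.toml` fields with the standard library. This is a minimal sketch; the URL below is a placeholder, not a real deployment:

```python
from urllib.parse import urlsplit

# Placeholder URL in the shape PHOENIX_SQL_DATABASE_URL typically takes.
url = "postgresql://phoenix:phoenix@localhost:5432/phoenix"

parts = urlsplit(url)
creds = {
    "drivername": parts.scheme,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),  # path is "/phoenix"
    "username": parts.username,
    "password": parts.password,
}
print(creds)
```

Each key in `creds` maps one-to-one onto the `[sources.sql_database.credentials]` fields shown above.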
### SQLite
Add the following block to `.dlt/secrets.toml`, pointing `database` at the `.db` file on disk:

```toml
# .dlt/secrets.toml
[sources.sql_database.credentials]
drivername = "sqlite"
database = "/path/to/phoenix.db"
```
Where to find the file:
- Docker Compose: the SQLite file lives at `/phoenix/storage/phoenix.db` inside the container. Volume-mount it to a host path so the pipeline can read it directly:

  ```yaml
  volumes:
    - ./data/phoenix.db:/phoenix/storage/phoenix.db
  ```

  Then set `database` to the host-side path (e.g. `./data/phoenix.db`).
- No volume mount: copy the file out first with `docker cp <container>:/phoenix/storage/phoenix.db ./phoenix.db`, then point `database` at the copied file.
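Before wiring up the pipeline, it can be worth sanity-checking that the mounted or copied file is a readable SQLite database and seeing which Phoenix tables it contains. This sketch uses only the standard library; the throwaway demo file it creates stands in for your copied `phoenix.db`:

```python
import os
import sqlite3
import tempfile


def list_tables(db_path: str) -> list:
    """Return the names of all user tables in a SQLite file."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Demo against a throwaway file; point db_path at your copied phoenix.db instead.
db_path = os.path.join(tempfile.mkdtemp(), "phoenix.db")
with sqlite3.connect(db_path) as conn:
    conn.execute("CREATE TABLE spans (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE traces (id INTEGER PRIMARY KEY)")

print(list_tables(db_path))  # expect ['spans', 'traces']
```

If the call raises `sqlite3.DatabaseError`, the file is not a valid SQLite database (e.g. the copy was truncated or the path is wrong).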
NOTE: You need the `dlt[sql-database]` extra installed:

```sh
pip install "dlt[sql-database]"
```
## Data model
dlt automatically discovers every table in the Phoenix database and infers their relationships from
foreign keys — no schema configuration required. The exact set of tables you get depends on which
Phoenix features you have used: tables are only created in the database once the corresponding
feature is exercised. For example, the `datasets` and `dataset_examples` tables do not exist until
you create a dataset through the Phoenix web UI.
The tables most relevant for AI observability are:
- `projects` — top-level containers; each project groups a set of traces.
- `traces` — a trace is a complete request or workflow, identified by a `trace_id`.
- `spans` — the core table. Every OTEL span belongs to a trace. Spans carry timing, status, LLM token counts, input/output values, and arbitrary OpenInference attributes.
- `span_annotations` — evaluations attached to individual spans (e.g. relevance score, hallucination label), produced by human raters or automated LLM judges.
- `trace_annotations` — same as `span_annotations` but scoped to a full trace.
- `datasets` — curated collections of examples, created via the Phoenix web UI or SDK.
- `dataset_examples` — the individual rows inside a dataset, each linked back to a source span.
- `dataset_versions` — snapshot history of a dataset, enabling reproducible evaluations.
- `experiments` — prompt playground runs executed against a dataset version.
- `experiment_runs` — individual outputs produced by an experiment for each dataset example.
Some tables are internal to Phoenix (e.g. migration tracking, internal job queues).
You can explicitly select a subset of tables to load via `sql_database(..., table_names=...)`.
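For example, a run restricted to the core observability tables might look like the sketch below. The table subset is one reasonable choice for illustration, not a requirement; adjust it to the tables that actually exist in your Phoenix database.

```py
import dlt
from dlt.sources.sql_database import sql_database

# A plausible subset for observability analytics, skipping Phoenix internals
# such as migration tracking and job queues.
OBSERVABILITY_TABLES = [
    "projects",
    "traces",
    "spans",
    "span_annotations",
    "trace_annotations",
]

# Credentials are read from .dlt/secrets.toml as configured above.
source = sql_database(
    resolve_foreign_keys=True,
    table_names=OBSERVABILITY_TABLES,
)

pipeline = dlt.pipeline(pipeline_name="arize_phoenix_subset", destination="duckdb")
print(pipeline.run(source))
```

Selecting tables at the source level means dlt never reflects or reads the excluded tables, which also speeds up schema discovery on large databases.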
## Full source code
```py
import dlt
from dlt.sources.sql_database import sql_database


@dlt.source
def arize_phoenix_source(credentials=dlt.secrets.value):
    """Returns all Arize Phoenix tables as a dlt source.

    Reads directly from Phoenix's relational backend (PostgreSQL or SQLite) using
    dlt's built-in `sql_database` source. Foreign key relationships are resolved
    automatically so destination tables reference each other correctly.

    The primary tables of interest are:

    - **spans**: Every OTEL span recorded by Phoenix — the core unit of telemetry.
      Each row represents one span with timing, attributes, status, and token counts.
    - **traces**: Groups of related spans sharing the same `trace_id`.
    - **projects**: Phoenix projects that spans are assigned to.
    - **span_annotations**: Human or model-generated evaluations attached to spans
      (e.g. relevance score, hallucination label).
    - **trace_annotations**: Evaluations at the trace level.
    - **datasets / dataset_examples**: Curated examples used in prompt playgrounds
      or regression test suites.

    Args:
        credentials: SQLAlchemy-compatible connection string or credential dict,
            loaded from `secrets.toml`. Supports both PostgreSQL and SQLite URIs.

    Returns:
        A list of dlt resources — one per Phoenix database table.
    """
    return sql_database(
        credentials=credentials,
        resolve_foreign_keys=True,
    )


if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="arize_phoenix",
        # can be configured to use any dlt destination
        destination="duckdb",
    )
    load_info = pipeline.run(arize_phoenix_source())
    print(load_info)
```