
Export Arize Phoenix Telemetry

info

The source code for this example can be found in our repository at: https://github.com/dlt-hub/dlt/tree/devel/docs/examples/arize_phoenix_export

About this Example

Arize Phoenix is an open-source observability platform for AI and LLM applications. It captures traces, spans, and evaluations (e.g. relevance scores, hallucination checks) that are sent via the OpenTelemetry (OTel) protocol.

To enable analytics, reporting, and offline evaluation, this telemetry needs to be exported to a data lakehouse or warehouse. Arize Phoenix persists data in a relational database — either PostgreSQL (recommended for production) or SQLite (the default for single-node deployments).

The dlt library's built-in sql_database() source makes extraction straightforward:

  • Connect directly to Phoenix's Postgres or SQLite backend with a connection string.
  • Use resolve_foreign_keys=True so dlt automatically links child tables (e.g. spans → traces).
  • Load to DuckDB for zero-config local analytics, or swap to any other dlt destination.

Credentials

PostgreSQL

Add the following block to .dlt/secrets.toml, filling in the values from your deployment:

# .dlt/secrets.toml
[sources.sql_database.credentials]
drivername = "postgresql"
host = "localhost"
port = 5432
database = "phoenix"
username = "phoenix"
password = "phoenix"

You also need the psycopg2 driver installed:

pip install psycopg2-binary

Where to find these values:

  • Docker Compose: look for the db service in your docker-compose.yml. The credentials are set via the POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB environment variables. The host is the service name (or localhost if the port is published to the host); see the sketch after this list.
  • Helm chart: check values.yaml under postgresql.auth or the PHOENIX_SQL_DATABASE_URL environment variable on the Phoenix deployment.
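
For the Docker Compose case, a minimal db service might look like the sketch below. The service name, image tag, and credential values are illustrative placeholders, not the exact contents of your file:

# docker-compose.yml (illustrative sketch; names and values are placeholders)
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: phoenix
      POSTGRES_PASSWORD: phoenix
      POSTGRES_DB: phoenix
    ports:
      - "5432:5432"  # published to the host, so the pipeline can use host=localhost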

SQLite

Add the following block to .dlt/secrets.toml, pointing database at the .db file on disk:

# .dlt/secrets.toml
[sources.sql_database.credentials]
drivername = "sqlite"
database = "/path/to/phoenix.db"

Where to find the file:

  • Docker Compose: the SQLite file lives at /phoenix/storage/phoenix.db inside the container. Volume-mount it to a host path so the pipeline can read it directly:
    volumes:
    - ./data/phoenix.db:/phoenix/storage/phoenix.db
    Then set database to the host-side path (e.g. ./data/phoenix.db).
  • No volume mount: copy the file out first with docker cp <container>:/phoenix/storage/phoenix.db ./phoenix.db, then point database at the copied file.

Note: you need the dlt[sql-database] extra installed:

pip install "dlt[sql-database]"

Data model

dlt automatically discovers every table in the Phoenix database and infers their relationships from foreign keys — no schema configuration required. The exact set of tables you get depends on which Phoenix features you have used: tables are only created in the database once the corresponding feature is exercised. For example, the datasets and dataset_examples tables do not exist until you create a dataset through the Phoenix web UI.

The tables most relevant for AI observability are:

  • projects — top-level containers; each project groups a set of traces.
  • traces — a trace is a complete request or workflow, identified by a trace_id.
  • spans — the core table. Every OTel span belongs to a trace. Spans carry timing, status, LLM token counts, input/output values, and arbitrary OpenInference attributes.
  • span_annotations — evaluations attached to individual spans (e.g. relevance score, hallucination label), produced by human raters or automated LLM judges.
  • trace_annotations — same as span_annotations but scoped to a full trace.
  • datasets — curated collections of examples, created via the Phoenix web UI or SDK.
  • dataset_examples — the individual rows inside a dataset, each linked back to a source span.
  • dataset_versions — snapshot history of a dataset, enabling reproducible evaluations.
  • experiments — prompt playground runs executed against a dataset version.
  • experiment_runs — individual outputs produced by an experiment for each dataset example.

Some tables are internal to Phoenix (e.g. migration tracking, internal job queues). You can explicitly select a subset of tables to load with the table_names argument: sql_database(..., table_names=[...]), as in the sketch below.
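
For example, here is a minimal sketch that restricts the load to the core observability tables. The list is illustrative; include whichever tables exist in your Phoenix database (see the overview above). Credentials are injected from .dlt/secrets.toml:

from dlt.sources.sql_database import sql_database

# Only these tables are reflected and loaded; foreign keys are still resolved.
source = sql_database(
    resolve_foreign_keys=True,
    table_names=["projects", "traces", "spans", "span_annotations"],
)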

Full source code

import dlt
from dlt.sources.sql_database import sql_database


@dlt.source
def arize_phoenix_source(credentials=dlt.secrets.value):
    """Returns all Arize Phoenix tables as a dlt source.

    Reads directly from Phoenix's relational backend (PostgreSQL or SQLite) using
    dlt's built-in `sql_database` source. Foreign key relationships are resolved
    automatically so destination tables reference each other correctly.

    The primary tables of interest are:

    - **spans**: Every OTel span recorded by Phoenix — the core unit of telemetry.
      Each row represents one span with timing, attributes, status, and token counts.
    - **traces**: Groups of related spans sharing the same `trace_id`.
    - **projects**: Phoenix projects that spans are assigned to.
    - **span_annotations**: Human or model-generated evaluations attached to spans
      (e.g. relevance score, hallucination label).
    - **trace_annotations**: Evaluations at the trace level.
    - **datasets / dataset_examples**: Curated examples used in prompt playgrounds
      or regression test suites.

    Args:
        credentials: SQLAlchemy-compatible connection string or credential dict,
            loaded from `secrets.toml`. Supports both PostgreSQL and SQLite URIs.

    Returns:
        A list of dlt resources — one per Phoenix database table.
    """
    return sql_database(
        credentials=credentials,
        resolve_foreign_keys=True,
    )


if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="arize_phoenix",
        # can be configured to use any dlt destination
        destination="duckdb",
    )
    load_info = pipeline.run(arize_phoenix_source())
    print(load_info)
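
After a run with the defaults above, you can sanity-check the load directly from the DuckDB file. This sketch assumes dlt's default naming, where the duckdb destination writes <pipeline_name>.duckdb to the working directory and puts tables in a <pipeline_name>_dataset schema:

import duckdb

# Default dlt naming: file arize_phoenix.duckdb, schema arize_phoenix_dataset
conn = duckdb.connect("arize_phoenix.duckdb")
print(conn.sql("SELECT count(*) AS span_count FROM arize_phoenix_dataset.spans"))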

This demo works on GitHub Codespaces, a development environment available for free to anyone with a GitHub account. You'll be asked to fork the demo repository, and from there the README guides you through the remaining steps.
The demo uses the Continue VSCode extension.

Off to Codespaces!
