Export Langfuse Observability Data

The source code for this example can be found in our repository at: https://github.com/dlt-hub/dlt/tree/devel/docs/examples/langfuse_export

About this Example

Langfuse is an open-source LLM observability and evaluation platform. It captures traces and observations from your AI applications, manages evaluation datasets, and tracks LLM costs — all in a self-hostable Postgres-backed store.

Langfuse persists its relational metadata in PostgreSQL. To enable analytics, reporting, and offline evaluation, that data needs to be exported to a data lakehouse or warehouse.

The dlt library's built-in sql_database() source makes extraction straightforward:

  • Connect directly to Langfuse's Postgres backend with a connection string.
  • Use resolve_foreign_keys=True so dlt automatically links child tables (e.g. datasets → projects).
  • Load to DuckDB for zero-config local analytics, or swap to any other dlt destination.

Credentials

Add the following block to .dlt/secrets.toml, filling in the values from your deployment:

# .dlt/secrets.toml
[sources.sql_database.credentials]
drivername = "postgresql"
host = "localhost"
port = 5432
database = "langfuse"
username = "langfuse"
password = "langfuse"

You also need the psycopg2 driver installed:

pip install psycopg2-binary

Where to find these values:

  • Docker Compose: look for the db service in your docker-compose.yml. The credentials are set via POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB environment variables. The host is the service name (or localhost if the port is published to the host).
  • Helm chart: check values.yaml under postgresql.auth or the DATABASE_URL environment variable on the Langfuse deployment.
  • Langfuse Cloud: direct database access is not available on Langfuse Cloud; use the Langfuse API instead.
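If you would rather work with a single connection string than with separate fields, the Docker Compose values can be assembled into a SQLAlchemy-style URL. The following is a minimal sketch, not part of the original example: the helper name `postgres_url_from_env` is hypothetical, it assumes the `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` variables are available in the mapping you pass in, and the host and port defaults simply mirror the secrets.toml block above.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def postgres_url_from_env(
    env: Optional[Mapping[str, str]] = None,
    host: str = "localhost",
    port: int = 5432,
) -> str:
    """Build a postgresql:// URL from Docker Compose style variables.

    Hypothetical helper for illustration; not part of dlt or Langfuse.
    """
    env = os.environ if env is None else env
    # Percent-encode each part so characters such as "@" or "/" in a
    # password cannot be mistaken for URL delimiters.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    database = quote(env["POSTGRES_DB"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    example_env = {
        "POSTGRES_USER": "langfuse",
        "POSTGRES_PASSWORD": "p@ss/word",
        "POSTGRES_DB": "langfuse",
    }
    # prints postgresql://langfuse:p%40ss%2Fword@localhost:5432/langfuse
    print(postgres_url_from_env(example_env))
```

The resulting string can be passed as the credentials argument of sql_database(), which should accept a connection string in place of the individual fields.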

Data model

dlt automatically discovers every table in the Langfuse database and infers their relationships from foreign keys — no schema configuration required. The exact set of tables you get depends on which Langfuse features you have used: tables are only populated once the corresponding feature is exercised.

The tables most relevant for LLM observability and evaluation are:

  • organizations — top-level multi-tenant containers; each org owns projects and members.
  • projects — a project scopes all traces, scores, and datasets for one application.
  • trace_sessions — groups related traces into a session (e.g. a multi-turn conversation), tagged with an environment field for staging vs. production separation.
  • datasets — curated collections of examples used for evaluation runs, created via the Langfuse web UI or SDK.
  • dataset_items — individual rows inside a dataset, each carrying an input, optional expected_output, and a link back to the source trace/observation that produced it.
  • score_configs — named score type definitions (numeric or categorical) with optional min_value/max_value bounds; child table score_configs__categories holds the category labels.
  • eval_templates — versioned LLM-as-judge templates with a prompt, model, provider, and typed output schema; child table eval_templates__vars holds the variable names.
  • models — LLM model definitions with input_price, output_price, and tokenizer config used to compute per-trace cost.
  • prices / pricing_tiers — tiered pricing rules linked to model definitions.
  • annotation_queues — human annotation workflows; items are tracked in annotation_queue_items and assignees in annotation_queue_assignments.
  • comments — user comments attached to any Langfuse object (trace, dataset, etc.).
  • dashboards / dashboard_widgets — saved analytics dashboards with chart configuration.
  • audit_logs — full audit trail of create/update/delete actions across the platform.

Some tables are internal to Langfuse (e.g. _prisma_migrations, background_migrations). To leave them out, explicitly select the subset of tables to load via sql_database(..., table_names=...).
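As a rough illustration of table selection, the sketch below filters out the internal tables named above and passes the remainder to sql_database() through table_names. The helper names `select_tables` and `build_source` are hypothetical, and the list of discovered tables is hard-coded for the example; in practice you would list the tables you care about.

```python
from typing import Iterable, List

# Tables the text above names as internal to Langfuse; extend as needed.
INTERNAL_TABLES = {"_prisma_migrations", "background_migrations"}


def select_tables(all_tables: Iterable[str], wanted: Iterable[str] = ()) -> List[str]:
    """Return the table names to load, sorted alphabetically.

    Internal tables are always dropped. If `wanted` is non-empty,
    only tables listed there are kept.
    """
    wanted_set = set(wanted)
    return sorted(
        name
        for name in all_tables
        if name not in INTERNAL_TABLES and (not wanted_set or name in wanted_set)
    )


def build_source(credentials, table_names: List[str]):
    """Create a sql_database source restricted to `table_names`."""
    # Imported lazily so select_tables() works without dlt installed.
    from dlt.sources.sql_database import sql_database

    return sql_database(
        credentials=credentials,
        table_names=table_names,
        reflection_level="minimal",
        resolve_foreign_keys=True,
    )


if __name__ == "__main__":
    discovered = [
        "projects",
        "_prisma_migrations",
        "datasets",
        "dataset_items",
        "background_migrations",
    ]
    print(select_tables(discovered))  # ['dataset_items', 'datasets', 'projects']
    print(select_tables(discovered, wanted=["datasets", "projects"]))  # ['datasets', 'projects']
```

Restricting the load this way also keeps the destination schema smaller, which can be convenient when only the evaluation tables are of interest.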

Full source code

import dlt
from dlt.sources.sql_database import sql_database


@dlt.source
def langfuse_source(credentials=dlt.secrets.value):
    # reflect the tables of the Langfuse Postgres database
    return sql_database(
        credentials=credentials,
        reflection_level="minimal",
        resolve_foreign_keys=True,
    )


if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="langfuse",
        # can be configured to any dlt destination
        destination="duckdb",
    )
    # call the source function to obtain the source instance to run
    load_info = pipeline.run(langfuse_source())
    print(load_info)
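Once the pipeline has run, the loaded tables can be queried locally. The sketch below is one possible way to do that, under stated assumptions: it counts datasets per project using the datasets → projects relationship mentioned earlier, and it assumes the loaded tables expose `id`, `name`, and `project_id` columns (verify these against your own schema). The function name `datasets_per_project` is hypothetical; it re-attaches to the pipeline by name and uses dlt's SQL client.

```python
# Assumed column names (id, name, project_id); check them in your database.
DATASETS_PER_PROJECT_SQL = """
SELECT p.name AS project, COUNT(d.id) AS dataset_count
FROM projects AS p
LEFT JOIN datasets AS d ON d.project_id = p.id
GROUP BY p.name
ORDER BY dataset_count DESC, project
"""


def datasets_per_project(pipeline_name: str = "langfuse"):
    """Run the query against the DuckDB data written by the pipeline."""
    # Imported lazily so the SQL constant can be reused without dlt installed.
    import dlt

    # Re-create the pipeline object with the same name to attach to its state.
    pipeline = dlt.pipeline(pipeline_name=pipeline_name, destination="duckdb")
    with pipeline.sql_client() as client:
        return client.execute_sql(DATASETS_PER_PROJECT_SQL)
```

Calling `datasets_per_project()` after a successful load should return one row per project with its dataset count; the LEFT JOIN keeps projects that have no datasets yet.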

