Export Langfuse Observability Data

The source code for this example can be found in our repository at: https://github.com/dlt-hub/dlt/tree/devel/docs/examples/langfuse_export

About this Example

Langfuse is an open-source LLM observability and evaluation platform. It captures traces and observations from your AI applications, manages evaluation datasets, and tracks LLM costs — all in a self-hostable Postgres-backed store.

Langfuse persists its relational metadata in PostgreSQL. To use that data for analytics, reporting, and offline evaluation, you need to export it to a data lakehouse or warehouse.

The dlt library's built-in sql_database() source makes extraction straightforward:

Connect directly to Langfuse's Postgres backend with a connection string.
Use resolve_foreign_keys=True so dlt automatically links child tables (e.g. datasets → projects).
Load to DuckDB for zero-config local analytics, or swap to any other dlt destination.

Credentials

Add the following block to .dlt/secrets.toml, filling in the values from your deployment:

# .dlt/secrets.toml
[sources.sql_database.credentials]
drivername = "postgresql"
host = "localhost"
port = 5432
database = "langfuse"
username = "langfuse"
password = "langfuse"

You also need the psycopg2 driver installed:

pip install psycopg2-binary

Where to find these values:

Docker Compose: look for the db service in your docker-compose.yml. The credentials are set via POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB environment variables. The host is the service name (or localhost if the port is published to the host).
Helm chart: check values.yaml under postgresql.auth or the DATABASE_URL environment variable on the Langfuse deployment.
Langfuse Cloud: direct database access is not available on Langfuse Cloud; use the Langfuse API instead.
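
As an alternative to the TOML block, dlt also accepts a single connection string for these credentials. The following is a minimal sketch, assuming the same placeholder values as above. build_connection_url is a hypothetical helper, not part of dlt, and the environment variable name follows dlt's convention of upper-casing the config path and joining sections with double underscores:

```python
import os
from urllib.parse import quote


def build_connection_url(username, password, host, port, database):
    """Hypothetical helper: assemble a SQLAlchemy-style Postgres URL."""
    # Percent-encode user and password so characters such as "@" or "/"
    # do not break URL parsing.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"postgresql://{user}:{pwd}@{host}:{port}/{database}"


url = build_connection_url("langfuse", "langfuse", "localhost", 5432, "langfuse")

# Intended as the equivalent of the [sources.sql_database.credentials] block.
os.environ["SOURCES__SQL_DATABASE__CREDENTIALS"] = url
```

Setting the variable in the shell or in your deployment's secret manager, rather than in Python, keeps the password out of the source tree.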

Data model

dlt automatically discovers every table in the Langfuse database and infers their relationships from foreign keys — no schema configuration required. The exact set of tables you get depends on which Langfuse features you have used: tables are only populated once the corresponding feature is exercised.

The tables most relevant for LLM observability and evaluation are:

organizations — top-level multi-tenant containers; each org owns projects and members.
projects — a project scopes all traces, scores, and datasets for one application.
trace_sessions — groups related traces into a session (e.g. a multi-turn conversation), tagged with an environment field for staging vs. production separation.
datasets — curated collections of examples used for evaluation runs, created via the Langfuse web UI or SDK.
dataset_items — individual rows inside a dataset, each carrying an input, optional expected_output, and a link back to the source trace/observation that produced it.
score_configs — named score type definitions (numeric or categorical) with optional min_value/max_value bounds; child table score_configs__categories holds the category labels.
eval_templates — versioned LLM-as-judge templates with a prompt, model, provider, and typed output schema; child table eval_templates__vars holds the variable names.
models — LLM model definitions with input_price, output_price, and tokenizer config used to compute per-trace cost.
prices / pricing_tiers — tiered pricing rules linked to model definitions.
annotation_queues — human annotation workflows; items are tracked in annotation_queue_items and assignees in annotation_queue_assignments.
comments — user comments attached to any Langfuse object (trace, dataset, etc.).
dashboards / dashboard_widgets — saved analytics dashboards with chart configuration.
audit_logs — full audit trail of create/update/delete actions across the platform.

Some tables are internal to Langfuse (e.g. _prisma_migrations, background_migrations). You can explicitly select a subset of tables to load via sql_database(..., table_names=...).
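
One way to skip the internal tables is to filter a list of table names and pass the result as table_names. This is only a sketch: user_tables, langfuse_subset, and INTERNAL_TABLES are hypothetical names introduced for illustration, and the input list is assumed to come from your own inspection of the database.

```python
from typing import Iterable, List

# Internal bookkeeping tables mentioned above; extend this set as needed.
INTERNAL_TABLES = {"_prisma_migrations", "background_migrations"}


def user_tables(all_tables: Iterable[str]) -> List[str]:
    """Hypothetical helper: drop internal tables and return a sorted list."""
    return sorted(t for t in all_tables if t not in INTERNAL_TABLES)


def langfuse_subset(table_names: List[str]):
    """Build a sql_database source restricted to the given tables."""
    # Imported here so the filtering helper above works without dlt installed.
    from dlt.sources.sql_database import sql_database

    # Credentials are resolved from .dlt/secrets.toml, as in the main example.
    return sql_database(table_names=table_names, resolve_foreign_keys=True)


selected = user_tables(
    ["projects", "_prisma_migrations", "datasets", "dataset_items", "background_migrations"]
)
```

Passing langfuse_subset(selected) to pipeline.run() would then load only those tables. How foreign keys pointing at tables outside the selection are handled is worth checking against your dlt version.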

Full source code

import dlt
from dlt.sources.sql_database import sql_database


@dlt.source
def langfuse_source(credentials=dlt.secrets.value):
    return sql_database(
        credentials=credentials,
        reflection_level="minimal",
        resolve_foreign_keys=True,
    )


if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="langfuse",
        # can be configured to use any dlt destination
        destination="duckdb",
    )
    load_info = pipeline.run(langfuse_source())
    print(load_info)
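
After a run, a quick sanity check is to count the rows that landed in each table. The sketch below is an illustration rather than part of the example: row_count_sql and print_row_counts are hypothetical helpers, the table names are taken from the data model list, and the query goes through the pipeline's sql_client() on the assumption that unqualified table names resolve to the pipeline's dataset.

```python
from typing import Iterable


def row_count_sql(tables: Iterable[str]) -> str:
    """Hypothetical helper: one UNION ALL query yielding (table_name, row_count)."""
    parts = [
        f"SELECT '{name}' AS table_name, COUNT(*) AS row_count FROM {name}"
        for name in tables
    ]
    return "\nUNION ALL\n".join(parts)


def print_row_counts(pipeline, tables: Iterable[str]) -> None:
    """Run the query against the pipeline's destination and print each count."""
    # sql_client() opens a connection to the destination dataset.
    with pipeline.sql_client() as client:
        for table_name, row_count in client.execute_sql(row_count_sql(tables)):
            print(f"{table_name}: {row_count}")


query = row_count_sql(["projects", "datasets"])
```

Calling print_row_counts(pipeline, ["projects", "datasets"]) after pipeline.run() prints one line per table. Tables for features you have never used may be empty or absent, so limit the list to tables you expect to exist.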
