Export Langfuse Observability Data
The source code for this example can be found in our repository at: https://github.com/dlt-hub/dlt/tree/devel/docs/examples/langfuse_export
About this Example
Langfuse is an open-source LLM observability and evaluation platform. It captures traces and observations from your AI applications, manages evaluation datasets, and tracks LLM costs — all in a self-hostable Postgres-backed store.
To enable analytics, reporting, and offline evaluation, this data needs to be exported to a data lakehouse or warehouse. Because Langfuse persists its relational metadata in PostgreSQL, that database can be read directly.
The dlt library's built-in sql_database() source makes extraction straightforward:
- Connect directly to Langfuse's Postgres backend with a connection string.
- Use resolve_foreign_keys=True so dlt automatically links child tables (e.g. datasets → projects).
- Load to DuckDB for zero-config local analytics, or swap to any other dlt destination.
Credentials
Add the following block to .dlt/secrets.toml, filling in the values from your deployment:
# .dlt/secrets.toml
[sources.sql_database.credentials]
drivername = "postgresql"
host = "localhost"
port = 5432
database = "langfuse"
username = "langfuse"
password = "langfuse"
You also need the psycopg2 driver installed:
pip install psycopg2-binary
Where to find these values:
- Docker Compose: look for the db service in your docker-compose.yml. The credentials are set via the POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB environment variables. The host is the service name (or localhost if the port is published to the host).
- Helm chart: check values.yaml under postgresql.auth, or the DATABASE_URL environment variable on the Langfuse deployment.
- Langfuse Cloud: direct database access is not available on Langfuse Cloud; use the Langfuse API instead.
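The same values can also be written as a single connection string, which sql_database() accepts as credentials in place of the field-by-field block. The sketch below shows one way to assemble such a string. build_postgres_url is a hypothetical helper, not part of dlt; it percent-encodes the username and password so that characters such as @ or / do not break the URL.

```python
from urllib.parse import quote


def build_postgres_url(username, password, host, port, database):
    """Assemble a postgresql:// URL from individual connection settings."""
    # Percent-encode the user and password; safe="" also encodes "/" and "@".
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"postgresql://{user}:{pwd}@{host}:{port}/{database}"


# Values taken from the example secrets block above.
url = build_postgres_url("langfuse", "langfuse", "localhost", 5432, "langfuse")
print(url)  # postgresql://langfuse:langfuse@localhost:5432/langfuse
```

The resulting string can be stored as the value of sources.sql_database.credentials in .dlt/secrets.toml, or passed directly as the credentials argument.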
Data model
dlt automatically discovers every table in the Langfuse database and infers their relationships from foreign keys — no schema configuration required. The exact set of tables you get depends on which Langfuse features you have used: tables are only populated once the corresponding feature is exercised.
The tables most relevant for LLM observability and evaluation are:
- organizations: top-level multi-tenant containers; each org owns projects and members.
- projects: a project scopes all traces, scores, and datasets for one application.
- trace_sessions: groups related traces into a session (e.g. a multi-turn conversation), tagged with an environment field for staging vs. production separation.
- datasets: curated collections of examples used for evaluation runs, created via the Langfuse web UI or SDK.
- dataset_items: individual rows inside a dataset, each carrying an input, optional expected_output, and a link back to the source trace/observation that produced it.
- score_configs: named score type definitions (numeric or categorical) with optional min_value/max_value bounds; child table score_configs__categories holds the category labels.
- eval_templates: versioned LLM-as-judge templates with a prompt, model, provider, and typed output schema; child table eval_templates__vars holds the variable names.
- models: LLM model definitions with input_price, output_price, and tokenizer config used to compute per-trace cost.
- prices / pricing_tiers: tiered pricing rules linked to model definitions.
- annotation_queues: human annotation workflows; items are tracked in annotation_queue_items and assignees in annotation_queue_assignments.
- comments: user comments attached to any Langfuse object (trace, dataset, etc.).
- dashboards / dashboard_widgets: saved analytics dashboards with chart configuration.
- audit_logs: full audit trail of create/update/delete actions across the platform.
Some tables are internal to Langfuse (e.g. _prisma_migrations, background_migrations).
You can explicitly select a subset of tables to load via sql_database(..., table_names=...).
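As a rough sketch of table selection, the snippet below filters a list of table names down to the ones worth loading and passes the result as table_names. The helper names select_tables and langfuse_subset, and the set of skipped tables, are illustrative assumptions. Only sql_database() and its table_names, reflection_level, and resolve_foreign_keys arguments come from dlt.

```python
from typing import Iterable, List

# Internal bookkeeping tables named in this guide; extend as needed.
INTERNAL_TABLES = {"_prisma_migrations", "background_migrations"}


def select_tables(discovered: Iterable[str]) -> List[str]:
    """Return the given table names minus Langfuse-internal ones, order preserved."""
    return [name for name in discovered if name not in INTERNAL_TABLES]


def langfuse_subset(credentials, table_names: List[str]):
    """Build a sql_database source restricted to the given tables."""
    # Imported here so select_tables() stays usable without dlt installed.
    from dlt.sources.sql_database import sql_database

    return sql_database(
        credentials=credentials,
        table_names=table_names,
        reflection_level="minimal",
        resolve_foreign_keys=True,
    )


wanted = select_tables(["projects", "datasets", "_prisma_migrations", "dataset_items"])
print(wanted)  # ['projects', 'datasets', 'dataset_items']
```

A source built with langfuse_subset(credentials, wanted) can then be passed to pipeline.run() in place of the full one.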
Full source code
import dlt
from dlt.sources.sql_database import sql_database


@dlt.source
def langfuse_source(credentials=dlt.secrets.value):
    # Reflect every table in the Langfuse database and link child tables
    # to their parents through the declared foreign keys.
    return sql_database(
        credentials=credentials,
        reflection_level="minimal",
        resolve_foreign_keys=True,
    )


if __name__ == "__main__":
    pipeline = dlt.pipeline(
        pipeline_name="langfuse",
        # can be swapped for any other dlt destination
        destination="duckdb",
    )
    # Call the source function so that a source instance is passed to run().
    load_info = pipeline.run(langfuse_source())
    print(load_info)
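After a run it can be useful to check how many rows landed in each table, since tables for unused Langfuse features stay empty. The sketch below is one way to do that. It assumes the pipeline trace exposes per-table row counts as pipeline.last_trace.last_normalize_info.row_counts, as described in the dlt documentation. user_table_counts and print_load_summary are hypothetical helper names.

```python
def user_table_counts(row_counts):
    """Drop dlt's bookkeeping tables (prefixed `_dlt_`) and sort largest first."""
    visible = {
        name: count
        for name, count in row_counts.items()
        if not name.startswith("_dlt_")
    }
    # Sort by descending row count, then by table name for stable output.
    return sorted(visible.items(), key=lambda item: (-item[1], item[0]))


def print_load_summary(pipeline):
    """Print one line per loaded table for a pipeline that has just run."""
    # Assumption: a trace from the most recent run is available on the pipeline.
    row_counts = pipeline.last_trace.last_normalize_info.row_counts
    for table, count in user_table_counts(row_counts):
        print(f"{table}: {count}")
```

Calling print_load_summary(pipeline) right after pipeline.run(...) would list the Langfuse tables by size while leaving out dlt's own metadata tables.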