Build 10x Faster in dltHub’s LLM-Native Workspace (manual coding optional)

  • Adrian Brudaru,
    Co-Founder & CDO

The traditional process of building production-ready data pipelines is slow, tedious, and filled with repetitive boilerplate code for handling API documentation, authentication, pagination, and schema mapping.

Well, we are automating that. Watch the video here or read on.

Meet the dltHub Workspace Workflow: a frictionless, LLM-native approach designed to help data developers build, run, and analyze complete pipelines faster and with far less effort.

In this demonstration, we show you how to execute the first three critical stages of the workflow, moving from an empty directory to actionable insights using the GitHub API, all without writing a single line of ingestion code.

Step 1: Load Data with LLM-Native Scaffolds

The goal of this stage is to generate a fully configured, production-ready data ingestion pipeline from a REST API source in minutes.


The Problem of Boilerplate

LLMs often fail at general code generation because the problem space is too open. dlt solves this by providing a config-driven approach that confines the LLM to a well-defined domain (API parameters).

The Solution: dlt init & LLMs

  1. Select a Scaffold: Browse over 5,900 LLM-native scaffolds in the dltHub workspace (e.g., GitHub, Zendesk, Stripe) to get a source template.
  2. Generate the Project: Use a single command: dlt init dlthub <source> <destination> (e.g., dlt init dlthub github duckdb). This command not only sets up the project files but, critically, adds two things:
    • LLM Rules (.cursor-rules): Supplementary instructions that guide the LLM agent (like Cursor or Continue) to build the dlt pipeline correctly.
    • API Documentation YAML: A structured, LLM-readable file containing all the necessary API documentation (e.g., github_docs.yaml).
  3. Prompt the LLM: Instead of writing Python, you write a natural language prompt (e.g., "Collect commits and contributors for dltHub/dlt repo"). The LLM uses the embedded rules and documentation to populate the placeholder configurations in your script.
  4. Run & Debug with AI: If the initial run fails (due to the non-deterministic nature of LLMs), you simply feed the error log back to the agent. Because the LLM is pre-armed with context on dlt pipeline debugging (especially for issues like pagination), it can debug and fix the error instantly, without you writing or manually inspecting the code.
Key Benefit: Build, configure, and debug complex data ingestion from any REST API without writing a single line of custom Python logic.
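For illustration, here is a minimal sketch of the kind of pipeline the scaffold plus the LLM ends up producing, using dlt's declarative REST API source. The endpoint paths, resource names, and the secrets key are assumptions made for this example; the generated project and the LLM-filled configuration determine the exact shape.

```python
# github_pipeline.py - a minimal sketch of the kind of pipeline the scaffold produces.
# Endpoint paths, resource names, and the secrets key are illustrative assumptions.
import dlt
from dlt.sources.rest_api import rest_api_source

github_source = rest_api_source({
    "client": {
        "base_url": "https://api.github.com/",
        # Token read from .dlt/secrets.toml; the key path here is an assumption.
        "auth": {"token": dlt.secrets["sources.github.access_token"]},
    },
    "resources": [
        {"name": "commits", "endpoint": {"path": "repos/dlt-hub/dlt/commits"}},
        {"name": "contributors", "endpoint": {"path": "repos/dlt-hub/dlt/contributors"}},
    ],
})

pipeline = dlt.pipeline(
    pipeline_name="github_pipeline",
    destination="duckdb",
    dataset_name="github_data",
)
print(pipeline.run(github_source))
```

Because the whole pipeline is a configuration dictionary, the LLM only has to fill in well-defined slots (paths, auth, pagination) rather than generate arbitrary Python.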

Step 2: Validate Data with the dlt Dashboard

Once the data is loaded, ensuring its quality is paramount. The dlt Dashboard is your instant, built-in tool for validating the loaded data and inspecting the pipeline's status.

  • Launch Command: Simply run dlt pipeline <pipeline_name> show in your terminal.
  • Comprehensive Overview: Instantly view pipeline metadata, last execution date, and destination type.
  • Schema Inspection: Easily review the schema of all loaded tables (including child tables created by dlt from nested JSON responses).
  • Data Browser: Use the built-in SQL query interface to run simple queries and preview the raw data inside your destination (DuckDB, Snowflake, etc.).
Key Benefit: Validate data quality, review transformations, and inspect metadata immediately after a load, all within a browser-based app launched from the CLI.
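The same checks can also be scripted if you prefer to stay in code; here is a small sketch, assuming the pipeline name and tables from the GitHub example:

```python
# quick_check.py - scripted counterpart to the dashboard checks (a sketch,
# assuming the pipeline name and tables from the GitHub example).
import dlt

# Attaches to the locally stored pipeline state created by the earlier run.
pipeline = dlt.pipeline(
    pipeline_name="github_pipeline", destination="duckdb", dataset_name="github_data"
)

# Tables dlt created, including child tables derived from nested JSON.
print([t["name"] for t in pipeline.default_schema.data_tables()])

# Run a quick query against the destination, like the dashboard's data browser.
with pipeline.sql_client() as client:
    rows = client.execute_sql("SELECT count(*) FROM commits")
    print("commits loaded:", rows[0][0])
```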

Step 3: Transform & Analyze with Marimo and Ibis

The final local step is to connect to your newly loaded data, perform transformations, and extract insights. This is powered by a reactive Python environment.

  • Reactive Notebooks (marimo): We use marimo notebooks because they are purely Python code, making them easy to version, share, and integrate into a standard engineering workflow (unlike traditional notebooks).
  • Database Abstraction (Ibis): The Ibis library provides a Python abstraction layer over your database.
    • Flexible Querying: Connect to the pipeline using the same configuration used in Step 1.
    • Dual-Language Support: Write your data transformations and analysis using either pure SQL or Python syntax. Ibis translates your commands for the underlying database (DuckDB, BigQuery, Snowflake, etc.), allowing developers to play to their strengths.
    • Visualization: Utilize powerful libraries like Altair for interactive data visualization directly within the reactive marimo environment to create charts (e.g., commits per month, contributions per developer).
Key Benefit: Rapidly connect to your data, leverage flexible SQL/Python querying via Ibis, and create shareable, reactive reports.
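As a rough sketch of this step, here is a marimo cell that connects to the DuckDB file the pipeline produced, aggregates commits per month with Ibis, and charts the result with Altair. The file path and the flattened column name (commit__author__date) are assumptions based on dlt's default naming for nested fields; check the dashboard for the exact names in your schema.

```python
# marimo cell - a sketch, assuming the DuckDB file and column names produced
# by the example pipeline (verify the exact names in the dlt Dashboard).
import altair as alt
import ibis
from ibis import _

# The local DuckDB file is named after the pipeline; the dlt dataset is a DuckDB schema.
con = ibis.duckdb.connect("github_pipeline.duckdb")
commits = con.table("commits", database="github_data")

# dlt flattens nested JSON with double underscores, e.g. commit.author.date -> commit__author__date.
monthly = (
    commits.mutate(month=_.commit__author__date.truncate("M"))
    .group_by("month")
    .aggregate(n_commits=_.count())
    .order_by("month")
)

df = monthly.execute()  # Ibis compiles to SQL and returns a pandas DataFrame
alt.Chart(df).mark_line(point=True).encode(x="month:T", y="n_commits:Q")
```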

A Real Example: GitHub Commits Analysis

To demonstrate, we built a complete data pipeline to answer a simple question: "How many commits are being made in our dlt repository?"

Here's what we accomplished:

Step 1: Generate the Pipeline

  • Created a dlt project with one command: dlt init dlthub github duckdb
  • Wrote a simple prompt describing the data we needed (commits endpoint, contributors endpoint, specific repository)
  • An LLM agent generated the entire working pipeline—complete with proper API configuration, error handling, and pagination

Step 2: Handle Errors Intelligently

  • The pipeline initially failed due to a pagination configuration issue
  • Instead of manually debugging, we provided the error message to our LLM agent
  • It identified the problem, suggested a fix, and we ran the pipeline again—success
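For context, a fix like this is usually a one-line configuration change rather than new code: GitHub paginates via the Link response header, which dlt's REST API source supports as a built-in paginator. A hedged sketch of what the corrected client section might look like:

```python
# Sketch of the kind of paginator fix the agent might apply. GitHub uses
# Link-header pagination, available in dlt's REST API source as "header_link".
client_config = {
    "base_url": "https://api.github.com/",
    "paginator": "header_link",  # follow the rel="next" URL from the Link header
}
```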

Step 3: Validate the Data

  • Opened the dlt Dashboard to inspect schemas
  • Verified that commits and contributors tables loaded correctly
  • Confirmed data quality metrics

Step 4: Build Interactive Reports

  • Connected to the data via Ibis in a marimo notebook
  • Created a line chart showing commits per month
  • Created a bar chart showing contributions by developer
  • All with just a few lines of code—no boilerplate required
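A sketch of the second chart, assuming the contributors table keeps GitHub's login and contributions fields (which the API returns for this endpoint):

```python
# marimo cell - contributions per developer (a sketch; column names assume
# GitHub's contributors payload loaded as-is by dlt).
import altair as alt
import ibis

con = ibis.duckdb.connect("github_pipeline.duckdb")
contributors = con.table("contributors", database="github_data")

top = contributors.order_by(ibis.desc("contributions")).limit(20).execute()
alt.Chart(top).mark_bar().encode(x="contributions:Q", y=alt.Y("login:N", sort="-x"))
```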

The entire workflow, from pipeline creation to interactive analysis, took minutes instead of days.

Ready to Eliminate Boilerplate and Move Faster?

The dltHub Workspace Workflow is the modern way to build robust, maintainable data pipelines using the best of LLMs and battle-tested open-source frameworks.

🎯 Who is this for? This workflow is specifically designed for data developers and engineers who are tired of writing custom ingestion scripts, dealing with messy API documentation, and debugging fragile code.

Watch the full demonstration

LLM-native workflow docs

Explore 8,800+ LLM-native scaffolds on the dltHub workspace

Already a dlt user?

Boost your productivity with our new pip install "dlt[workspace]" and unlock LLM-native dlt pipeline development for 8,800+ REST API data sources.