# dltHub Toolkits Cheatsheet

Synced from [dlt-hub/dlthub-ai-workbench](https://github.com/dlt-hub/dlthub-ai-workbench) on branch `master`.

Source tree: [workbench/](https://github.com/dlt-hub/dlthub-ai-workbench/tree/master/workbench)

## rest-api-pipeline

One-shot REST API data pipelines with dlt (data load tool). Find sources, scaffold pipelines, debug, validate data, add endpoints, configure incremental loading, query data, and create interactive reports.

Source: [`rest-api-pipeline`](https://github.com/dlt-hub/dlthub-ai-workbench/tree/master/workbench/rest-api-pipeline)
Version: `0.1.0`
Category: `data`
Listed: yes
Dependencies: `init`
Workflow entry skill: `/find-source`
Tags: `dlt`, `etl`, `data-pipeline`, `python`

### Skills

- [`/adjust-endpoint`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/rest-api-pipeline/skills/adjust-endpoint/SKILL.md) — Adjust a working dlt pipeline for production — remove dev limits, verify pagination, configure incremental loading, expand date ranges. Use when the user wants to remove .add_limit(), load more data, fix pagination, or set up incremental loading.
- [`/create-rest-api-pipeline`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/rest-api-pipeline/skills/create-rest-api-pipeline/SKILL.md) — Create a dlt REST API pipeline. Use for the rest_api core source, or any generic REST/HTTP API source. Not for sql_database or filesystem sources.
- [`/debug-pipeline`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/rest-api-pipeline/skills/debug-pipeline/SKILL.md) — Debug and inspect a dlt pipeline after running it. Use after a pipeline run (success or failure) to inspect traces, load packages, schema, data, and diagnose errors like missing credentials or failed jobs.
- [`/find-source`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/rest-api-pipeline/skills/find-source/SKILL.md) — Find a dlt source for a given API or data provider. Use when the user asks about a source, wants to find a connector, or asks to implement a pipeline for a specific data source.
- [`/new-endpoint`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/rest-api-pipeline/skills/new-endpoint/SKILL.md) — Add a new REST API endpoint/resource to an existing dlt pipeline. Use when the user wants to pull additional data from an API that already has a working pipeline.
- [`/validate-data`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/rest-api-pipeline/skills/validate-data/SKILL.md) — Validate schema and data after a successful dlt pipeline load. Use when the user wants to check if loaded data looks correct, inspect table schemas, fix data types, flatten nested structures, or refine the data shape.
- [`/view-data`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/rest-api-pipeline/skills/view-data/SKILL.md) — Query, explore, or view data loaded by a dlt pipeline. Use when the user asks to query data, explore loaded tables, check row counts, write Python that reads pipeline data, or asks questions like "show me the data", "what users are there", "how much did we spend". Covers dlt dataset API, ibis expressions, and ReadableRelation.
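
The development limit and incremental loading that `/adjust-endpoint` refers to can be sketched roughly as follows. This is a minimal illustration, not the toolkit's own scaffold: the API URL, the `orders` resource, the `updated_since` parameter, the `updated_at` field, and the pipeline and dataset names are all hypothetical placeholders, and it assumes dlt is installed with DuckDB support.

```python
from typing import Optional

# Hypothetical API: base URL, resource name, and field names are placeholders.
REST_CONFIG = {
    "client": {"base_url": "https://api.example.com/v1/"},
    "resources": [
        {
            "name": "orders",
            "endpoint": {
                "path": "orders",
                "params": {
                    # Incremental cursor: request only records newer than the
                    # highest `updated_at` value seen in previous runs.
                    "updated_since": {
                        "type": "incremental",
                        "cursor_path": "updated_at",
                        "initial_value": "2024-01-01T00:00:00Z",
                    },
                },
            },
        },
    ],
}


def run_pipeline(dev_limit: Optional[int] = 1):
    """Load the API into DuckDB; pass dev_limit=None for a full load."""
    # Imported lazily so the config above can be inspected without dlt installed.
    import dlt
    from dlt.sources.rest_api import rest_api_source

    source = rest_api_source(REST_CONFIG)
    if dev_limit is not None:
        # Development guard: stop each resource after `dev_limit` yielded pages.
        source = source.add_limit(dev_limit)

    pipeline = dlt.pipeline(
        pipeline_name="example_orders",
        destination="duckdb",
        dataset_name="example_data",
    )
    return pipeline.run(source)
```

Calling `run_pipeline()` keeps the development limit in place, while `run_pipeline(dev_limit=None)` corresponds to removing `.add_limit()` for a production load.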

### Rules

- [`New ingestion pipeline`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/rest-api-pipeline/rules/workflow.md)

### MCP Servers

- dlt-workspace-mcp — `uv run dlt ai mcp --stdio`
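
As an illustration of how the command above might be registered with an MCP client, the sketch below builds a stdio server entry and prints it as JSON. The top-level `mcpServers` key and the `command`/`args` layout are assumptions based on a convention many MCP clients use; check your client's documentation for the exact file name and schema.

```python
import json

# Command taken from the MCP Servers entry above, split into argv form.
MCP_COMMAND = ["uv", "run", "dlt", "ai", "mcp", "--stdio"]


def mcp_server_entry(name: str, argv: list[str]) -> dict:
    """Build one stdio server entry in the assumed `mcpServers` layout."""
    return {name: {"command": argv[0], "args": argv[1:]}}


# Assumption: the client reads a top-level "mcpServers" mapping.
config = {"mcpServers": mcp_server_entry("dlt-workspace-mcp", MCP_COMMAND)}

print(json.dumps(config, indent=2))
```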

## bootstrap

Prepare the Python environment for a dltHub workspace.

Source: [`bootstrap`](https://github.com/dlt-hub/dlthub-ai-workbench/tree/master/workbench/bootstrap)
Version: `0.1.0`
Category: `data`
Listed: no
Dependencies: `init`
Tags: `dlt`, `workspace`, `setup`

### Commands

- [`init-workspace`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/bootstrap/commands/init-workspace.md) — Sets up dlthub workspace. Ensures `uv`, Python env and dlt are present. Installs LLM toolkit to kickstart future work.

## data-exploration

Quick insights from dlt pipeline data. Connect to a pipeline, profile tables, plan charts, and assemble marimo dashboards.

Source: [`data-exploration`](https://github.com/dlt-hub/dlthub-ai-workbench/tree/master/workbench/data-exploration)
Version: `0.1.0`
Category: `data`
Listed: yes
Dependencies: `init`
Workflow entry skill: `/explore-data`
Tags: `dlt`, `marimo`, `ibis`, `data-exploration`, `dashboards`, `altair`

### Skills

- [`/build-notebook`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-exploration/skills/build-notebook/SKILL.md) — This skill should be used when the user asks to "build the notebook", "launch the dashboard", "generate the marimo notebook", or when an analysis_plan.md artifact exists and the user wants to assemble or regenerate the dashboard. Reads chart specs with ibis queries and altair code from analysis_plan.md, assembles a marimo Python file, validates, and launches. Do NOT use for exploring data or planning charts (use explore-data), building pipelines (use rest-api-pipeline toolkit), or deploying (use dlthub-runtime toolkit).
- [`/explore-data`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-exploration/skills/explore-data/SKILL.md) — This skill should be used when the user asks to "explore my data", "what can I learn from this pipeline", "what's the revenue trend", "show me charts", "visualize my pipeline", "analyze my data", "profile data quality", "what questions can I ask about my data", "map my data to business concepts", or wants to explore, profile, analyze, or chart data from a dlt pipeline. Connects to a pipeline, profiles tables or scans schema, plans charts with ibis + altair code, and writes an analysis_plan.md artifact. Do NOT use for building or fixing pipelines (use rest-api-pipeline toolkit), deploying pipelines (use dlthub-runtime toolkit), or assembling the marimo notebook from an analysis plan (use build-notebook).

### Rules

- [`Data exploration workflow`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-exploration/rules/workflow.md)

## dlthub-runtime

Deploy a dlt workspace and its pipelines to the dltHub platform.

Source: [`dlthub-runtime`](https://github.com/dlt-hub/dlthub-ai-workbench/tree/master/workbench/dlthub-runtime)
Version: `0.1.2`
Category: `data`
Listed: yes
Dependencies: `init`
Workflow entry skill: `/setup-runtime`
Tags: `dlt`, `deploy`, `dlthub`, `platform`

### Skills

- [`/debug-deployment`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/dlthub-runtime/skills/debug-deployment/SKILL.md) — Debug a failed or misbehaving dltHub Runtime deployment. Use when a runtime job fails, produces unexpected results, or the user wants to check job status and logs.
- [`/deploy-workspace`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/dlthub-runtime/skills/deploy-workspace/SKILL.md) — Deploy dlt pipelines to dltHub Platform. Use when the user says "deploy to dltHub", "launch on dltHub", "run on dltHub", "schedule pipeline", or wants to deploy a pipeline or notebook to dltHub.
- [`/prepare-deployment`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/dlthub-runtime/skills/prepare-deployment/SKILL.md) — Prepare production credentials and destinations for dltHub Runtime. Use when setting up prod profile secrets, splitting dev/prod credentials, or configuring a production destination like Motherduck.
- [`/setup-runtime`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/dlthub-runtime/skills/setup-runtime/SKILL.md) — Verify dlt workspace is ready for dltHub Runtime. Use when user wants to deploy for the first time, or when another skill reports missing prerequisites like .workspace file or dlt[hub] dependency.

### Rules

- [`Profiles`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/dlthub-runtime/rules/profiles.md)
- [`Deploy to dltHub Runtime`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/dlthub-runtime/rules/workflow.md)

## init

Shared rules, secrets handling, and workspace MCP for dlt

Source: [`init`](https://github.com/dlt-hub/dlthub-ai-workbench/tree/master/workbench/init)
Version: `0.1.0`
Category: `data`
Listed: yes
Tags: `dlt`, `init`, `rules`, `secrets`, `mcp`

### Skills

- [`/improve-skills`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/init/skills/improve-skills/SKILL.md) — Improve existing skills based on the current session. Use at the end of a session (or when the user asks) to capture new debugging patterns, data issues, data validation tracks, querying techniques, doc references, or workflow improvements learned during the session. Keeps skills lean and personalized.
- [`/setup-secrets`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/init/skills/setup-secrets/SKILL.md) — Safely manage dlt secrets in *.secrets.toml. Use when the user directly asks to set up, configure, or inspect credentials (API keys, database passwords, tokens). Also use when writing Python code that needs to read secrets via dlt.secrets without exposing values. Do NOT use for pipeline creation, source discovery, or debugging pipeline execution — those skills call setup-secrets when they need credentials configured.
- [`/toolkit-dispatch`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/init/skills/toolkit-dispatch/SKILL.md) — Helps users figure out what they can build with dlt and which workflow to start. MUST use this skill when the user asks questions like 'what can you do', 'how do I build a pipeline', 'how do I make reports', 'how do I deploy', 'what are toolkits', 'what's available', 'I'm new to dlt', 'where do I start', or seems confused about what to do next after initial setup. Also use when the user asks broad capability questions about data engineering with dlt. Do NOT use when the user has a specific task in progress like debugging a pipeline, validating data, or adding endpoints.
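
Besides `*.secrets.toml`, dlt can resolve secrets from environment variables, using the upper-cased config path with sections joined by double underscores. The sketch below checks whether a secret is set without printing its value, in the spirit of `/setup-secrets`. It covers environment variables only (not TOML files), and the helper names and the `sources.my_api.api_key` path are hypothetical.

```python
import os


def env_var_name(key_path: str) -> str:
    """Map a dotted dlt config path to the matching environment variable name."""
    return "__".join(part.upper() for part in key_path.split("."))


def is_configured(key_path: str) -> bool:
    """Report whether a secret is set in the environment, without revealing it."""
    return bool(os.environ.get(env_var_name(key_path)))


# Hypothetical secret path; the source name `my_api` is a placeholder.
KEY = "sources.my_api.api_key"
print(env_var_name(KEY), "set" if is_configured(KEY) else "missing")
```

Reporting only "set" or "missing" keeps the secret value out of logs and chat transcripts.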

### Rules

- [`setup`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/init/rules/dlthub-workspace.md)

### MCP Servers

- dlt-workspace-mcp — `uv run dlt ai mcp --stdio`

## transformations

Transform raw dlt pipeline data into a Canonical Data Model. Annotate sources, build an ontology, design a CDM with Kimball dimensional modeling, and write @dlt.hub.transformation functions.

Source: [`transformations`](https://github.com/dlt-hub/dlthub-ai-workbench/tree/master/workbench/transformations)
Version: `0.1.0`
Category: `data`
Listed: yes
Dependencies: `init`
Workflow entry skill: `/annotate-sources`
Tags: `dlt`, `transformation`, `cdm`, `kimball`, `ibis`, `dimensional-modeling`

### Skills

- [`/annotate-sources`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/transformations/skills/annotate-sources/SKILL.md) — Annotate dlt pipeline sources for transformation. Use when the user wants to transform data, do data modelling, design a data model, describes their data sources and use cases, or wants to build a CDM from existing pipelines.
- [`/create-ontology`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/transformations/skills/create-ontology/SKILL.md) — Build a business entity graph (ontology) from annotated sources and taxonomy. Use after annotate-sources to design the entity model before CDM generation.
- [`/create-transformation`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/transformations/skills/create-transformation/SKILL.md) — Write dlt transformation functions that map source tables to CDM entities. Use after generate-cdm to produce the transformation Python script.
- [`/generate-cdm`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/transformations/skills/generate-cdm/SKILL.md) — Generate a Canonical Data Model (CDM) in DBML using Kimball dimensional modeling. Use after create-ontology to produce the implementation-ready CDM schema.

### Rules

- [`Transformations workflow`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/transformations/rules/workflow.md)

### MCP Servers

- dlt-workspace-mcp — `uv run dlt ai mcp --stdio`

## data-quality

Add checks and metrics to dlt pipelines: inspect the schema for candidates, define column-level validations and load metrics, run them on every pipeline load, and review results with failure diagnosis.

Source: [`data-quality`](https://github.com/dlt-hub/dlthub-ai-workbench/tree/master/workbench/data-quality)
Version: `0.1.0`
Category: `data`
Listed: yes
Dependencies: `init`
Workflow entry skill: `/setup-data-quality`
Tags: `dlt`, `data-quality`, `validation`, `checks`, `metrics`

### Skills

- [`/define-data-quality-checks`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-quality/skills/define-data-quality-checks/SKILL.md) — Use when the user asks to "define checks", "add validation rules", "what checks should I add", "translate requirements into checks", or wants to map schema hints or business rules to dlt data quality check and metric calls for a specific pipeline or table. Do NOT use to run checks (use run-data-quality) or to set up the pipeline environment (use setup-data-quality).
- [`/review-data-quality`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-quality/skills/review-data-quality/SKILL.md) — Use when the user asks to "review data quality results", "what failed", "show me data quality results", "analyze check results", "investigate data quality failures", or wants to understand check and metric outcomes from a pipeline run. Do NOT use to run new checks (use run-data-quality).
- [`/run-data-quality`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-quality/skills/run-data-quality/SKILL.md) — Use when the user asks to "run data quality checks", "execute checks", "run my data quality checks", "check the data now", "run validations", or wants to execute already-defined checks against a loaded pipeline. Do NOT use to define new checks (use define-data-quality-checks) or to review existing results (use review-data-quality).
- [`/setup-data-quality`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-quality/skills/setup-data-quality/SKILL.md) — Use when the user asks to "set up data quality", "enable data quality checks", "add data quality to my pipeline", "validate my pipeline data", "I want to check data quality", "check my tables for issues", or wants to start any data quality workflow on a dlt pipeline. Do NOT use for exploring or charting data (use data-exploration toolkit), running existing checks (use run-data-quality), or reviewing results (use review-data-quality).

### Rules

- [`Data quality conventions`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-quality/rules/dq-rules.md)
- [`Data quality workflow`](https://github.com/dlt-hub/dlthub-ai-workbench/blob/master/workbench/data-quality/rules/workflow.md)
