GitBook Python API Docs | dltHub

Build a GitBook-to-database pipeline in Python using dlt with AI Workbench support for Claude Code, Cursor, and Codex.


The GitBook API is a RESTful API that supports standard HTTP methods (GET, POST, PATCH, and DELETE) for interacting with GitBook content. The REST API base URL is https://api.gitbook.com/v1, and all requests require a Bearer token for authentication.

dlt is an open-source Python library that handles authentication, pagination, and schema evolution automatically. dlthub provides AI context files that enable code assistants to generate production-ready pipelines. Install with uv pip install "dlt[workspace]" and start loading GitBook data in under 10 minutes.


What data can I load from GitBook?

Here are some of the endpoints you can load from GitBook:

| Resource | Endpoint | Method | Data selector | Description |
| --- | --- | --- | --- | --- |
| orgs | /v1/orgs | GET |  | Retrieve organizations |
| organization_members | /v1/orgs/{organizationId}/members | GET | items | List all organization members |
| member_teams | /v1/orgs/{organizationId}/members/{userId}/teams | GET | items | List teams for a member |
| spaces | /v1/spaces | GET | items | List all spaces |
| spaces_by_org | /v1/orgs/{organizationId}/spaces | GET | items | List spaces for an organization |
| collections | /v1/spaces/{spaceId}/collections | GET | items | List collections within a space |
| pages | /v1/spaces/{spaceId}/pages | GET | items | List pages within a space |
| sites | /v1/orgs/{organizationId}/sites | GET | items | List sites for an organization |
| sites_all | /v1/sites | GET | items | List all sites |
| integrations | /v1/integrations | GET | items | List integrations |

How do I authenticate with the GitBook API?

Authentication uses Bearer tokens. All requests require an Authorization header in the format Authorization: Bearer YOUR_SECRET_TOKEN.

1. Get your credentials

To obtain API credentials, create a Developer (personal access) token from GitBook developer settings or create an Integration in the GitBook dashboard, then copy the generated token.
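Every request carries the token in the same header. A minimal sketch of building that header in Python (the helper name is illustrative, not part of dlt or any GitBook SDK):

```python
# Sketch: build the Authorization header GitBook expects on every request.
# `auth_headers` is a hypothetical helper for illustration only.
def auth_headers(token: str) -> dict:
    """Return headers in the format `Authorization: Bearer YOUR_SECRET_TOKEN`."""
    return {"Authorization": f"Bearer {token}"}

# Usage with any HTTP client, e.g.:
# requests.get("https://api.gitbook.com/v1/orgs", headers=auth_headers(token))
```

When you use dlt's REST API source as shown below, this header is constructed for you from the `bearer` auth configuration.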

2. Add them to .dlt/secrets.toml

```toml
[sources.gitbook_api_source]
api_token = "YOUR_TOKEN"
```

dlt reads this automatically at runtime — never hardcode tokens in your pipeline script. For production environments, see setting up credentials with dlt for environment variable and vault-based options.
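As an alternative to secrets.toml, dlt also reads credentials from environment variables, mapping config sections to variable names with double underscores. For the source above, the variable would look like this (a sketch, assuming the source is named gitbook_api_source as in the example below):

```shell
# dlt maps env vars to config sections using double underscores:
# sources.gitbook_api_source.api_token -> SOURCES__GITBOOK_API_SOURCE__API_TOKEN
export SOURCES__GITBOOK_API_SOURCE__API_TOKEN="YOUR_TOKEN"
```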


How do I set up and run the pipeline?

Set up a virtual environment and install dlt:

```shell
uv venv && source .venv/bin/activate
uv pip install "dlt[workspace]"
```

1. Install the dlt AI Workbench:

```shell
dlt ai init --agent <your-agent>  # <agent>: claude | cursor | codex
```

This installs project rules, a secrets management skill, appropriate ignore files, and configures the dlt MCP server for your agent. Learn more →

2. Install the rest-api-pipeline toolkit:

```shell
dlt ai toolkit rest-api-pipeline install
```

This loads the skills and context about dlt that the agent uses to build the pipeline iteratively, efficiently, and safely. The agent uses MCP tools to inspect credentials; it never needs to read your secrets.toml directly. Learn more →

3. Start LLM-assisted coding:

Use /find-source to load data from the GitBook API into DuckDB.

The rest-api-pipeline toolkit takes over from here — it reads relevant API documentation, presents you with options for which endpoints to load, and follows a structured workflow to scaffold, debug, and validate the pipeline step by step.

4. Run the pipeline:

```shell
python gitbook_api_pipeline.py
```

If everything is configured correctly, you'll see output like this:

```
Pipeline gitbook_api_pipeline load step completed in 0.26 seconds
1 load package(s) were loaded to destination duckdb and into dataset gitbook_api_data
The duckdb destination used duckdb:/gitbook_api.duckdb location to store data
Load package 1749667187.541553 is LOADED and contains no failed jobs
```

Inspect your pipeline and data:

```shell
dlt pipeline gitbook_api_pipeline show
```

This opens the Pipeline Dashboard where you can verify pipeline state, load metrics, schema (tables, columns, types), and query the loaded data directly.


Python pipeline example

This example loads organization_members and spaces from the GitBook API into DuckDB. It mirrors the endpoint and data selector configuration from the table above:

```python
import dlt
from dlt.sources.rest_api import RESTAPIConfig, rest_api_resources


@dlt.source
def gitbook_api_source(api_token=dlt.secrets.value):
    config: RESTAPIConfig = {
        "client": {
            "base_url": "https://api.gitbook.com/v1",
            "auth": {
                "type": "bearer",
                "token": api_token,
            },
        },
        "resources": [
            # {organizationId} is a placeholder; replace it with a real org ID.
            {
                "name": "organization_members",
                "endpoint": {
                    "path": "orgs/{organizationId}/members",
                    "data_selector": "items",
                },
            },
            {
                "name": "spaces",
                "endpoint": {
                    "path": "orgs/{organizationId}/spaces",
                    "data_selector": "items",
                },
            },
        ],
    }
    yield from rest_api_resources(config)


def get_data() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="gitbook_api_pipeline",
        destination="duckdb",
        dataset_name="gitbook_api_data",
    )
    load_info = pipeline.run(gitbook_api_source())
    print(load_info)


if __name__ == "__main__":
    get_data()
```

To add more endpoints, append entries from the resource table to the "resources" list using the same name, path, and data_selector pattern.
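For instance, the collections endpoint from the table becomes one more dictionary in that list (a sketch; {spaceId} is a placeholder that must be resolved to a real space ID):

```python
# Sketch: an additional entry for the "resources" list in the config above.
# {spaceId} is a path placeholder, not a literal value.
extra_resource = {
    "name": "collections",
    "endpoint": {
        "path": "spaces/{spaceId}/collections",
        "data_selector": "items",
    },
}
```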


How do I query the loaded data?

Once the pipeline runs, dlt creates one table per resource. You can query with Python or SQL.

Python (pandas DataFrame):

```python
import dlt

data = dlt.pipeline("gitbook_api_pipeline").dataset()
members_df = data.organization_members.df()
print(members_df.head())
```

SQL (DuckDB example):

```sql
SELECT * FROM gitbook_api_data.organization_members LIMIT 10;
```

In a marimo or Jupyter notebook:

```python
import dlt

data = dlt.pipeline("gitbook_api_pipeline").dataset()
data.organization_members.df().head()
```

See how to explore your data in marimo Notebooks and how to query your data in Python with dataset.


What destinations can I load GitBook data to?

dlt supports loading into any of these destinations — only the destination parameter changes:

| Destination | Example value |
| --- | --- |
| DuckDB (local, default) | "duckdb" |
| PostgreSQL | "postgres" |
| BigQuery | "bigquery" |
| Snowflake | "snowflake" |
| Redshift | "redshift" |
| Databricks | "databricks" |
| Filesystem (S3, GCS, Azure) | "filesystem" |

Change the destination in dlt.pipeline(destination="snowflake") and add credentials in .dlt/secrets.toml. See the full destinations list.
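For example, switching to Snowflake means changing only the destination argument and adding the matching credentials section. A sketch of the secrets.toml entry, assuming dlt's standard Snowflake credential keys (check the Snowflake destination docs for your setup):

```toml
[destination.snowflake.credentials]
database = "MY_DATABASE"
username = "loader"
password = "YOUR_PASSWORD"
host = "YOUR_ACCOUNT_IDENTIFIER"
warehouse = "COMPUTE_WH"
role = "LOADER_ROLE"
```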


Troubleshooting

Authentication Failures

If you encounter 401 Unauthorized or 403 Forbidden errors, ensure your Bearer token is correctly included in the Authorization header and that it has the necessary scope for the requested resource. 401 Unauthorized typically indicates a missing or invalid token, while 403 Forbidden suggests insufficient permissions.

Rate Limiting

Requests may be subject to rate limits, resulting in 429 Rate limit errors. If you receive this error, you are sending too many requests in a given timeframe. Implement exponential backoff or reduce your request frequency.
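A minimal sketch of computing exponential backoff delays in pure Python (the function name is illustrative; dlt's REST API source also applies its own retry logic to transient errors):

```python
import random


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   retries: int = 5, jitter: float = 0.0):
    """Yield wait times in seconds before each retry after a 429 response."""
    for attempt in range(retries):
        # Exponential growth plus optional random jitter to avoid thundering herds.
        yield base * factor ** attempt + random.uniform(0.0, jitter)
```

With the defaults this yields delays of 1, 2, 4, 8, and 16 seconds; set jitter > 0 when many clients share the same token.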

Resource Not Found

A 404 Not Found error indicates that the requested resource (e.g., an organization, space, or page) does not exist or the ID provided is incorrect. Verify the IDs used in your requests.

Invalid Requests

400 Bad Request errors occur when the request parameters are invalid or malformed. Review the API documentation for the specific endpoint to ensure all required parameters are correctly formatted and provided.

Server Errors

500 Server Error indicates an issue on the GitBook API side. These are typically transient; retrying the request after a short delay may resolve the issue. If the problem persists, contact GitBook support.



Next steps

Continue your data engineering journey with the other toolkits of the dltHub AI Workbench:

  • data-exploration — Build custom notebooks, charts, and dashboards for deeper analysis with marimo notebooks.
  • dlthub-runtime — Deploy, schedule, and monitor your pipeline in production.
```shell
dlt ai toolkit data-exploration install
dlt ai toolkit dlthub-runtime install
```
