Google Analytics Reporting Python API Docs | dltHub

Build a Google Analytics Reporting-to-database pipeline in Python using dlt with AI Workbench support for Claude Code, Cursor, and Codex.

The Google Analytics Data API is a REST API that provides Google Analytics 4 (GA4) reporting data such as custom reports, realtime reports, pivot reports, and metadata for a property. The REST API base URL is https://analyticsdata.googleapis.com, and all requests require an OAuth2 Bearer token for authentication.

dlt is an open-source Python library that handles authentication, pagination, and schema evolution automatically. dltHub provides AI context files that enable code assistants to generate production-ready pipelines. Install with uv pip install "dlt[workspace]" and start loading Google Analytics Reporting data in under 10 minutes.


What data can I load from Google Analytics Reporting?

Here are some of the endpoints you can load from Google Analytics Reporting:

| Resource | Endpoint | Method | Data selector | Description |
| --- | --- | --- | --- | --- |
| properties_get_metadata | /v1beta/properties/{property}/metadata | GET | dimensions, metrics | Retrieve metadata describing available dimensions and metrics for a property. |
| properties_audience_exports_list | /v1beta/properties/{property}/audienceExports | GET | audienceExports | List audience export resources for a property. |
| properties_report_tasks_list | /v1alpha/properties/{property}/reportTasks | GET | reportTasks | List long-running report tasks (v1alpha). |
| properties_audience_lists_list | /v1alpha/properties/{property}/audienceLists | GET | audienceLists | List audience list resources (v1alpha). |
| properties_batch_run_reports | /v1beta/properties/{property}:batchRunReports | POST | reports | Run multiple reports in a single request. |
| properties_run_report | /v1beta/properties/{property}:runReport | POST | rows | Execute a custom report and receive rows of data. |
| properties_run_realtime_report | /v1beta/properties/{property}:runRealtimeReport | POST | rows | Execute a realtime report. |
| properties_run_pivot_report | /v1beta/properties/{property}:runPivotReport | POST | pivots | Execute a pivot report. |

The stable methods of the Data API are published under the v1beta path prefix; preview methods use v1alpha.
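
The POST report endpoints take a JSON request body that names the date ranges, dimensions, and metrics to return. The following is a minimal sketch of building such a body for properties_run_report. The helper name build_run_report_body and the example dimension and metric (date, sessions) are illustrative assumptions, and the field names follow Google's published runReport request format rather than anything stated on this page.

```python
import json


def build_run_report_body(dimensions, metrics, start_date="7daysAgo", end_date="today"):
    """Build a JSON-serializable request body for a runReport call (hypothetical helper)."""
    return {
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "dimensions": [{"name": name} for name in dimensions],
        "metrics": [{"name": name} for name in metrics],
    }


# Example: daily sessions for the last seven days.
body = build_run_report_body(["date"], ["sessions"])
print(json.dumps(body, indent=2))
```

The same dictionary can be passed as the request body of a POST resource when you configure the pipeline.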

How do I authenticate with the Google Analytics Reporting API?

Use OAuth 2.0 to obtain an access token and include it in the request header as Authorization: Bearer <access_token>. Also set Content-Type: application/json.
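
When you call the API directly, for example to debug a request outside dlt, the two headers described above can be assembled as in this small sketch. The function name auth_headers and the token value are placeholders, not part of any library.

```python
def auth_headers(access_token: str) -> dict:
    """Return the HTTP headers described above for a given OAuth access token."""
    if not access_token:
        raise ValueError("access_token must not be empty")
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }


# Placeholder token for illustration only; never commit a real token.
headers = auth_headers("example-access-token")
print(sorted(headers))
```

Inside a dlt pipeline you do not build these headers yourself; the bearer auth configuration does it for you.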

1. Get your credentials

  1. Open Google Cloud Console and select or create a project.
  2. Enable the "Google Analytics Data API" (analyticsdata.googleapis.com) for that project.
  3. In "APIs & Services" → "Credentials" create an OAuth 2.0 Client ID (or a service account for server‑to‑server access).
  4. If using a service account, grant it permission to the GA4 property and download the JSON key.
  5. Use the appropriate OAuth scope, e.g. https://www.googleapis.com/auth/analytics.readonly, when requesting tokens.

2. Add them to .dlt/secrets.toml

[sources.google_analytics_reporting_source]
access_token = "your_oauth_access_token_here"

dlt reads this automatically at runtime — never hardcode tokens in your pipeline script. For production environments, see setting up credentials with dlt for environment variable and vault-based options.
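
For the environment variable option, dlt's documented convention is to upper-case the TOML section path and key and join the parts with double underscores. The sketch below only derives that variable name; the helper dlt_env_var_name is hypothetical and is not part of dlt.

```python
def dlt_env_var_name(section: str, key: str) -> str:
    """Derive the environment variable name for a secrets.toml section and key."""
    parts = section.split(".") + [key]
    return "__".join(part.upper() for part in parts)


name = dlt_env_var_name("sources.google_analytics_reporting_source", "access_token")
# Print a shell line you could adapt; the value is a placeholder.
print(f'export {name}="your_oauth_access_token_here"')
```

Setting that variable in the environment of the process that runs the pipeline takes the place of the secrets.toml entry.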


How do I set up and run the pipeline?

Set up a virtual environment and install dlt:

uv venv && source .venv/bin/activate
uv pip install "dlt[workspace]"

1. Install the dlt AI Workbench:

dlt ai init --agent <your-agent> # <agent>: claude | cursor | codex

This installs project rules, a secrets management skill, appropriate ignore files, and configures the dlt MCP server for your agent.

2. Install the rest-api-pipeline toolkit:

dlt ai toolkit rest-api-pipeline install

This loads the skills and context about dlt the agent uses to build the pipeline iteratively, efficiently, and safely. The agent uses MCP tools to inspect credentials, so it never needs to read your secrets.toml directly.

3. Start LLM-assisted coding:

Use /find-source to load data from the Google Analytics Reporting API into DuckDB.

The rest-api-pipeline toolkit takes over from here — it reads relevant API documentation, presents you with options for which endpoints to load, and follows a structured workflow to scaffold, debug, and validate the pipeline step by step.

4. Run the pipeline:

python google_analytics_reporting_pipeline.py

If everything is configured correctly, you'll see output like this:

Pipeline google_analytics_reporting_pipeline load step completed in 0.26 seconds
1 load package(s) were loaded to destination duckdb and into dataset google_analytics_reporting_data
The duckdb destination used duckdb:/google_analytics_reporting.duckdb location to store data
Load package 1749667187.541553 is LOADED and contains no failed jobs

Inspect your pipeline and data:

dlt pipeline google_analytics_reporting_pipeline show

This opens the Pipeline Dashboard where you can verify pipeline state, load metrics, schema (tables, columns, types), and query the loaded data directly.


Python pipeline example

This example loads properties_run_report and properties_get_metadata from the Google Analytics Reporting API into DuckDB. It mirrors the endpoint and data selector configuration from the table above:

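import dlt
from dlt.sources.rest_api import RESTAPIConfig, rest_api_resources


@dlt.source
def google_analytics_reporting_source(
    access_token: str = dlt.secrets.value,
    # Numeric GA4 property ID, e.g. property_id = "123456789" under
    # [sources.google_analytics_reporting_source] in .dlt/config.toml
    property_id: str = dlt.config.value,
):
    config: RESTAPIConfig = {
        "client": {
            "base_url": "https://analyticsdata.googleapis.com",
            "auth": {"type": "bearer", "token": access_token},
        },
        "resources": [
            {
                "name": "properties_run_report",
                "endpoint": {
                    "path": f"v1beta/properties/{property_id}:runReport",
                    "method": "POST",  # runReport is a POST endpoint
                    # Example report body; replace the dimensions and metrics with your own.
                    "json": {
                        "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
                        "dimensions": [{"name": "date"}],
                        "metrics": [{"name": "sessions"}],
                    },
                    "data_selector": "rows",
                },
            },
            {
                "name": "properties_get_metadata",
                "endpoint": {
                    "path": f"v1beta/properties/{property_id}/metadata",
                    "data_selector": "dimensions, metrics",
                },
            },
        ],
    }
    yield from rest_api_resources(config)


def get_data() -> None:
    pipeline = dlt.pipeline(
        pipeline_name="google_analytics_reporting_pipeline",
        destination="duckdb",
        dataset_name="google_analytics_reporting_data",
    )
    load_info = pipeline.run(google_analytics_reporting_source())
    print(load_info)


if __name__ == "__main__":
    get_data()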

To add more endpoints, append entries from the resource table to the "resources" list using the same name, path, and data_selector pattern.
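
As a rough sketch of that pattern, the helper below turns one GET row of the resource table into a resource entry. The function list_resource, the v1beta path prefix (taken from Google's REST reference), and the property ID 123456 are assumptions for illustration; it uses plain dictionaries so it runs without dlt installed.

```python
def list_resource(name: str, path_template: str, data_selector: str, property_id: str) -> dict:
    """Build one entry for the "resources" list from a row of the resource table."""
    # Fill in the {property} placeholder and drop the leading slash so the
    # path is joined onto base_url.
    path = path_template.format(property=property_id).lstrip("/")
    return {"name": name, "endpoint": {"path": path, "data_selector": data_selector}}


resources = []
resources.append(
    list_resource(
        "properties_audience_exports_list",
        "/v1beta/properties/{property}/audienceExports",  # assumed version prefix
        "audienceExports",
        "123456",  # placeholder property ID
    )
)
print(resources[-1])
```

POST endpoints such as runPivotReport additionally need a method and a request body in their endpoint configuration.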


How do I query the loaded data?

Once the pipeline runs, dlt creates one table per resource. You can query with Python or SQL.

Python (pandas DataFrame):

import dlt

data = dlt.pipeline("google_analytics_reporting_pipeline").dataset()
sessions_df = data.properties_run_report.df()
print(sessions_df.head())

SQL (DuckDB example):

SELECT * FROM google_analytics_reporting_data.properties_run_report LIMIT 10;

In a marimo or Jupyter notebook:

import dlt

data = dlt.pipeline("google_analytics_reporting_pipeline").dataset()
data.properties_run_report.df().head()

See how to explore your data in marimo Notebooks and how to query your data in Python with dataset.


What destinations can I load Google Analytics Reporting data to?

dlt supports loading into any of these destinations — only the destination parameter changes:

| Destination | Example value |
| --- | --- |
| DuckDB (local, default) | "duckdb" |
| PostgreSQL | "postgres" |
| BigQuery | "bigquery" |
| Snowflake | "snowflake" |
| Redshift | "redshift" |
| Databricks | "databricks" |
| Filesystem (S3, GCS, Azure) | "filesystem" |

Change the destination in dlt.pipeline(destination="snowflake") and add credentials in .dlt/secrets.toml. See the full destinations list.


Troubleshooting

Authentication failures

If a 401 or 403 response is returned, verify that the OAuth access token is valid, not expired, includes the required scopes (https://www.googleapis.com/auth/analytics.readonly), and that the Google Cloud project has the Analytics Data API enabled. Service accounts must be granted access to the GA4 property.

Quota and rate limits

When the service returns a 429 RESOURCE_EXHAUSTED error, the request has exceeded the allocated quota. Apply exponential backoff and consider requesting higher quotas in the Google Cloud Console. Creating multiple projects to bypass limits violates policy.
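
One way to apply exponential backoff is sketched below. It is a generic retry loop, not dlt's or Google's implementation: the QuotaExceeded exception stands in for a 429 response, and the names call_with_backoff, base_delay, and max_delay are assumptions made for this example.

```python
import random
import time


class QuotaExceeded(Exception):
    """Stand-in for an HTTP 429 RESOURCE_EXHAUSTED response."""


def call_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func, doubling the wait after each QuotaExceeded, up to max_attempts tries."""
    for attempt in range(max_attempts):
        try:
            return func()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Demo with a fake call that is throttled twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise QuotaExceeded("429 RESOURCE_EXHAUSTED")
    return "ok"


waits = []  # record the waits instead of sleeping
result = call_with_backoff(flaky_call, sleep=waits.append)
print(result, len(waits))
```

Capping the delay and the number of attempts keeps a persistent quota problem from stalling the pipeline indefinitely.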

Request validation errors

A 400 BAD_REQUEST indicates malformed parameters such as invalid dimension or metric names. Use the properties_get_metadata endpoint to retrieve the list of valid dimensions and metrics before constructing reports.
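
A pre-flight check along these lines can catch misspelled names before a report is sent. The sketch assumes the metadata response lists each dimension and metric with an apiName field, as in Google's published response format; the function find_invalid_names and the sample data are made up for illustration.

```python
def find_invalid_names(metadata: dict, dimensions, metrics) -> dict:
    """Return the requested dimension and metric names missing from the metadata."""
    valid_dimensions = {d["apiName"] for d in metadata.get("dimensions", [])}
    valid_metrics = {m["apiName"] for m in metadata.get("metrics", [])}
    return {
        "dimensions": [name for name in dimensions if name not in valid_dimensions],
        "metrics": [name for name in metrics if name not in valid_metrics],
    }


# Illustrative stand-in for a metadata response, not real property data.
sample_metadata = {
    "dimensions": [{"apiName": "date"}, {"apiName": "country"}],
    "metrics": [{"apiName": "sessions"}],
}

invalid = find_invalid_names(sample_metadata, ["date", "contry"], ["sessions"])
print(invalid)  # {'dimensions': ['contry'], 'metrics': []}
```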

Pagination and batching quirks

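runReport and runRealtimeReport return rows in a single rows array, capped by the limit field of the request body (10,000 rows by default). For larger runReport result sets, page through the report with the limit and offset request fields and compare the rows read against rowCount in the response. The batchRunReports endpoint runs several reports in one request and returns a reports array where each report contains its own rows. Long-running exports are accessed via the reportTasks (v1alpha) or audienceExports endpoints, which are processed asynchronously and list their results with pagination tokens.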
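
Google's runReport request format accepts limit and offset fields, and the response carries a rowCount total, so a large report can be read page by page. The sketch below shows that loop against a fake fetch function. The names iter_report_rows and fetch_page are hypothetical, and in a real pipeline fetch_page would send the POST request with the given limit and offset.

```python
def iter_report_rows(fetch_page, page_size=10000):
    """Yield all rows of a report by advancing offset until rowCount is reached."""
    offset = 0
    while True:
        response = fetch_page(limit=page_size, offset=offset)
        rows = response.get("rows", [])  # the key is absent when a page is empty
        yield from rows
        offset += len(rows)
        if not rows or offset >= response.get("rowCount", 0):
            break


# Fake backend with five rows, used in place of a real API call.
ALL_ROWS = [{"dimensionValues": [{"value": str(i)}]} for i in range(5)]


def fake_fetch(limit, offset):
    return {"rows": ALL_ROWS[offset:offset + limit], "rowCount": len(ALL_ROWS)}


rows = list(iter_report_rows(fake_fetch, page_size=2))
print(len(rows))  # 5
```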

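The API authenticates with OAuth access tokens rather than API keys, so a 401 Unauthorized error points to a missing, invalid, or expired token. A 404 Not Found error usually means the endpoint path or a path parameter is wrong, so verify both.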


Next steps

Continue your data engineering journey with the other toolkits of the dltHub AI Workbench:

  • data-exploration — Build custom notebooks, charts, and dashboards for deeper analysis with marimo notebooks.
  • dlthub-runtime — Deploy, schedule, and monitor your pipeline in production.
dlt ai toolkit data-exploration install
dlt ai toolkit dlthub-runtime install
