Stripe
Stripe is an online payment platform that allows businesses to securely process and manage customer transactions over the Internet.
This Stripe dlt verified source and pipeline example load data using the Stripe API to the destination of your choice.
This verified source loads data from the following endpoints:
| Name | Description |
|---|---|
| Subscription | Recurring payment on Stripe |
| Account | User profile on Stripe |
| Coupon | Discount codes offered by businesses |
| Customer | Buyers using Stripe |
| Product | Items or services for sale |
| Price | Cost details for products or plans |
| Event | Significant activities in a Stripe account |
| Invoice | Payment request document |
| BalanceTransaction | Funds movement record in Stripe |
Please note that endpoints in the verified source can be customized as per the Stripe API reference documentation.
The source is compatible with stripe-python versions 5.x through 15.x and requests Stripe API version 2022-11-15. Pinning the API version keeps the loaded table schemas stable, but fields introduced in later Stripe API versions (for example, the discounts array that replaced the singular discount field) will not appear in the loaded data.
Setup guide
Grab credentials
- Log in to your Stripe account.
- Click ⚙️ Settings in the top-right.
- Go to Developers from the top menu.
- Choose "API Keys".
- In "Standard Keys", click "Reveal test key" beside the Secret Key.
- Note down the API_secret_key for configuring secrets.toml.
Note: The Stripe UI, which is described here, might change. The full guide is available at this link.
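Before pasting the key into secrets.toml, it can help to confirm that you copied a secret key in the mode you intended (test versus live). The sketch below relies on Stripe's documented key prefixes; the helper name `stripe_key_mode` is hypothetical and not part of the verified source.

```python
def stripe_key_mode(api_key: str) -> str:
    """Classify a Stripe API key by its prefix (hypothetical helper)."""
    prefixes = {
        "sk_test_": "secret/test",
        "sk_live_": "secret/live",
        "rk_test_": "restricted/test",
        "rk_live_": "restricted/live",
        "pk_test_": "publishable/test",
        "pk_live_": "publishable/live",
    }
    for prefix, mode in prefixes.items():
        if api_key.startswith(prefix):
            return mode
    return "unknown"


# Placeholder value for illustration only, not a real key.
print(stripe_key_mode("sk_test_placeholder"))
```

A key reported as publishable or unknown is probably not the value this source expects for stripe_secret_key.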
Initialize the verified source
To get started with your data pipeline, follow these steps:
- Enter the following command:
      dlt init stripe_analytics duckdb
  This command will initialize the pipeline example with Stripe as the source and duckdb as the destination.
- If you'd like to use a different destination, simply replace duckdb with the name of your preferred destination.
- After running this command, a new directory will be created with the necessary files and configuration settings to get started.
Add credentials
- In the .dlt folder, there's a file called secrets.toml. It's where you store sensitive information securely, like access tokens. Keep this file safe. Here's its format for the Stripe secret key:
      # put your secret values and credentials here. do not share this file and do not push it to github
      [sources.stripe_analytics]
      stripe_secret_key = "stripe_secret_key" # please set me up!
- Substitute "stripe_secret_key" with the value you copied above for secure access to your Stripe resources.
- Finally, enter credentials for your chosen destination as per the docs.
For more information, read the General Usage: Credentials guide.
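Where a secrets.toml file is impractical (CI jobs, containers), dlt can also read the same value from an environment variable. Its name is the TOML section path and key, uppercased and joined with double underscores. The sketch below derives that name; the helper `dlt_env_var_name` is hypothetical and only illustrates the naming convention.

```python
def dlt_env_var_name(*path: str) -> str:
    """Build the environment variable name for a dlt config path.

    Hypothetical helper: uppercases each part and joins with "__".
    """
    return "__".join(part.upper() for part in path)


# Mirrors [sources.stripe_analytics] stripe_secret_key from secrets.toml.
env_name = dlt_env_var_name("sources", "stripe_analytics", "stripe_secret_key")
print(env_name)
```

Exporting a variable with that name before running the pipeline should have the same effect as the secrets.toml entry.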
Run the pipeline
- Before running the pipeline, ensure that you have installed all the necessary dependencies by running the command:
      pip install -r requirements.txt
- You're now ready to run the pipeline! To get started, run the following command:
      python stripe_analytics_pipeline.py
- Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command:
      dlt pipeline <pipeline_name> show
  For example, the pipeline_name for the above pipeline example is stripe_analytics. You may also use any custom name instead.
For more information, read the guide on how to run a pipeline.
Sources and resources
dlt works on the principle of sources and resources.
Default endpoints
You can write your own pipelines to load data to a destination using this verified source. However, it is important to note how the ENDPOINTS and INCREMENTAL_ENDPOINTS tuples are defined in stripe_analytics/settings.py.
# The most popular Stripe API's endpoints
ENDPOINTS = ("Subscription", "Account", "Coupon", "Customer", "Invoice", "Product", "Price")
# Possible incremental endpoints
# The incremental endpoints default to Stripe API endpoints with uneditable data.
INCREMENTAL_ENDPOINTS = ("Event", "BalanceTransaction")
Records from Stripe's default endpoints can change after creation, and the API exposes no "updated" key to detect such changes, so these endpoints are loaded in 'replace' mode. Use the incremental endpoints for incremental loading.
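When a pipeline requests a mix of endpoints, they need to be routed to the matching source. The following is a minimal sketch that splits a requested list into the two groups; the tuples are copied from the settings above, while `split_endpoints` is a hypothetical helper, not part of the verified source.

```python
from typing import Iterable, Tuple

# Copied from stripe_analytics/settings.py as shown above.
ENDPOINTS = ("Subscription", "Account", "Coupon", "Customer", "Invoice", "Product", "Price")
INCREMENTAL_ENDPOINTS = ("Event", "BalanceTransaction")


def split_endpoints(requested: Iterable[str]) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Return (replace_endpoints, incremental_endpoints), preserving order."""
    requested = tuple(requested)
    incremental = tuple(e for e in requested if e in INCREMENTAL_ENDPOINTS)
    replace = tuple(e for e in requested if e not in INCREMENTAL_ENDPOINTS)
    return replace, incremental


replace_eps, incremental_eps = split_endpoints(("Customer", "Event", "Invoice"))
print(replace_eps, incremental_eps)
```

The first tuple would go to stripe_source and the second to incremental_stripe_source.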
Source stripe_source
This function retrieves data from the Stripe API for the specified endpoint:
@dlt.source
def stripe_source(
    endpoints: Tuple[str, ...] = ENDPOINTS,
    stripe_secret_key: str = dlt.secrets.value,
    start_date: Optional[DateTime] = None,
    end_date: Optional[DateTime] = None,
) -> Iterable[DltResource]:
    ...
endpoints: Tuple containing endpoint names.
start_date: Start datetime for data loading (default: None).
end_date: End datetime for data loading (default: None).
Datetime arguments accept ISO 8601 strings, datetime objects, or Unix timestamps.
This source loads all provided endpoints in 'replace' mode. For incremental endpoints, use incremental_stripe_source.
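Stripe's list endpoints filter on the `created` field using Unix timestamps, so date arguments have to be normalized before a request is made. The sketch below shows one way to do that with the standard library. The helpers `to_unix_timestamp` and `created_filter` are hypothetical, the choice of `gte`/`lt` bounds is an assumption, and naive datetimes are assumed to be UTC; the verified source's own conversion may differ.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Union

DateLike = Union[str, datetime, int, float]


def to_unix_timestamp(value: DateLike) -> int:
    """Convert an ISO 8601 string, datetime, or Unix timestamp to int seconds."""
    if isinstance(value, (int, float)):
        return int(value)
    if isinstance(value, str):
        # Older Python versions do not parse a trailing "Z", so rewrite it.
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if value.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        value = value.replace(tzinfo=timezone.utc)
    return int(value.timestamp())


def created_filter(
    start_date: Optional[DateLike] = None,
    end_date: Optional[DateLike] = None,
) -> Dict[str, int]:
    """Build a `created` filter: start inclusive, end exclusive (assumed)."""
    params: Dict[str, int] = {}
    if start_date is not None:
        params["gte"] = to_unix_timestamp(start_date)
    if end_date is not None:
        params["lt"] = to_unix_timestamp(end_date)
    return params


print(created_filter("2022-01-01T00:00:00Z", datetime(2022, 12, 31)))
```

With an exclusive upper bound, an end_date of midnight on December 31 leaves out records created during that day.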
Source incremental_stripe_source
This source loads data in 'append' mode from incremental endpoints.
@dlt.source
def incremental_stripe_source(
    endpoints: Tuple[str, ...] = INCREMENTAL_ENDPOINTS,
    stripe_secret_key: str = dlt.secrets.value,
    initial_start_date: Optional[DateTime] = None,
    end_date: Optional[DateTime] = None,
) -> Iterable[DltResource]:
    ...
endpoints: Tuple containing incremental endpoint names.
initial_start_date: Parameter for incremental loading; data after the initial_start_date is loaded on the first run (default: None).
end_date: End datetime for data loading (default: None).
The source tracks the created timestamp of loaded records. Subsequent runs then retrieve only newly created data using append mode, streamlining the process and preventing redundant data downloads.
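The cursor behaviour described above can be modelled in a few lines. This is a simplified toy model with made-up records, not dlt's actual incremental implementation: each run keeps only records whose `created` value is greater than the stored cursor, then advances the cursor.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, int]


def load_new_records(records: Iterable[Record], last_created: int) -> Tuple[List[Record], int]:
    """Return records newer than the cursor, plus the advanced cursor."""
    new = [r for r in records if r["created"] > last_created]
    cursor = max((r["created"] for r in new), default=last_created)
    return new, cursor


# Illustrative data only.
api_data = [{"id": 1, "created": 100}, {"id": 2, "created": 200}]
first_batch, cursor = load_new_records(api_data, last_created=0)

api_data.append({"id": 3, "created": 300})
second_batch, cursor = load_new_records(api_data, last_created=cursor)
print(len(first_batch), len(second_batch), cursor)
```

The second run appends only the record created after the first run's cursor, which is why incremental loading suits immutable records.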
Stripe retains Event objects for only 30 days. Setting initial_start_date further back will not return older events; use the BalanceTransaction endpoint for historical, immutable data.
For more information, read the Incremental loading guide.
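Given the 30-day Event retention noted above, it can be useful to see which start date an Event load can actually honour. The sketch below clamps a requested start date to the retention window. `effective_event_start` is a hypothetical helper, and both datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

# Retention window for Event objects, as stated above.
EVENT_RETENTION = timedelta(days=30)


def effective_event_start(initial_start_date: datetime, now: datetime) -> datetime:
    """Return the earliest start date from which Events can still be fetched."""
    earliest_available = now - EVENT_RETENTION
    return max(initial_start_date, earliest_available)


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
requested = datetime(2022, 1, 1, tzinfo=timezone.utc)
print(effective_event_start(requested, now))
```

A requested date inside the window is returned unchanged; an older one is moved forward to the edge of the window.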
Customization
Create your own pipeline
If you wish to create your own pipelines, you can leverage source and resource methods from this verified source.
- Configure the pipeline by specifying the pipeline name, destination, and dataset as follows:
      import dlt
      import pendulum
      from stripe_analytics import incremental_stripe_source, stripe_source

      pipeline = dlt.pipeline(
          pipeline_name="stripe_pipeline",  # Use a custom name if desired
          destination="duckdb",  # Choose the appropriate destination (e.g., duckdb, redshift, postgres)
          dataset_name="stripe_dataset"  # Use a custom name if desired
      )
- To load endpoints like "Plan" and "Charge" in replace mode, retrieve all data for the year 2022:
      source_single = stripe_source(
          endpoints=("Plan", "Charge"),
          start_date=pendulum.datetime(2022, 1, 1),
          end_date=pendulum.datetime(2022, 12, 31),
      )
      load_info = pipeline.run(source_single)
      print(load_info)
- To load data from the "BalanceTransaction" endpoint, whose records are immutable, using incremental loading:
      # Load all data on the first run that was created after initial_start_date and before end_date
      source_incremental = incremental_stripe_source(
          endpoints=("BalanceTransaction", ),
          initial_start_date=pendulum.datetime(2022, 1, 1),
          end_date=pendulum.datetime(2022, 12, 31),
      )
      load_info = pipeline.run(source_incremental)
      print(load_info)
  For subsequent runs, the source remembers the created timestamp of the last loaded record and retrieves only newer records, in append mode.
- To load data created after December 31, 2022, adjust the date range for stripe_source to prevent redundant loading. For incremental_stripe_source, the last loaded created timestamp from the previous run is used automatically.
      source_single = stripe_source(
          endpoints=("Plan", "Charge"),
          start_date=pendulum.datetime(2022, 12, 31),
      )
      source_incremental = incremental_stripe_source(
          endpoints=("BalanceTransaction", ),
      )
      load_info = pipeline.run(data=[source_single, source_incremental])
      print(load_info)
  To load data incrementally, keep the pipeline name and destination dataset name unchanged between runs. The pipeline name is vital for accessing the last run's state, which records where the previous incremental load ended. Altering these names can trigger a "dev_mode", disrupting the metadata (state) tracking for incremental data loading.
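To see why renaming the pipeline or dataset resets incremental loading, consider a toy model in which the saved cursor is looked up by the (pipeline name, dataset name) pair. This is an illustration only: `ToyStateStore` is hypothetical and does not reflect how dlt stores state internally.

```python
from typing import Dict, Tuple

StateKey = Tuple[str, str]  # (pipeline_name, dataset_name)


class ToyStateStore:
    """Toy stand-in for pipeline state, keyed by pipeline and dataset name."""

    def __init__(self) -> None:
        self._cursors: Dict[StateKey, int] = {}

    def save_cursor(self, pipeline_name: str, dataset_name: str, created: int) -> None:
        self._cursors[(pipeline_name, dataset_name)] = created

    def get_cursor(self, pipeline_name: str, dataset_name: str, default: int = 0) -> int:
        # An unknown name pair has no history, so the load starts over.
        return self._cursors.get((pipeline_name, dataset_name), default)


store = ToyStateStore()
store.save_cursor("stripe_pipeline", "stripe_dataset", created=1672444800)

same_names = store.get_cursor("stripe_pipeline", "stripe_dataset")
renamed = store.get_cursor("stripe_pipeline_v2", "stripe_dataset")
print(same_names, renamed)
```

With the original names the saved cursor is found; with a changed name the lookup falls back to the default, so the next run would reload from the beginning.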
Additional Setup guides
- Load data from Stripe to Azure Cloud Storage in python with dlt
- Load data from Stripe to Redshift in python with dlt
- Load data from Stripe to Supabase in python with dlt
- Load data from Stripe to BigQuery in python with dlt
- Load data from Stripe to CockroachDB in python with dlt
- Load data from Stripe to Microsoft SQL Server in python with dlt
- Load data from Stripe to DuckDB in python with dlt
- Load data from Stripe to ClickHouse in python with dlt
- Load data from Stripe to AlloyDB in python with dlt
- Load data from Stripe to Timescale in python with dlt