Version: 1.15.0 (latest)

Personio

Need help deploying these sources or figuring out how to run them in your data stack?

Join our Slack community or Get in touch with the dltHub Customer Success team.

Personio is a human resources management software that helps businesses streamline HR processes, including recruitment, employee data management, and payroll, in one platform.

Our Personio verified source loads data using the Personio API to your preferred destination.

tip

You can check out our pipeline example here.

Resources that can be loaded using this verified source are:

Name	Description	Endpoint
employees	Retrieves company employees' details	/company/employees
absences	Retrieves absence periods for absences tracked in days	/company/time-offs
absences_types	Retrieves a list of various types of employee absences	/company/time-off-types
attendances	Retrieves attendance records for each employee	/company/attendances
projects	Retrieves a list of all company projects	/company/attendances/projects
document_categories	Retrieves all document categories of the company	/company/document-categories
employees_absences_balance	The transformer retrieves the absence balance for a specific employee	/company/employees/{employee_id}/absences/balance
custom_reports_list	Retrieves metadata about existing custom reports (name, report type, report date)	/company/custom-reports/reports
custom_reports	The transformer for custom reports	/company/custom-reports/reports/{report_id}

Setup guide

Grab credentials

To load data from Personio, you need to obtain API credentials, client_id and client_secret:

Sign in to your Personio account, and ensure that your user account has API access rights.
Navigate to Settings > Integrations > API credentials.
Click on "Generate new credentials."
Assign necessary permissions to credentials, i.e., read access.

info

The Personio UI, which is described here, might change. The full guide is available at this link.

Initialize the verified source

To get started with your data pipeline, follow these steps:

Enter the following command:
```
dlt init personio duckdb
```
This command will initialize the pipeline example with Personio as the source and duckdb as the destination.
If you'd like to use a different destination, simply replace duckdb with the name of your preferred destination.
After running this command, a new directory will be created with the necessary files and configuration settings to get started.

For more information, read Add a verified source.

Add credentials

In the .dlt folder, there's a file called secrets.toml. It's where you store sensitive information securely, like access tokens. Keep this file safe. Here's its format for service account authentication:

# Put your secret values and credentials here
# Note: Do not share this file and do not push it to GitHub!
[sources.personio]
client_id = "papi-*****" # please set me up!
client_secret = "papi-*****" # please set me up!

Replace the value of client_id and client_secret with the one that you copied above. This will ensure that your data-verified source can access your Personio API resources securely.
Next, follow the instructions in Destinations to add credentials for your chosen destination. This will ensure that your data is properly routed to its final destination.

For more information, read Credentials.

Run the pipeline

Before running the pipeline, ensure that you have installed all the necessary dependencies by running the command:
```
pip install -r requirements.txt
```
You're now ready to run the pipeline! To get started, run the following command:
```
python personio_pipeline.py
```
Once the pipeline has finished running, you can verify that everything loaded correctly by using the following command:
```
dlt pipeline <pipeline_name> show
```
For example, the pipeline_name for the above pipeline example is personio, you may also use any custom name instead.

For more information, read Run a pipeline.

Sources and resources

dlt works on the principle of sources and resources.

Source `personio_source`

This dlt source returns data resources like employees, absences, absence_types, etc.

@dlt.source(name="personio")
def personio_source(
    client_id: str = dlt.secrets.value,
    client_secret: str = dlt.secrets.value,
    items_per_page: int = ITEMS_PER_PAGE,
) -> Iterable[DltResource]:
    ...
    return (
    employees,
    absence_types,
    absences,
    attendances,
    projects,
    document_categories,
    employees_absences_balance,
    custom_reports_list,
    custom_reports,
)

client_id: Generated ID for API access.

client_secret: Generated secret for API access.

items_per_page: Maximum number of items per page, defaults to 200.

Resource `employees`

This resource retrieves data on all the employees in a company.

@dlt.resource(primary_key="id", write_disposition="merge")
def employees(
    updated_at: dlt.sources.incremental[
        pendulum.DateTime
    ] = dlt.sources.incremental(
        "last_modified_at", initial_value=None, allow_external_schedulers=True
    ),
    items_per_page: int = ITEMS_PER_PAGE,
) -> Iterable[TDataItem]:
    ...

updated_at: The saved state of the last 'last_modified_at' value. It is used for incremental loading.

items_per_page: Maximum number of items per page, defaults to 200.

allow_external_schedulers: A boolean that, if true, permits external schedulers to manage incremental loading.

Like the employees resource discussed above, other resources absences and attendances load data incrementally from the Personio API to your preferred destination.

Resource `absence_types`

Simple resource, which retrieves a list of various types of employee absences.

@dlt.resource(primary_key="id", write_disposition="replace")
def absence_types(items_per_page: int = ITEMS_PER_PAGE) -> Iterable[TDataItem]:
   ...
...

items_per_page: Maximum number of items per page, defaults to 200.

It is important to note that the data is loaded in replace mode where the existing data is completely replaced.

In addition to the mentioned resource, there are three more resources projects, custom_reports_list, and document_categories with similar behavior.

Resource-transformer `employees_absences_balance`

Besides these source and resource functions, there are two transformer functions for endpoints like /company/employees/\{employee_id\}/absences/balance and /company/custom-reports/reports/\{report_id\}. The transformer functions transform or process data from resources.

The transformer function employees_absences_balance processes data from the employees resource. It fetches and returns a list of the absence balances for each employee.

@dlt.transformer(
    data_from=employees,
    write_disposition="merge",
    primary_key=["employee_id", "id"],
)
@dlt.defer
def employees_absences_balance(employees_item: TDataItem) -> Iterable[TDataItem]:
    ...

employees_item: The data item from the 'employees' resource.

It uses the @dlt.defer decorator to enable parallel run in thread pool.

Customization

Create your own pipeline

If you wish to create your own pipelines, you can leverage source and resource methods from this verified source.

Configure the pipeline by specifying the pipeline name, destination, and dataset as follows:

pipeline = dlt.pipeline(
   pipeline_name="personio",  # Use a custom name if desired
   destination="duckdb",  # Choose the appropriate destination (e.g., duckdb, redshift, post)
   dataset_name="personio_data"  # Use a custom name if desired
)

To load employee data:

load_data = personio_source().with_resources("employees")
print(pipeline.run(load_data))

To load data from all supported endpoints:

load_data = personio_source()
print(pipeline.run(load_data))

Personio

Setup guide

Grab credentials

Initialize the verified source

Add credentials

Run the pipeline

Sources and resources

Source `personio_source`

Resource `employees`

Resource `absence_types`

Resource-transformer `employees_absences_balance`

Customization

Create your own pipeline

DHelp

Ask a question

Setup guide​

Grab credentials​

Initialize the verified source​

Add credentials​

Run the pipeline​

Sources and resources​

Source personio_source​

Resource employees​

Resource absence_types​

Resource-transformer employees_absences_balance​

Customization​

Create your own pipeline​

DHelp

Ask a question

Setup guide

Grab credentials

Initialize the verified source

Add credentials

Run the pipeline

Sources and resources

Source `personio_source`

Resource `employees`

Resource `absence_types`

Resource-transformer `employees_absences_balance`

Customization

Create your own pipeline