
Syncing Google Forms data with Notion using dlt

  • Aman Gupta,
    Junior Data Engineer

Why do we do it?

Hello, I'm Aman, and I assist the dlthub team with various data-related tasks. In a recent project, the Operations team needed to gather information through Google Forms and integrate it into a Notion database. Initially, they tried using the Zapier connector as a quick and cost-effective solution, but it didn’t work as expected. Since we’re at dlthub, where everyone is empowered to create pipelines, I stepped in to develop one that would automate this process.

The solution involved setting up a workflow to automatically sync data from Google Forms to a Notion database. This was achieved using Google Sheets, Google Apps Script, and a dlt pipeline, ensuring that every new form submission was seamlessly transferred to the Notion database without the need for manual intervention.

Implementation

Here are the steps we followed:

Step 1: Link the Google Form to a Google Sheet

Link the Google Form to a Google Sheet so that responses are saved in the sheet. Follow Google's documentation for the setup.

Step 2: Google Apps Script for Data Transfer

Create a Google Apps Script that sends data from Google Sheets to the Notion database via a webhook. Attach it to the spreadsheet with an installable trigger (in the Apps Script editor, add a trigger for the function with the "On form submit" event) so it runs every time a form response is saved.

Google Apps Script code:

// Send the submitted row to the webhook as a JSON object keyed by the sheet's header row.
function sendWebhookOnEdit(e) {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  // Prefer the range supplied by the trigger event; fall back to the active range.
  var range = (e && e.range) ? e.range : sheet.getActiveRange();
  var updatedRow = range.getRow();
  var lastColumn = sheet.getLastColumn();

  // Row 1 holds the form's question titles; use them as field names.
  var headers = sheet.getRange(1, 1, 1, lastColumn).getValues()[0];
  var rowValues = sheet.getRange(updatedRow, 1, 1, lastColumn).getValues()[0];

  var updatedFields = {};
  for (var i = 0; i < headers.length; i++) {
    updatedFields[headers[i]] = rowValues[i];
  }

  var jsonPayload = JSON.stringify(updatedFields);
  Logger.log('JSON Payload: ' + jsonPayload);

  var url = 'https://your-webhook.cloudfunctions.net/to_notion_from_google_forms'; // Replace with your Cloud Function URL
  var options = {
    'method': 'post',
    'contentType': 'application/json',
    'payload': jsonPayload
  };

  try {
    var response = UrlFetchApp.fetch(url, options);
    Logger.log('Response: ' + response.getContentText());
  } catch (error) {
    Logger.log('Failed to send webhook: ' + error.toString());
  }
}

Step 3: Deploying the ETL Pipeline

Deploy a dlt pipeline to Google Cloud Functions to handle data transfer from Google Sheets to the Notion database. The pipeline is triggered by the Google Apps Script.

  • Create a Google Cloud Function.
  • Create main.py with the Python code below.
  • Ensure requirements.txt includes dlt (a minimal sketch follows this list).
  • Deploy the pipeline to Google Cloud Functions.
  • Use the deployed function's URL as the webhook URL in the Google Apps Script.
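
To give a concrete starting point, here is a minimal requirements.txt together with one possible deploy command. The function name matches the webhook URL used in the Apps Script and the entry point matches the Python function below; the exact flags (runtime, region, generation) depend on your project, so treat this as a sketch rather than the definitive command.

# requirements.txt
dlt

# One possible deploy command, run from the folder containing main.py and requirements.txt
gcloud functions deploy to_notion_from_google_forms \
  --runtime python311 \
  --trigger-http \
  --allow-unauthenticated \
  --entry-point your_webhook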

This pipeline uses the @dlt.destination decorator, which sets up a custom destination. Custom destinations are part of dlt's reverse ETL capabilities; to read more about dlt's reverse ETL pipelines, see the documentation here.

Python code for main.py (Google Cloud Functions):

import dlt
from dlt.common import json
from dlt.common.typing import TDataItems
from dlt.common.schema import TTableSchema
from datetime import datetime
from dlt.sources.helpers import requests

@dlt.destination(name="notion", batch_size=1, naming_convention="direct", skip_dlt_columns_and_tables=True)
def insert_into_notion(items: TDataItems, table: TTableSchema, api_key: str = dlt.secrets.value) -> None:
    # dlt injects the Notion API key from "secrets.toml" at run time
    database_id = "your_notion_database_id"  # Replace with your Notion Database ID
    url = "https://api.notion.com/v1/pages"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Notion-Version": "2022-02-22"
    }

    for item in items:
        # The Notion API expects strings, so serialize the datetime coming from the sheet
        if isinstance(item.get('Timestamp'), datetime):
            item['Timestamp'] = item['Timestamp'].isoformat()
        data = {
            "parent": {"database_id": database_id},
            "properties": {
                "Timestamp": {
                    "title": [{
                        "text": {"content": item.get('Timestamp')}
                    }]
                },
                # Add other properties here
            }
        }
        response = requests.post(url, headers=headers, data=json.dumps(data))
        print(response.status_code, response.text)

def your_webhook(request):
    # Cloud Function entry point: receives the POST request sent by the Apps Script
    data = request.get_json()
    event = [data]

    pipeline = dlt.pipeline(
        pipeline_name='platform_to_notion',
        destination=insert_into_notion,
        dataset_name='webhooks',
        full_refresh=True
    )

    pipeline.run(event, table_name='webhook')
    return 'Event received and processed successfully.'
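
The Notion API key itself lives in dlt's secrets. Below is a possible secrets.toml layout; the section name assumes dlt resolves the api_key argument of the custom destination from the destination section matching the decorator's name, so check it against dlt's configuration docs for your version.

# .dlt/secrets.toml
[destination.notion]
api_key = "your_notion_integration_secret"

On Google Cloud Functions, the same value can also be supplied as an environment variable using dlt's naming convention (for example DESTINATION__NOTION__API_KEY) instead of shipping a secrets file.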

Step 4: Automation and Real-Time Updates

With everything set up, the workflow automates data transfer as follows:

  • Form submission saves data in Google Sheets.
  • Google Apps Script sends a POST request to the Cloud Function.
  • The dlt pipeline processes the data and updates the Notion database.
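
To check the whole chain without waiting for a real form submission, you can POST a sample payload directly to the Cloud Function. This is a quick local test rather than part of the deployed code; the field names and URL below are placeholders that should match your form's column headers and your function URL.

import requests

# A payload shaped like the JSON the Apps Script sends: sheet header -> cell value
sample_response = {
    "Timestamp": "2024-05-01T10:30:00",  # placeholder values
    "Name": "Jane Doe",
    "Feedback": "Loved the workshop!",
}

response = requests.post(
    "https://your-webhook.cloudfunctions.net/to_notion_from_google_forms",  # your function URL
    json=sample_response,
    timeout=30,
)
print(response.status_code, response.text)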

Conclusion

We initially considered using Zapier for this small task, but ultimately, handling it ourselves proved to be quite effective. Since we already use an orchestrator for our other automations, the only expense was the time I spent writing and testing the code. This experience demonstrates that dlt is a straightforward and flexible tool, suitable for a variety of scenarios. Essentially, wherever Python can be used, dlt can be applied effectively for data loading, provided it meets your specific needs.