From Inbox to Insights: AI-enhanced email analysis with dlt and Kestra
- Anuun Chinbat, Working Student
THE PROBLEM
There are two types of people: those who hoard thousands of unread emails in their inbox and those who open them immediately to avoid the ominous red notification. But one thing unites us all: everyone hates emails. The reasons are clear:
- They're often unnecessarily wordy, making them time-consuming.
- SPAM (obviously).
- They become black holes of lost communication because CC/BCC-ing people doesn't always work.
- Sometimes, there are just too many.
So, this post will explore a possible remedy to the whole email issue involving AI.
THE SOLUTION
Don't worry; it's nothing overly complex, but it does involve some cool tools that everyone could benefit from.
💡 In a nutshell, I created two flows (a main flow and a subflow) in Kestra:
- The main flow extracts email data from Gmail and loads it into BigQuery using dlt, checks for new emails, and, if found, triggers the subflow for further processing.
- The subflow uses OpenAI to summarize and analyze the sentiment of an email, loads the results into BigQuery, and then notifies about the details via Slack.
Just so you're aware:
- Kestra is an open-source automation tool that makes both scheduled and event-driven workflows easy.
- dlt is an open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets.
Wanna jump to the GitHub repo?
HOW IT WORKS
To lay it all out clearly: everything's automated in Kestra, with hassle-free data loading thanks to dlt, and the analytical thinking handled by OpenAI. Here's a diagram to help you understand the general outline of the entire process.
Now, let's delve into specific parts of the implementation.
The environment:
💡 The two flows in Kestra are set up in a very straightforward and intuitive manner. Simply follow the Prerequisites and Setup guidelines in the repo. It should take no more than 15 minutes.
Once you've opened http://localhost:8080/ in your browser, this is what you'll see on your screen:
Now, all you need to do is create your flows and execute them.
The great thing about Kestra is its ease of use: it's UI-based, declarative, and language-agnostic. Unless you're using a task like a Python script, you don't even need to know how to code.
To use Kestra for your own projects, consult its documentation and the plugin pages for further insights.
The data loading part
💡 This is entirely managed by dlt in just five lines of code.
I set up a pipeline using the Inbox source, a regularly tested and verified source from dlt, with BigQuery as the destination.
In my scenario, the email data doesn't have nested structures, so there's no need for flattening. However, if you encounter nested structures in a different use case, dlt can automatically normalize them during loading.
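To make "normalize" a bit more concrete, here is a minimal pure-Python sketch of the idea. It is not dlt's actual implementation. It only assumes dlt's convention of joining nested keys with a double underscore to form column names; the helper `flatten_record` and the sample record are hypothetical. dlt additionally moves nested lists into child tables, which this sketch does not attempt.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], parent_key: str = "", sep: str = "__") -> Dict[str, Any]:
    """Flatten nested dicts into a single-level dict of column name -> value."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Prefix the key with its parent's column name, if there is one
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts so their keys become prefixed columns
            flat.update(flatten_record(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical nested record (the inbox data used in this post is already flat)
email = {
    "message_uid": 42,
    "from": {"name": "Jane", "address": "jane@example.com"},
    "subject": "Quarterly report",
}

print(flatten_record(email))
```

The nested "from" object ends up as two ordinary columns, `from__name` and `from__address`, which is the shape a warehouse table needs.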
Here's how the pipeline is defined and subsequently run in the first task of the main flow in Kestra:
import dlt
import pendulum

# Assumes the verified source was added with `dlt init inbox bigquery`,
# which creates a local `inbox` package
from inbox import inbox_source

# Run dlt pipeline to load email data from Gmail to BigQuery
pipeline = dlt.pipeline(
    pipeline_name="standard_inbox",
    destination="bigquery",
    dataset_name="messages_data",
    dev_mode=False,
)

# Set table name
table_name = "my_inbox"
# Get messages resource from the source
messages = inbox_source(start_date=pendulum.datetime(2023, 11, 15)).messages
# Configure the messages resource to get bodies of the emails
messages = messages(include_body=True).with_name(table_name)
# Load data to the "my_inbox" table
load_info = pipeline.run(messages)
In this setup ☝️, dlt loads all email data into the table "my_inbox", with the email body specifically stored in the "body" column. After executing your flow in Kestra, the table in BigQuery should appear as shown below:
The AI part
💡 In this day and age, how can we not incorporate AI into everything?
But seriously, if you're familiar with OpenAI, it's a matter of an API call to the chat completion endpoint. What simplifies it even further is Kestra's OpenAI plugin.
In my subflow, I used it to obtain both the summary and sentiment analysis of each email body. Here's a glimpse of how it's implemented:
- id: get_summary
  type: io.kestra.plugin.openai.ChatCompletion
  apiKey: "{{ secret('OPENAI_API') }}"
  model: gpt-3.5-turbo
  prompt: "Summarize the email content in one sentence with less than 30 words: {{inputs.data[0]['body']}}"
  messages: [{"role": "system", "content": "You are a tool that summarizes emails."}]
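For readers who think in Python rather than YAML, the task above corresponds roughly to the following request body for OpenAI's chat completions endpoint. This is a sketch under assumptions, not the plugin's code: it assumes the plugin sends the `messages` entries first and the `prompt` as a final user message, and the helper `build_summary_request` and the sample email text are hypothetical. The sketch only builds the payload and never calls the API.

```python
import json
from typing import Any, Dict

# Public chat completions endpoint, shown for context only; this sketch never calls it
OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_summary_request(email_body: str, model: str = "gpt-3.5-turbo") -> Dict[str, Any]:
    """Build the JSON payload for a one-sentence email summary."""
    prompt = (
        "Summarize the email content in one sentence with less than 30 words: "
        + email_body
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a tool that summarizes emails."},
            # Assumption: the plugin's `prompt` is sent as a user message
            {"role": "user", "content": prompt},
        ],
    }


payload = build_summary_request(
    "Hi team, the launch moves to Friday. Please update your calendars."
)
print(json.dumps(payload, indent=2))
```

Sending this payload would also need an `Authorization: Bearer <key>` header, which is what the `apiKey` property takes care of in the Kestra task.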
Kestra also includes Slack and BigQuery plugins, which I used in my flows. Additionally, there is a wide variety of other plugins available.
The automation part
💡 Kestra triggers are the ideal solution!
I've used a schedule trigger that allows you to execute your flow on a regular cadence, e.g. using a CRON expression:
triggers:
  - id: schedule
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 9-18 * * 1-5"
This configuration ensures that your flows are executed hourly on workdays from 9 AM to 6 PM.
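If CRON syntax is unfamiliar, the small check below spells out what each field of that expression means. It is a hand-written sketch for this one expression only, not a general CRON parser and not how Kestra evaluates schedules; the function name `matches_schedule` is hypothetical, and time zones are ignored.

```python
from datetime import datetime


def matches_schedule(moment: datetime) -> bool:
    """Return True if `moment` is a firing time of the cron "0 9-18 * * 1-5"."""
    on_the_hour = moment.minute == 0      # minute field: 0
    in_hours = 9 <= moment.hour <= 18     # hour field: 9-18, inclusive
    is_weekday = moment.weekday() < 5     # day-of-week field: 1-5 (Mon-Fri); Python's Monday is 0
    return on_the_hour and in_hours and is_weekday


# 2023-11-15 was a Wednesday; 2023-11-18 was a Saturday
wednesday_runs = [h for h in range(24) if matches_schedule(datetime(2023, 11, 15, h, 0))]
print(wednesday_runs)
print(matches_schedule(datetime(2023, 11, 18, 9, 0)))
```

Because the hour range is inclusive, the last run of the day starts at 6 PM, which gives ten runs per workday and none on weekends.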
THE OUTCOME
A Slack assistant that delivers crisp inbox insights right at your fingertips:
And a well-organized table in BigQuery, ready for you to dive into a more complex analysis:
In essence, using Kestra and dlt offers a trio of advantages for refining email analysis and data workflows:
- Efficient automation: Kestra effortlessly orchestrates intricate workflows, integrating smoothly with tools like dlt, OpenAI, and BigQuery. This reduces manual intervention and the errors that come with it, freeing up more time for you.
- User-friendly and versatile: Both Kestra and dlt are crafted for ease of use, accommodating a range of skill levels. Their adaptability extends to various use cases.
- Seamless scaling: Kestra, powered by Kafka and Elasticsearch, adeptly manages large-scale data and complex workflows. Coupled with dlt's solid data integration capabilities, it ensures a stable and reliable solution for diverse requirements.
HOW IT COULD WORK ELSEWHERE
Basically, you can apply the architecture discussed in this post whenever you need to automate a business process!
For detailed examples of how Kestra can be utilized in various business environments, you can explore Kestra's use cases.
Embrace automation, where the only limit is your imagination!