dltHub
Blog /

RAG playground: Build your own RAG bot

  • Adrian Brudaru,
    Co-Founder & CDO

Workshop overview

We recently conducted a workshop on Retrieval-Augmented Generation (RAG) creation at Data Talks Club - LLM Zoomcamp. In this workshop look into the process of loading data and creating your own RAG system. We first load data and embeddings from a Notion page into LanceDB and develop a RAG Bot using Ollama. Finally we interact with the bot by asking it questions. Below, you'll find a summary of the resources, tools, and examples we discussed during the session.

Key resources

  • dlt: Data loading and transformation.
  • LanceDB: An efficient vector database.
  • Ollama: Local LLMs for Retrieval-Augmented Generation.
  • Data Talks Club (DTC): A vibrant community for data engineering resources.

Workshop content

In this workshop, we explored the fundamentals of creating a Retrieval-Augmented Generation (RAG) system. You can follow along with the detailed workshop video or access the Google Colab notebook for hands-on experience.

1. Introduction to dlt and LanceDB:

  • Loading data into LanceDB:
    • Install the necessary packages: dlt[lancedb] and sentence-transformers.
    • Load course Q&A data into LanceDB without embeddings.
    • Create and execute a dlt pipeline to load data into LanceDB.

2. Embedding data in LanceDB:

  • Set up the embedding model using environment variables.
  • Load and embed data into a new LanceDB table using lancedb_adapter.

3. Creating a Notion to LanceDB pipeline:

  • Install requirements:
    • Install dlt[lancedb] and sentence-transformers.
  • Create a dlt project:
    • Run the command dlt init rest_api lancedb to set up the project.
    • Read more about the REST API verified source here.
  • Add API credentials:
    • Obtain your Notion API key and store it in environment variables or secrets.toml.
  • Write the pipeline code:
    • Configure the dlt REST API source to connect to the Notion API.
    • Extract relevant content from the Notion API responses.
    • Load data incrementally to ensure only new or changed data is added.

4. Running the pipeline:

  • Define and run the pipeline to load and embed data from Notion into LanceDB using lancedb_adapter.

5. Creating a RAG Bot with Ollama:

  • Setup:
    • Install and start Ollama.
    • Download the desired LLM model (e.g., llama2-uncensored).
  • Write functions:
    • Retrieve relevant content from LanceDB based on user queries.
    • Create a simple RAG bot with Ollama to provide context-aware answers.

Example Questions for the RAG Bot:

  • How many vacation days do I get?
  • Can I get maternity leave?

To go through these steps in detail please follow the Google Collab notebook here.

If you have any questions, join our community on Slack or reach out during our next workshop session!

DTC Learners showcase

Check out the incredible projects from our DTC learners:

Do you want to participate in our future workshops?

Sign up for our newsletter or keep an eye on our events page for workshop announcements.