RAG with ZenML

RAG is a sensible way to get started with LLMs.

Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of retrieval-based and generation-based models. In this guide, we'll explore how to set up RAG pipelines with ZenML, including data ingestion, index store management, and tracking RAG-associated artifacts.
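To make the pipeline framing concrete, here is a minimal sketch of how those stages could map onto ZenML steps. It assumes ZenML >= 0.40 (the `@step`/`@pipeline` decorators) and an initialized ZenML stack; the step names and the placeholder embedding logic are illustrative, not the exact code we build later in this guide.

```python
from zenml import pipeline, step


@step
def ingest_documents() -> list[str]:
    """Load and chunk the raw documents we want to query over."""
    return [
        "ZenML is an open-source MLOps framework.",
        "RAG grounds LLM answers in retrieved documents.",
    ]


@step
def embed_documents(documents: list[str]) -> list[list[float]]:
    """Map each chunk to a vector; a real step would call an embedding model."""
    return [[float(len(doc)), 0.0] for doc in documents]  # placeholder vectors


@step
def store_embeddings(documents: list[str], embeddings: list[list[float]]) -> None:
    """Persist the vectors so a later retrieval step can query them."""
    print(f"Stored {len(embeddings)} embeddings for {len(documents)} chunks.")


@pipeline
def rag_indexing_pipeline() -> None:
    documents = ingest_documents()
    embeddings = embed_documents(documents)
    store_embeddings(documents, embeddings)


if __name__ == "__main__":
    # Requires a ZenML repository/stack to be initialized (`zenml init`).
    rag_indexing_pipeline()
```

Because each step's inputs and outputs are typed, ZenML versions and tracks the documents and embeddings as artifacts automatically, which is what makes the artifact tracking covered later in this guide possible.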

LLMs are powerful tools: they can generate human-like responses to a wide variety of prompts. However, they are also prone to generating incorrect or inappropriate responses, especially when the input prompt is ambiguous or misleading. They are also (currently) limited in the amount of text they can understand and generate. While some LLMs, like Google's Gemini 1.5 Pro, can consistently handle context windows of 1 million tokens (small units of text), the vast majority (particularly the open-source models currently available) handle far less.
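To see what a token actually is, you can run a tokenizer yourself. The snippet below uses OpenAI's tiktoken library as one example; other model families use different tokenizers, so counts will vary.

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the encoding used by several OpenAI models; other model
# families tokenize differently, so counts are model-specific.
encoding = tiktoken.get_encoding("cl100k_base")

text = "Retrieval-Augmented Generation grounds LLM answers in your own data."
tokens = encoding.encode(text)

print(len(tokens))              # number of tokens, not characters or words
print(encoding.decode(tokens))  # round-trips back to the original text
```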

The first part of this guide to RAG pipelines with ZenML is about understanding the basic components and how they work together. We'll cover the following topics:

  • why RAG exists and what problem it solves

  • how to ingest and preprocess data that we'll use in our RAG pipeline

  • how to leverage embeddings to represent our data; this will be the basis for our retrieval mechanism

  • how to store these embeddings in a vector database

  • how to track RAG-associated artifacts with ZenML

At the end, we'll bring all of these components together and show how they combine to perform basic RAG inference.
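As a preview of that final step, here is a toy sketch of the inference flow: embed the query, retrieve the most similar chunk, and assemble a grounded prompt. The `embed` function and the hard-coded vectors are stand-ins for a real embedding model and vector database, not the implementation we build later.

```python
import math

# Stand-ins for the real system: hard-coded chunks with toy 3-dimensional
# vectors instead of a vector database populated by an indexing pipeline.
chunks = {
    "ZenML pipelines are composed of steps.": [0.9, 0.1, 0.0],
    "Embeddings map text to vectors.": [0.1, 0.9, 0.0],
}


def embed(text: str) -> list[float]:
    """Placeholder: a real system calls an embedding model here."""
    return [0.8, 0.2, 0.0]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


query = "How are ZenML pipelines structured?"
query_vector = embed(query)

# Retrieve the chunk whose embedding is closest to the query's...
best_chunk = max(chunks, key=lambda c: cosine_similarity(chunks[c], query_vector))

# ...and use it as grounding context in the prompt sent to the LLM.
prompt = f"Context: {best_chunk}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```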
