Improve retrieval by finetuning embeddings
Finetune embeddings on custom synthetic data to improve retrieval performance.
We previously learned how to build a production-ready RAG pipeline with ZenML. In this section, we will explore how to optimize and maintain your embedding models through synthetic data generation and human feedback. So far, we've been using off-the-shelf embeddings, which provide a good baseline and decent performance on standard tasks. However, you can often significantly improve retrieval performance by finetuning embeddings on your own domain-specific data.
Our RAG pipeline uses a retrieval-based approach, where it first retrieves the most relevant documents from our vector database, and then uses a language model to generate a response based on those documents. By finetuning our embeddings on a dataset of technical documentation similar to our target domain, we can improve the retrieval step and overall performance of the RAG pipeline.
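To make the retrieval step concrete, here is a minimal sketch of how a query embedding is matched against stored document embeddings. It is not the pipeline's actual implementation: the in-memory `index` dictionary, the document titles, and the three-dimensional vectors are all hypothetical stand-ins for a real vector database and a real embedding model, and cosine similarity is assumed as the ranking metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the top_k document keys ranked by similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Hypothetical toy "vector database": document title -> embedding.
# Real embeddings come from a model and have hundreds of dimensions.
index = {
    "How to register a stack": [0.9, 0.1, 0.0],
    "Configuring an orchestrator": [0.7, 0.3, 0.1],
    "Release notes": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the user's question.
query_vec = [0.8, 0.2, 0.0]

top_docs = retrieve(query_vec, index, top_k=2)
print(top_docs)
```

Finetuning changes the vectors the embedding model produces, so that domain-specific questions land closer to the documents that answer them. The ranking logic itself stays the same.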
The work of finetuning embeddings based on synthetic data and human feedback is a multi-step process. We'll go through the following steps:
Besides ZenML, we will use two open source libraries: argilla and distilabel. Both libraries focus on optimizing model outputs by improving data quality, but each takes a different approach to the problem. distilabel provides a scalable and reliable approach to distilling knowledge from LLMs by generating synthetic data or providing AI feedback with LLMs as judges. argilla enables AI engineers and domain experts to collaborate on data projects by allowing them to organize and explore data within an interactive and engaging UI. Both libraries can be used individually, but they work better together. We'll showcase their use via ZenML pipelines.
To follow along with the example in this guide, please follow the instructions in the llm-complete-guide repository, where the full code is also available. This section on embeddings finetuning can be run locally or on cloud compute, as you prefer.