Large Language Models (LLMs) are powerful tools capable of generating responses that are both accurate and tailored to specific contexts. Their ability to understand and generate human-like text has revolutionized applications ranging from chatbots to language translation services. However, as with any advanced technology, they aren't without their hurdles.
One of the primary challenges LLMs face is providing precise, context-specific answers. The difficulty stems from limits on the number of tokens these models can process, which constrains the amount of context they can take into account when generating a response. Retrieval-Augmented Generation (RAG) offers a way around this limitation: by integrating a retrieval step into the generation process, RAG gives LLMs access to a broader range of context, enhancing the accuracy and relevance of their responses.
RAG employs a vector database to store a repository of documents and uses a retriever to query them. The retriever selects the most relevant information from the document store, which is then passed as context to the LLM. The effectiveness of RAG, however, depends heavily on the underlying indexing mechanism and retriever, and this becomes increasingly challenging with documents of diverse structures such as tables, PDFs, and XML files. Factors such as the similarity measure, the quality of the data in the document store, and the embeddings used in the vector store determine the quality of the retriever and, consequently, of the overall RAG application.
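To make the retriever's role concrete, here is a minimal, library-agnostic sketch of similarity-based retrieval: documents and the query are embedded as vectors, and the documents closest to the query by cosine similarity are returned as context. The embedding step itself is assumed to happen elsewhere.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, doc_vecs: list[np.ndarray],
             docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents whose embeddings are closest to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    top_k = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top_k]
```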
This article explores the process of developing a RAG system using Databricks. Databricks offers a variety of tools to support the development of RAG applications on both structured and unstructured data such as PDFs, website content, and Word documents.
Let's take a closer look at the RAG architecture and understand its inner workings. A RAG application typically consists of four stages: data preparation, indexing, retrieval, and response generation.
Databricks Vector Search, now part of the Databricks Data Intelligence Platform, makes it easy to create vector indexes for your proprietary data stored in the Data Lakehouse. Delta tables can be used to store data chunks and embeddings, and Vector Search builds a queryable vector database from them, with embedding vectors that can be set up to sync automatically with your knowledge base.
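For instance, a Delta Sync index can be created with the databricks-vectorsearch Python client. The sketch below is a minimal example; the endpoint, table, index, and embedding model names are placeholders.

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Create a Delta Sync index: Databricks computes embeddings for the `text`
# column and keeps the index in sync with the source Delta table.
# Assumes a Vector Search endpoint named "rag_endpoint" already exists.
index = client.create_delta_sync_index(
    endpoint_name="rag_endpoint",                 # placeholder endpoint
    index_name="main.rag.docs_index",             # placeholder index name
    source_table_name="main.rag.doc_chunks",      # placeholder Delta table
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_source_column="text",
    embedding_model_endpoint_name="databricks-bge-large-en",
)
```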
Databricks also offers model serving capabilities for deploying Large Language Models (LLMs) and hosting RAG chains. This includes the configuration of dedicated endpoints for accessing state-of-the-art open LLMs through Foundation Model APIs, as well as integration with third-party models. The platform leverages MLflow to track the development of RAG chains and evaluate the performance of LLMs.
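For example, a served model can be queried through the MLflow Deployments client. This is a sketch only; the endpoint name below is one example of a pay-per-token Foundation Model API endpoint, and the exact set of available endpoints varies by workspace.

```python
import mlflow.deployments

# The Deployments client talks to Databricks model serving endpoints,
# including the Foundation Model APIs.
client = mlflow.deployments.get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-llama-2-70b-chat",  # example FM API endpoint
    inputs={
        "messages": [{"role": "user", "content": "What is RAG?"}],
        "max_tokens": 128,
    },
)
print(response["choices"][0]["message"]["content"])
```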
For RAG scenarios over structured data, Databricks provides feature engineering and serving. Additionally, online tables expose Delta table data through a low-latency API so that it can be incorporated into RAG applications at query time.
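As a rough sketch (assuming the databricks-feature-engineering package; all names are placeholders), a feature spec can expose a Delta table of structured attributes for low-latency lookup from a RAG chain:

```python
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

fe = FeatureEngineeringClient()

# A feature spec defines which table and lookup keys back the
# low-latency lookup served to the RAG application.
fe.create_feature_spec(
    name="main.rag.customer_features_spec",        # placeholder spec name
    features=[
        FeatureLookup(
            table_name="main.rag.customer_features",  # placeholder table
            lookup_key="customer_id",
        )
    ],
)
```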
Databricks has also introduced the AI Playground, a chat-based user interface that facilitates the testing and comparison of LLMs.
The following steps illustrate the process of preparing and indexing proprietary data in Databricks as part of the RAG workflow:

1. Ingest the raw documents (PDFs, web pages, Word documents, etc.) into the Lakehouse.
2. Parse the documents and split them into manageable chunks.
3. Store the chunks, along with their metadata, in a Delta table.
4. Create a Vector Search index on top of the Delta table.
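A condensed sketch of the chunking and storage steps, assuming LangChain's text splitter and a Spark session on Databricks (the file path and table name are placeholders):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Split the raw document text into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
raw_text = open("/Volumes/main/rag/docs/handbook.txt").read()  # placeholder path
chunks = splitter.split_text(raw_text)

# One row per chunk; Vector Search computes embeddings on sync.
df = spark.createDataFrame(list(enumerate(chunks)), ["id", "text"])
df.write.format("delta").mode("overwrite").saveAsTable("main.rag.doc_chunks")

# Delta Sync indexes require Change Data Feed on the source table.
spark.sql(
    "ALTER TABLE main.rag.doc_chunks "
    "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
)
```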
The Vector Search functionality thus indexes the embeddings and metadata, storing them in a vector database for querying by the RAG chain. It automatically computes embeddings for any new data in the source Delta table and updates the vector search index accordingly.
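Once the index is in sync, querying it is a single call; a minimal sketch reusing the placeholder names from above:

```python
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()
index = client.get_index(
    endpoint_name="rag_endpoint",        # placeholder
    index_name="main.rag.docs_index",    # placeholder
)

# Retrieve the five chunks most similar to the user's question.
results = index.similarity_search(
    query_text="How do I configure vector search?",
    columns=["id", "text"],
    num_results=5,
)
```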
After the index is ready, the RAG chain can handle incoming queries. The following sequence of actions processes a user query:

1. The user's question is embedded and used to query the Vector Search index.
2. The retriever returns the chunks most relevant to the question.
3. The retrieved chunks are combined with the question into an augmented prompt.
4. The prompt is sent to the LLM serving endpoint, and the generated answer is returned to the user.
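Putting the pieces together, a minimal end-to-end chain over the hypothetical index and endpoint names used above might look like this sketch:

```python
import mlflow.deployments
from databricks.vector_search.client import VectorSearchClient

vs = VectorSearchClient()
llm = mlflow.deployments.get_deploy_client("databricks")
index = vs.get_index(endpoint_name="rag_endpoint",        # placeholders
                     index_name="main.rag.docs_index")

def answer(question: str) -> str:
    # Steps 1-2: retrieve the chunks most relevant to the question.
    hits = index.similarity_search(
        query_text=question, columns=["text"], num_results=3
    )
    context = "\n\n".join(row[0] for row in hits["result"]["data_array"])
    # Step 3: build an augmented prompt from the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # Step 4: generate the answer with a served foundation model.
    resp = llm.predict(
        endpoint="databricks-llama-2-70b-chat",  # example FM API endpoint
        inputs={"messages": [{"role": "user", "content": prompt}],
                "max_tokens": 256},
    )
    return resp["choices"][0]["message"]["content"]
```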
Overall, Databricks emerges as a robust platform, offering a suite of tools that streamline the development of RAG applications. Databricks Vector Search offers high performance, security, and ease of use. It builds on Unity Catalog's security and data governance tools, which streamline policies across organizational data. Additionally, the release of Lakehouse Monitoring adds another layer of security and oversight, enabling organizations to monitor RAG applications and proactively prevent the generation of harmful content.