2022 was a monumental year in the generative AI space, and 2023 is seeing no slowdown. This has especially been true for text generation with Large Language Models (LLMs). While LLMs have been in the making for a while (remember the open-sourced GPT-2?), OpenAI changed the game by making LLMs accessible. They brought them within reach of many companies by providing a simple API layer, and made them mainstream with the ChatGPT interface. With this accessibility came the question: how do you effectively apply LLMs to enterprise use cases?
One of the most powerful aspects of LLMs is that they can produce human-like text. Many companies are looking to use this output for use cases like content generation, chatbots, and support email responses. At the same time, the output of an LLM depends heavily on the quality of the prompt it receives. To help with this, the term “prompt engineering” has gained a lot of traction recently. Prompt engineering aims to improve the quality of a prompt by including contextual information, providing example responses, tailoring the tone, and defining the structure of the expected response. While prompt engineering touches many areas, providing contextual information is one of the most critical and dynamic parts of the process. This is where a VectorDB can be effective.
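To make those elements concrete, here is a minimal sketch of a prompt template combining context, an example response, tone, and response structure. The template wording and function name are illustrative assumptions, not any particular library's API:

```python
# A minimal prompt template combining the prompt-engineering elements above:
# contextual information, an example response, a tailored tone, and a
# structure for the expected response. All wording here is illustrative.

PROMPT_TEMPLATE = """You are a helpful support agent. Answer in a concise, professional tone.

Context:
{context}

Example of a good answer:
{example}

Question: {question}

Answer (use short bullet points):"""


def build_prompt(context: str, example: str, question: str) -> str:
    """Fill the template with use-case-specific pieces."""
    return PROMPT_TEMPLATE.format(context=context, example=example, question=question)


prompt = build_prompt(
    context="Refunds are processed within 5 business days.",
    example="- Your refund has been issued.\n- It should arrive within 5 business days.",
    question="Where is my refund?",
)
print(prompt)
```

The dynamic part is the `{context}` slot: the rest of this post is about filling it well.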
As the name suggests, a VectorDB stores unstructured data like text and images as embeddings. For a crash course on embeddings, see this helpful video lesson. A VectorDB makes it easier to perform semantic search at large scale over these embeddings. There are many VectorDB solutions on the market. Some are vendor solutions like Pinecone and ES8+ (purists may claim it is not a true VectorDB, but it provides many solid features); others are built on open-source tech, such as Weaviate, Milvus, and Qdrant. Selecting the right VectorDB for your company is a topic of its own. Assuming you have chosen and operationalized one, the following steps can then be applied to combine the power of a VectorDB and an LLM when solving use cases like chatbots, responding to support queries, or anywhere else that requires finding contextual information.
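Under the hood, semantic search boils down to comparing embedding vectors, most commonly by cosine similarity. Here is a stdlib-only sketch of the core idea, with toy 3-dimensional vectors; real VectorDBs do this at scale with approximate nearest-neighbor indexes such as HNSW:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: closer to 1.0 = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]  # pretend this embeds "where is my money back?"

best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # -> "refund policy": its vector points the same way as the query
```

Note that the match is by meaning, not keyword overlap: “money back” shares no words with “refund policy”, but a good embedding model places them close together.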
Build and Store the Embeddings
This step requires creating embeddings of any information that can assist in building the context for the LLM prompt. For example, support playbooks, public documents, and past customer tickets can all be used to answer an incoming support query. Thanks to the open-source community and HuggingFace, finding a model to build embeddings is within reach. For example, see SentenceTransformers for text and image embeddings. Save these embeddings in your VectorDB of choice. OpenAI also makes embeddings available through its API if you don't have the infrastructure to host your own.
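A sketch of this step, with two loudly-labeled assumptions so it runs standalone: an in-memory list stands in for the VectorDB, and a deterministic `embed()` placeholder stands in for a real model (with SentenceTransformers you would call `model.encode(text)` instead):

```python
import hashlib


def embed(text: str, dim: int = 8) -> list[float]:
    # Placeholder: derives a deterministic vector from a hash so this sketch
    # runs standalone. In practice, swap in a real embedding model, e.g.
    # SentenceTransformer("all-MiniLM-L6-v2").encode(text).
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


# Documents that can provide context for future support queries.
documents = [
    "Playbook: how to triage a billing complaint",
    "Public doc: supported upgrade paths",
    "Past ticket: customer could not reset password",
]

# Stand-in for a VectorDB upsert: store an id, the raw text, and its embedding.
vector_store = [
    {"id": i, "text": doc, "vector": embed(doc)}
    for i, doc in enumerate(documents)
]
print(len(vector_store), "embeddings stored")
```

In a real pipeline you would chunk long documents before embedding them, since models have input-length limits and smaller chunks retrieve more precisely.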
Search Relevant Context
In this step, create an embedding of the incoming query and search the VectorDB for relevant content.
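Continuing the sketch from the previous step (same `embed()` placeholder and in-memory store): embed the query the same way the documents were embedded, then rank stored entries by cosine similarity. A real VectorDB exposes this as a single top-k query and uses an approximate index for speed:

```python
import hashlib
import math


def embed(text: str, dim: int = 8) -> list[float]:
    # Same placeholder as in the previous step; swap in a real model in practice.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query: str, store: list[dict], top_k: int = 2) -> list[dict]:
    """Return the top_k stored entries most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        store,
        key=lambda entry: cosine_similarity(query_vec, entry["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


store = [
    {"id": i, "text": t, "vector": embed(t)}
    for i, t in enumerate([
        "Playbook: how to triage a billing complaint",
        "Past ticket: customer could not reset password",
    ])
]
hits = search("customer cannot log in after password change", store)
print([h["text"] for h in hits])
```

With a real embedding model, the password-reset ticket would rank first here; the hash-based placeholder cannot capture that meaning.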
Build the Prompt, Query the LLM, and Present the Answer
Using the contextual information found in the previous step, build the LLM prompt and query the LLM. Lastly, structure and present the answer. To make the loop continuous, you can also save the answer's embedding back in the VectorDB, especially if you get human reinforcement that it was a helpful answer. This way, the next answer can be even more relevant.
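Putting the loop together: retrieved snippets go into the prompt, the LLM call is left as a stub (in practice this would be a call to your provider, e.g. OpenAI's chat API), and answers a human marked helpful are written back to the store. All names here are illustrative:

```python
def build_prompt(context_snippets: list[str], question: str) -> str:
    """Assemble the LLM prompt from retrieved context plus the user question."""
    context = "\n".join(f"- {s}" for s in context_snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def query_llm(prompt: str) -> str:
    # Stub: in practice, send the prompt to your LLM provider here and
    # return the generated text.
    return "Stubbed answer for: " + prompt[-60:]


def save_if_helpful(store: list[dict], answer: str, helpful: bool) -> None:
    """Close the loop: persist answers that a human confirmed were helpful."""
    if helpful:
        # In practice, embed the answer before storing it in the VectorDB.
        store.append({"id": len(store), "text": answer, "vector": None})


store: list[dict] = []
prompt = build_prompt(["Refunds are processed within 5 business days."], "Where is my refund?")
answer = query_llm(prompt)
save_if_helpful(store, answer, helpful=True)
print(len(store), "answer(s) saved back")
```

Restricting the model to "only the context below" is one simple lever for constraining hallucination, which the closing paragraph touches on.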
It goes without saying that operationalizing and testing each of these steps will require some engineering effort. On the other hand, this approach can reduce the need to fine-tune an LLM while increasing the effectiveness of its output, and it can potentially constrain hallucination. Generative AI, when applied right, can help companies be more efficient. A VectorDB combined with generative models takes this many steps further.