
Retrieval Augmented Generation (RAG) — An Introduction

The model hallucinated! It was giving me OK answers and then it just started hallucinating. We’ve all heard or experienced it.

Natural Language Generation models can sometimes hallucinate, i.e., they generate text that is not quite accurate for the prompt provided. In layman's terms, they start making stuff up that is not strictly related to the given context, or is plainly inaccurate. Some hallucinations are understandable, for example, mentioning something related but not exactly the topic in question; other times the output may look like legitimate information, but it is simply not correct, it's made up.

This is clearly a problem when we start using generative models to complete tasks and intend to consume the information they generate to make decisions.

The problem is not necessarily in how the model generates text, but in the information it uses to generate a response. Once you train an LLM, the information encoded in the training data is crystallized; it becomes a static representation of everything the model knows up until that point in time. To make the model update its world view or its knowledge base, it needs to be retrained. However, training Large Language Models requires time and money.

One of the main motivations for developing RAG is the increasing demand for factually accurate, contextually relevant, and up-to-date generated content.[1]

When thinking about ways to make generative models aware of the wealth of new information that is created every day, researchers started exploring efficient ways to keep these models up-to-date that didn't require continuously retraining them.

They came up with the idea of Hybrid Models: generative models that have a way of fetching external information to complement the data the LLM was trained on and already knows. These models pair an information retrieval component, which gives the model access to up-to-date data, with the generative capabilities LLMs are already well known for. The goal is to ensure both fluency and factual correctness in the produced text.

This hybrid model architecture is called Retrieval Augmented Generation, or RAG for short.
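
Conceptually, the loop is simple: embed the question, fetch the most similar documents, put them into the prompt, and let the model answer. Below is a minimal, self-contained sketch of that flow; the bag-of-words "embedding" and the placeholder generate function are illustrative stand-ins for a real dense encoder and a real LLM call.

```python
# A toy, end-to-end sketch of the RAG loop: retrieve relevant text, then
# hand it to a generator. The "embedding" here is a bag-of-words vector and
# the "LLM" is a placeholder function -- both are illustrative stand-ins.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: word counts (a real system uses a dense encoder).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank every document by similarity to the query, keep the top k.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(prompt: str) -> str:
    # Placeholder for an LLM call (an API request in a real system).
    return f"[LLM answer grounded in]\n{prompt}"

corpus = [
    "The Brooklyn Bridge opened in 1883 and spans the East River.",
    "RAG combines a retriever with a generative language model.",
    "Transformers use attention to model token interactions.",
]
question = "When did the Brooklyn Bridge open?"
context = "\n".join(retrieve(question, corpus))
print(generate(f"Context:\n{context}\n\nQuestion: {question}"))
```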

The RAG era

Given the critical need to keep models updated in a time- and cost-effective way, RAG has become an increasingly popular architecture.

Its retrieval mechanism pulls information from external sources that are not encoded in the LLM. For example, you can see RAG in action in the real world when you ask Gemini something about the Brooklyn Bridge: at the bottom of the answer you'll see the external sources it pulled information from.

Example of external sources being shown as part of the output of the RAG model. (Image by author)

By grounding the final output in information obtained from the retrieval module, these Generative AI applications are less likely to propagate biases originating from the outdated, point-in-time view of the training data they used.

The second piece of the RAG architecture is the one most visible to us, the consumers: the generation model. This is typically an LLM that processes the retrieved information and generates human-like text.

RAG combines retrieval mechanisms with generative language models to enhance the accuracy of outputs.[1]

As for its internal architecture, the retrieval module relies on dense vectors to identify the relevant documents to use, while the generative model uses the typical transformer-based LLM architecture.

A basic flow of the RAG system along with its components. (Image by author, based on the paper referenced in [1])
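
To make the retrieval side more concrete, here is a minimal sketch of its core operation: scoring every document embedding against the query embedding and keeping the top matches. The encoder that would produce these dense vectors (e.g., a DPR-style model) is assumed to exist; random vectors stand in for its output here.

```python
# Ranking documents by the similarity of their dense vectors to the query
# vector -- the core operation of the retrieval module.
import numpy as np

def top_k_documents(query_vec: np.ndarray, doc_matrix: np.ndarray,
                    k: int = 3) -> np.ndarray:
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                       # similarity of every document to the query
    return np.argsort(scores)[::-1][:k]  # indices of the k most similar documents

# Example with random stand-in embeddings (a real index stores encoder outputs).
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 768))   # 1,000 documents, 768-dim vectors
query = rng.normal(size=768)
print(top_k_documents(query, docs))
```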

This architecture addresses important pain points of generative models, but it's not a silver bullet; it also comes with some challenges and limitations.

The retrieval module may struggle to get the most up-to-date documents.

This part of the architecture relies heavily on Dense Passage Retrieval (DPR)[2, 3]. Compared to other techniques such as BM25, which is based on TF-IDF, DPR does a much better job at finding the semantic similarity between query and documents. Because it leverages semantic meaning instead of simple keyword matching, it is especially useful in open-domain applications, i.e., tools like Gemini or ChatGPT, which are not necessarily experts in a particular domain but know a little bit about everything.
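
To see what "simple keyword matching" means in practice, here is a simplified BM25 scorer: a document scores highly only if it literally contains the query's terms, weighted by how rare those terms are in the corpus. A term that never appears contributes nothing, which is exactly the gap dense retrieval closes. The parameters k1 and b are the usual BM25 defaults.

```python
# A simplified BM25 scorer, to make "keyword matching" concrete.
import math

def bm25_score(query: list[str], doc: list[str], corpus: list[list[str]],
               k1: float = 1.5, b: float = 0.75) -> float:
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    score = 0.0
    for term in query:
        df = sum(term in d for d in corpus)               # docs containing the term
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)   # rare terms weigh more
        tf = doc.count(term)                              # term frequency in this doc
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [doc.split() for doc in [
    "the brooklyn bridge spans the east river",
    "suspension bridges use cables under tension",
]]
# "crossing" never appears literally, so BM25 gives it no credit -- a dense
# retriever could still match it to "spans" by semantic similarity.
print(bm25_score("bridge crossing river".split(), corpus[0], corpus))
```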

However, DPR has its shortcomings too. The dense vector representation can lead to irrelevant or off-topic documents being retrieved. DPR models seem to retrieve information based on knowledge that already exists within their parameters, i.e., facts must already be encoded in order to be accessible by retrieval[2].

[…] if we extend our definition of retrieval to also encompass the ability to navigate and elucidate concepts previously unknown or unencountered by the model—a capacity akin to how humans research and retrieve information—our findings imply that DPR models fall short of this mark.[2]

To mitigate these challenges, researchers have explored more sophisticated query expansion and contextual disambiguation. Query expansion is a set of techniques that modify the original user query by adding relevant terms, with the goal of connecting the intent of the user's query with relevant documents[4].
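
As a toy illustration of the idea, here is what a rule-based expansion might look like. The hand-written synonym table is purely illustrative; real systems derive expansions from thesauri, embeddings, or an LLM prompted to rewrite the query[4].

```python
# A minimal sketch of query expansion: enrich the original query with
# related terms before retrieval. The synonym table is hypothetical.
EXPANSIONS = {
    "car": ["automobile", "vehicle"],
    "buy": ["purchase"],
}

def expand_query(query: str) -> str:
    terms = query.lower().split()
    extra = [syn for t in terms for syn in EXPANSIONS.get(t, [])]
    return " ".join(terms + extra)

print(expand_query("buy a car"))  # -> "buy a car purchase automobile vehicle"
```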

There are also cases where the generative module fails to fully incorporate into its responses the information gathered in the retrieval phase. To address this, there have been improvements to attention and hierarchical fusion techniques [5].
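
The fusion idea in [5] (Fusion-in-Decoder) is to encode each retrieved passage together with the question independently, then let the decoder attend over all the encodings at once, so no passage is silently dropped. The sketch below uses plain linear layers as stand-ins for a transformer encoder and decoder, just to show the shape of the computation.

```python
# A rough sketch of the fusion idea from [5]: encode (question + passage)
# pairs independently, then decode over the concatenation of all encodings.
# The linear layers are stand-ins for real transformer modules.
import torch
import torch.nn as nn

encoder = nn.Linear(16, 32)  # stand-in for a transformer encoder
decoder = nn.Linear(32, 16)  # stand-in for a transformer decoder step

question = torch.randn(1, 16)                       # 1 question "token"
passages = [torch.randn(5, 16) for _ in range(3)]   # 3 passages, 5 tokens each

# Encode each (question + passage) pair independently...
encoded = [encoder(torch.cat([question, p], dim=0)) for p in passages]
# ...then fuse: the decoder sees all passages' representations at once.
fused = torch.cat(encoded, dim=0)     # (3 * 6, 32)
output = decoder(fused.mean(dim=0))   # toy "decoding" over the fused evidence
print(output.shape)
```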

Model performance is an important metric, especially when the goal of these applications is to be a seamless part of our day-to-day lives and make the most mundane tasks almost effortless. However, running RAG end-to-end can be computationally expensive: for every query the user makes, there is one step for information retrieval and another for text generation. This is where techniques such as Model Pruning [6] and Knowledge Distillation [7] come into play, ensuring that, even with the additional step of searching for up-to-date information outside the trained model's data, the overall system remains performant.
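
To give a flavor of distillation, here is the classic soft-target objective used by approaches like [7]: a small "student" model is trained to match the softened output distribution of a large "teacher", so the cheaper student can serve queries at lower cost. The random logits below are stand-ins for real model outputs.

```python
# The knowledge-distillation objective: KL divergence between the student's
# and teacher's temperature-softened output distributions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # Softening both distributions exposes the teacher's relative
    # preferences over wrong answers ("dark knowledge").
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Example with random logits standing in for real model outputs.
student = torch.randn(4, 100)   # batch of 4, vocabulary of 100
teacher = torch.randn(4, 100)
print(distillation_loss(student, teacher))
```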

Lastly, while the information retrieval module in the RAG architecture is intended to mitigate bias by accessing external sources that are more up-to-date than the model's training data, it may not fully eliminate bias. If the external sources are not meticulously chosen, they can introduce new bias or even amplify existing biases from the training data.

Conclusion

Utilizing RAG in generative applications provides a significant improvement in the model's capacity to stay up-to-date, and gives its users more accurate results.

When used in domain-specific applications, its potential is even clearer. With a narrower scope and an external library of documents pertaining only to a particular domain, these models can retrieve new information much more effectively.

However, ensuring generative models are constantly up-to-date is far from a solved problem.

Technical challenges, such as handling unstructured data or ensuring model performance, continue to be active research topics.

I hope you enjoyed learning a bit more about RAG, and the role this type of architecture plays in keeping generative applications up-to-date without retraining the model.

Thanks for reading!


References

  1. Gupta, S., Ranjan, R., & Singh, S. N. (2024). A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions. (ArXiv)
  2. Reichman, B., & Heck, L. (2024). Retrieval-Augmented Generation: Is Dense Passage Retrieval Retrieving? (link)
  3. Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. T. (2020). Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 6769-6781). (ArXiv)
  4. Koo, H., Kim, M., & Hwang, S. J. (2024). Optimizing Query Generation for Enhanced Document Retrieval in RAG. (ArXiv)
  5. Izacard, G., & Grave, E. (2021). Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 874-880). (ArXiv)
  6. Han, S., Pool, J., Tran, J., & Dally, W. J. (2015). Learning Both Weights and Connections for Efficient Neural Networks. In Advances in Neural Information Processing Systems (pp. 1135-1143). (ArXiv)
  7. Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. (ArXiv)
