When RAG is all the RAGE - Part 2
Posted on 25 April, 2024

While a basic RAG setup is technically straightforward to deploy (including open-source as well as pre-packaged solutions), getting it right for a particular use case is a much more complex task.

It's best to consider these challenges upfront (based on our secondary research, conversations with practitioners, and our own experiments):

- Similarity Issue: Similarity algorithms (question vs. 'chunks') may need fine-tuning for specialist content
- LLM Logic: Synthesizing 'chunks' may confuse the LLM, as 'chunks' can lack the context they came from
- Dates Struggle: Temporality, i.e. how to incorporate the logic of a time sequence
- Testing Setup: It's hard to know how well it will work without deploying the whole architecture
- Data Privacy: The easiest tools require uploading data to external parties
- Quality Review: May require significant human resources
- Cost: Can escalate quickly with APIs/services, while in-house deployment needs a minimum scale
- Use Case-Specific: Case-by-case deployments make it less 'scalable' across the organization

With new LLMs coming out fast, yesterday's constraints become tomorrow's standard features, and the whole RAG concept might one day be made obsolete by super-large context windows, but that still seems a more distant possibility.

It would be nice to say that we have some sort of new magical solution to these challenges. To be clear, we do not. However, at Robotic Online Intelligence (ROI), we do have a few ideas on how to experiment with applying LLMs to research documents on a small scale and without vector databases: a "Regexy RAG" Playground on Kubro(TM).

One of the key elements in RAG is getting the 'chunks' of text that are most similar (relevant) to the text in the user's question.

This can also be accomplished (on a small scale) by first converting the user's question into a set of keywords or regular expressions, searching for these within the text, and then capturing the text around each match (say, 400 characters before and after) to generate a relevant 'chunk'.

Such a method will not score the chunks by degree of relevance (something a vector database does better), but it is simple, cheap, and good enough to explore the use cases, with less uncertainty and good explainability.
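
To make the idea concrete, here is a minimal Python sketch of this 'regexy' retrieval step, assuming a plain-text document already loaded into memory. The function names, the simple length-based keyword filter, and the sample question are illustrative assumptions rather than the actual Kubro(TM) implementation; the 400-character window mirrors the figure mentioned above.

```python
import re

def question_to_patterns(question, min_len=4):
    # Naive keyword extraction: keep words of min_len+ characters.
    # (Illustrative only - a stop-word list or an LLM call could
    # produce better keywords / regular expressions.)
    words = re.findall(r"[A-Za-z0-9]+", question)
    keywords = {w.lower() for w in words if len(w) >= min_len}
    return [re.compile(re.escape(k), re.IGNORECASE) for k in keywords]

def regexy_chunks(document, patterns, window=400):
    # Capture a 'chunk' around every keyword match:
    # `window` characters before and after the match.
    chunks = []
    for pattern in patterns:
        for match in pattern.finditer(document):
            start = max(0, match.start() - window)
            end = min(len(document), match.end() + window)
            chunks.append(document[start:end])
    return chunks

# Usage sketch: the resulting chunks (plus the original question)
# would then be passed to an LLM for synthesis.
document = "... full text of a research document loaded elsewhere ..."
patterns = question_to_patterns("What were the office vacancy rates in Hong Kong?")
chunks = regexy_chunks(document, patterns)
```

In practice one would also deduplicate or merge overlapping chunks before sending them to the LLM, but the core of the approach is no more than the keyword-to-regex conversion and the fixed-size text window shown here.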

Watch this 1-minute video with a quick example (presented by R2C2 - my early attempts at digital self-replication...)...