Insights & Announcements

The RAG Road to RAGginess: Lessons from the Trenches
Posted on 23 February, 2025

Some lessons we’ve learned at Robotic Online Intelligence (ROI) from deploying Retrieval-Augmented Generation (RAG) on a deceptively simple challenge: helping investment analysts dig into 10 years of annual reports from 30 Chinese property developers, listed primarily in Hong Kong. These reports, messy PDFs far less structured than 10-Ks, were the proving ground for questions like:

“How does Longfor’s strategy differ from Vanke’s?”

“How have Country Garden’s investment priorities shifted over the past 5 years?”

“Which cities drive Agile’s development business?”

The real goal was to build a more generic toolkit for domain-specific ‘knowledge discovery’. China property, a market we’ve studied for 13+ years at Real Estate Foresight (REF), was the perfect test bed.

Here’s what we learned:

1. It Works—Sort Of: RAG delivers for specific question types, but it’s a heavy lift to deploy for a niche domain.
2. Metadata is King: Hard-coded tags (e.g., years, developer names) made the biggest difference in optimization.
3. The Accuracy Trap: Verifying answers to complex questions is difficult.

We deployed RAG from scratch, self-hosted and built on open-source tools. Here are some of the techniques we used at each of the key steps:

CHUNKING

• Use OCR (old-style, but with higher accuracy than current multimodal LLMs) to convert the PDF files to text and decompose the text into paragraphs and tables
• A dynamic chunking algorithm to preserve context across chunk boundaries (see the sketch below)
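
To make that concrete, here is a minimal sketch of what such a dynamic chunker can look like, assuming a paragraph-level split and a fixed token budget. The budget, overlap and helper names are illustrative, not our production code:

```python
from typing import List

MAX_TOKENS = 512   # assumed per-chunk token budget
OVERLAP = 1        # carry the last paragraph(s) into the next chunk

def n_tokens(text: str) -> int:
    # Crude whitespace proxy; a real pipeline would use the
    # embedding model's own tokenizer.
    return len(text.split())

def chunk_paragraphs(paragraphs: List[str]) -> List[str]:
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        if current and n_tokens(" ".join(current + [para])) > MAX_TOKENS:
            chunks.append(" ".join(current))
            # Overlap: repeat the tail paragraph so context such as a
            # section heading or lead sentence survives the boundary.
            current = current[-OVERLAP:]
        current.append(para)
    if current:
        chunks.append(" ".join(current))
    return chunks
```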

EMBEDDING

• Use multiple embedding models (NV-Embed-v2, text-embedding-3-large, stella_en and SPLADE) to get dense and sparse vector representations of each chunk
• Add hard-coded metadata, such as the developer name or a report summary, to each chunk (see the sketch below)
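
Here is a minimal sketch of that enrichment step. The two encoders are toy, hash-based stand-ins so the example runs end to end; in a real pipeline they would be models like NV-Embed-v2 (dense) and SPLADE (sparse), and the field names are illustrative:

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChunkRecord:
    text: str
    developer: str       # hard-coded metadata, e.g. "Longfor"
    report_year: int     # hard-coded metadata, e.g. 2019
    dense: List[float] = field(default_factory=list)
    sparse: Dict[str, float] = field(default_factory=dict)

def embed_dense(text: str, dim: int = 8) -> List[float]:
    # Toy stand-in for a dense embedding model: hash words into a
    # small fixed-size vector.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    return vec

def embed_sparse(text: str) -> Dict[str, float]:
    # Toy stand-in for SPLADE-style sparse term weights.
    return {w: float(c) for w, c in Counter(text.lower().split()).items()}

def build_record(text: str, developer: str, year: int) -> ChunkRecord:
    # Prepending the metadata to the chunk text lets the vectors carry
    # the company and period; the raw fields are stored too, for exact
    # filtering at query time.
    enriched = f"[{developer} | annual report {year}] {text}"
    return ChunkRecord(text, developer, year,
                       dense=embed_dense(enriched),
                       sparse=embed_sparse(enriched))
```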

PRE-PROCESSING AND SEARCHING

• Question decomposition and hybrid search; a multi-step approach to identify the relevant developers and report periods from the user's query
• Hypothetical Document Embeddings (HyDE): generate a “fake” hypothetical document that captures the textual patterns relevant to the initial query, and use it to find matching chunks
• A weighted ranker to rerank chunks, then fetch the surrounding chunks to provide more context (see the sketch below)
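
Here is a minimal sketch of the HyDE step and the weighted reranking. The LLM call is stubbed out, and the fusion weights and function names are illustrative rather than our exact setup:

```python
from typing import Dict, List, Tuple

def llm_complete(prompt: str) -> str:
    # Stand-in for a real LLM call. The HyDE passage only needs
    # plausible wording; it does not have to be factually correct.
    return ("The group's development business is concentrated in "
            "first- and second-tier cities...")

def hyde_passage(question: str) -> str:
    # A fake answer written in annual-report style tends to embed
    # closer to the relevant chunks than the raw question does.
    return llm_complete(
        "Write a short passage, in the style of a developer's annual "
        f"report, that would answer: {question}")

def weighted_rerank(dense_scores: Dict[str, float],
                    sparse_scores: Dict[str, float],
                    w_dense: float = 0.6,
                    w_sparse: float = 0.4) -> List[Tuple[str, float]]:
    # Blend (already normalized) dense and sparse scores per chunk id,
    # best first.
    ids = set(dense_scores) | set(sparse_scores)
    fused = [(cid, w_dense * dense_scores.get(cid, 0.0)
                   + w_sparse * sparse_scores.get(cid, 0.0)) for cid in ids]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)
```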

GENERATING ANSWERS

• Reshuffle chunks to counter the "Lost in the Middle" phenomenon (see the sketch below)
• Consolidate responses when the combined chunk length exceeds the model's maximum context size
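
Here is a minimal sketch of that reshuffle, assuming the chunks arrive ranked best-first: long-context models tend to attend best to the start and end of a prompt, so we alternate ranked chunks toward the two ends and let the weakest sink to the middle.

```python
from typing import List

def reshuffle(chunks_best_first: List[str]) -> List[str]:
    front: List[str] = []
    back: List[str] = []
    for i, chunk in enumerate(chunks_best_first):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

# Ranks [1, 2, 3, 4, 5] come out as [1, 3, 5, 4, 2]:
# the two strongest chunks sit at the edges of the prompt.
```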

The Verdict?

RAG has potential, but it’s no plug-and-play solution—especially where accuracy is non-negotiable.

For us at ROI, this was more of a side R&D effort; we keep our focus on more proven 'AI Workflows'.

Of course, this space keeps changing on a daily basis...